Sunday, June 28, 2009

Boto 1.8c Released

A new version of boto is available. Version 1.8c fixes a serious issue in the S3 module and two Unicode-related issues in the mturk module.

I recommend that all boto users running 1.8a or 1.8b upgrade to 1.8c as soon as possible.

You can download the new release at http://boto.googlecode.com/

Mitch

Wednesday, June 24, 2009

Boto 1.8a Released

Version 1.8a of boto, the Python library for Amazon Web Services, has been released. This version includes:
  • Released support for CloudWatch, Elastic Load Balancer (ELB) and AutoScale services. Thanks to Reza Lotun for contributing the AutoScale service module!
  • Support for POST on SimpleDB BatchPutAttributes requests and EC2 RunInstances requests. Both of these requests can carry more data than fits in a GET request, so sending them via POST removes the size restrictions of GET. Thanks to Andrey Smirnov and Brett Taylor from AWS for providing details on the use of POST in Query APIs.
  • A number of changes to further support the use of boto with Eucalyptus. Thanks to Neil Soman for his help and patience in remote debugging these changes.
  • Fixes for Issues 226, 232, 233, 234, 237, 239
  • Many other small fixes and improvements
The new version can be downloaded from the Google Code Project Site.
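To make the GET-versus-POST point a bit more concrete, here is a minimal sketch (not boto's internal code) that flattens a SimpleDB BatchPutAttributes call into Query API parameters and form-encodes them as a POST body. The `Item.N.ItemName` / `Item.N.Attribute.M.Name` naming follows the SimpleDB Query API; the helper names and sample data are hypothetical, and the signing parameters (AWSAccessKeyId, Timestamp, Signature, etc.) are deliberately left out.

```python
from urllib.parse import urlencode


def batch_put_params(domain, items, replace=False):
    """Flatten {item_name: {attr_name: [values]}} into BatchPutAttributes
    Query API parameters (signing parameters omitted)."""
    params = {"Action": "BatchPutAttributes", "DomainName": domain}
    for i, (item_name, attrs) in enumerate(items.items(), start=1):
        params["Item.%d.ItemName" % i] = item_name
        j = 1  # attribute counter is per item and counts every value
        for name, values in attrs.items():
            for value in values:
                prefix = "Item.%d.Attribute.%d." % (i, j)
                params[prefix + "Name"] = name
                params[prefix + "Value"] = value
                if replace:
                    params[prefix + "Replace"] = "true"
                j += 1
    return params


def as_post_body(params):
    """Form-encode the parameters as an application/x-www-form-urlencoded
    request body instead of appending them to a GET query string."""
    return urlencode(sorted(params.items()))
```

Every item and every attribute value adds parameters, so a full batch quickly outgrows what comfortably fits in a URL, which is exactly why POST matters for these two calls.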

Thursday, June 18, 2009

Managing Your AWS Credentials (Part 2)

In Part 1 I discussed the basics of AWS credentials, such as what they are, where to find them and how to protect them. In this installment, I'm going to talk about some of the real-world challenges you face when deploying production systems in AWS, and then about some strategies for dealing with those challenges.


The diagram above shows a configuration that is similar to many production systems we have deployed. It consists of a load-balancing front end and a scalable set of application servers that handle user requests. The load balancing can be done in one of two ways:
  1. A scalable number of EC2 instances, spread across availability zones, running reverse-proxy software like Apache mod_proxy, HAProxy or nginx. Each instance would have an Elastic IP address attached, and you would create a DNS A record for each Elastic IP address. You could then use DNS round-robin load balancing to spread requests across your front-end servers.
  2. Elastic Load Balancer, one of the new services from AWS, which takes the place of the EC2 instances and DNS magic described above and simply provides a DNS CNAME to which all traffic destined for your application should be sent. ELB then manages the scaling of the front end and the load balancing across your application servers.
Regardless of how the load balancing is accomplished, one thing is clear from the diagram: the application servers are making requests to Amazon services such as SimpleDB, S3 and SQS, and in order to do that they need access to your AWS credentials, specifically your AccessKeyID and your SecretAccessKey. As you recall from Part 1, those credentials are essential for calculating the Signature that authenticates every API call.

That raises two interesting questions.

How do the AWS credentials get installed on the application servers?
One way to make the credentials available to the application servers is to bundle them into the AMI used for the application servers. That would work, but it's a pretty bad idea. First of all, it means that if you ever have to change the credentials, you also have to bundle a new AMI. Yuck. Secondly, it creates too many opportunities for errors. For example, someone might rebundle the AMI for a different purpose and forget to remove the credentials. Or you might inadvertently share the AMI with another AWS user, or even accidentally make it public, and not realize your mistake for a while. In the meantime, anyone who launched an instance of that AMI could have found the credentials. Yuck, again.

A better approach is to pass the credentials to the instance in the user_data that you can supply when you launch it. That way, the credentials are passed as a parameter to the AMI rather than being baked in, which makes things a lot more flexible and at least a little bit more secure. There are some other options available, but I'll save those for Part 3 in the series.
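As a rough illustration of the user_data approach, here is a minimal sketch that renders the credentials as an INI-style string on the launching side and parses them back out on the instance. The `[Credentials]` section and key names mirror the layout of a boto config file; the helper names are hypothetical. With boto 2.x you would, as I understand it, pass the string as the `user_data` argument to `run_instances` and read it on the instance with `boto.utils.get_instance_userdata()`.

```python
import configparser
import io


def build_user_data(access_key_id, secret_access_key):
    """Launch side: render credentials as an INI-style user_data string."""
    # interpolation=None so a '%' in a value is never treated specially
    config = configparser.ConfigParser(interpolation=None)
    config["Credentials"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


def parse_user_data(user_data):
    """Instance side: recover (access_key_id, secret_access_key)."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(user_data)
    section = config["Credentials"]
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```

Because the credentials travel with the launch request rather than the image, rotating them means relaunching with new user_data instead of rebundling the AMI.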

What risks do I incur by having them there?
This is where we get to the "waking up in a cold sweat in the middle of the night" part of the article. If you put valuable information on a server, you have to acknowledge that there is at least some risk that a baddie will find that valuable information, despite your best efforts to thwart him/her. So, having accepted that possibility, what's the worst that could happen?

Well, if you have one set of AWS credentials for all production services (e.g. EC2, S3, SimpleDB, SQS, ELB, etc.) and if those credentials are discovered by a baddie, then the worst that could happen is very, very bad indeed. With your production credentials in their hot little hands, the baddie can:
  • Terminate all EC2 instances
  • Access all customer data stored in S3 or SimpleDB
  • Delete all data stored in S3 or SimpleDB
  • Start up a bunch of new EC2 instances to run up charges on your AWS account
  • Use your account to attack other sites
  • Lots of other things that I'm not sneaky enough to even imagine
I'll pause here for a moment to give you time to pour yourself a wee dram to calm your nerves and regain your composure. The thing you have to remember about your AWS credentials is that they are essentially the keys to your virtual data center and if they fall into the wrong hands it's kind of like letting a bunch of monkeys with crowbars loose in your colo cage. And, while this discussion is focused on AWS, the same can be said about other cloud services. Over time, the tools available to manage your credentials will undoubtedly improve but in the short term we need a strategy to mitigate the risk as much as possible.

Multiple Personalities
The best way I have found of mitigating the risk of having your AWS credentials discovered and exploited is to use two sets of AWS credentials for managing your production environment. Let's call them your Secret Credentials and your Double Secret Credentials.

Secret Credentials
These are the credentials that would be used on the application servers in our diagram above. When creating these credentials, you should sign up only for the services that are absolutely required. In our example, that would include S3, SimpleDB and SQS but not, for example, EC2. You should make sure you choose a different, but equally secure password for the AWS Account Credentials (see Part 1).

Double Secret Credentials
The Double Secret Credentials are just that, doubly secret. These credentials should never, ever be present on a publicly accessible server! These are the credentials you would use to start and stop all production EC2 instances and to create and manage your Elastic Load Balancers and CloudWatch/AutoScaling groups. In addition, these are the credentials you would use to create the S3 buckets that contain your production data and any SQS queues you need for batch processing. You would then use the access control mechanisms of these services to grant the necessary access to the Secret Credentials.

For SQS, this is quite straightforward if you are using the new 2009-02-01 API. This API includes a powerful new ACL mechanism that gives you a great deal of flexibility in granting access to queues. For example, you can grant access to write to a queue but not read from it, or to read from it but not write to it; you can even limit access by IP address and/or time of day.
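Here is a minimal sketch of what such a grant might look like as an SQS access policy document: the account behind the Secret Credentials may send messages to a queue owned by the Double Secret Credentials, but cannot receive them, and only from a given address range. The queue ARN, account numbers and CIDR are placeholders, and the statement layout is my reading of the access policy language (policy `Version` "2008-10-17"); the resulting JSON would be set as the queue's `Policy` attribute via SetQueueAttributes.

```python
import json


def sqs_send_only_policy(queue_arn, grantee_account_id, source_cidr):
    """Build an SQS access policy that lets another account send to the
    queue (but not receive from it), and only from source_cidr."""
    policy = {
        "Version": "2008-10-17",
        "Id": queue_arn + "/SendOnlyPolicy",
        "Statement": [{
            "Sid": "AllowSendFromAppServers",
            "Effect": "Allow",
            "Principal": {"AWS": grantee_account_id},
            # Only SendMessage is allowed; ReceiveMessage is not granted.
            "Action": "SQS:SendMessage",
            "Resource": queue_arn,
            "Condition": {"IpAddress": {"aws:SourceIp": source_cidr}},
        }],
    }
    return json.dumps(policy, indent=2)
```

The inverse grant (read but not write) would simply list `SQS:ReceiveMessage` and `SQS:DeleteMessage` instead of `SQS:SendMessage`.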

For S3, you have fewer options. If the application servers only require read access to S3 resources, that can easily be accomplished with the S3 ACL mechanism by granting READ access. If the app servers need to be able to write content to S3 (e.g. for uploading files), you would have to grant the Secret Credentials WRITE access to the S3 bucket. But that means that all content written to the bucket by Secret Credentials would be owned by Secret Credentials and could therefore be read, deleted or overwritten by Secret Credentials.
Actually, that last sentence is not completely correct. In fact, if Secret Credentials has WRITE access (and only WRITE access) to a bucket owned by Double Secret Credentials, Secret Credentials will be able to read, delete or overwrite any content owned by Secret Credentials in that bucket AND will be able to delete or overwrite any content even if owned by Double Secret Credentials. It's sort of weird that keys can be deleted even if they cannot be read or listed but that is how S3 operates.
MSG - 6/25/2009 (Thanks to Allen on S3 Forums for the correction)
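For the read-only case, here is a minimal sketch of the S3 AccessControlPolicy document that keeps FULL_CONTROL with the Double Secret Credentials and grants READ to the Secret Credentials. S3 grants refer to canonical user IDs rather than account numbers; the IDs below are placeholders and the helper is hypothetical. In boto 2.x, as far as I know, `bucket.add_user_grant("READ", canonical_user_id)` achieves the same thing without building the XML by hand.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("", S3_NS)      # serialize S3 elements unprefixed
ET.register_namespace("xsi", XSI_NS)  # and the grantee type as xsi:type


def acl_policy_xml(owner_id, grants):
    """Build an S3 AccessControlPolicy document.

    grants is a list of (canonical_user_id, permission) pairs, where
    permission is e.g. "FULL_CONTROL", "READ" or "WRITE"."""
    root = ET.Element("{%s}AccessControlPolicy" % S3_NS)
    owner = ET.SubElement(root, "{%s}Owner" % S3_NS)
    ET.SubElement(owner, "{%s}ID" % S3_NS).text = owner_id
    acl = ET.SubElement(root, "{%s}AccessControlList" % S3_NS)
    for user_id, permission in grants:
        grant = ET.SubElement(acl, "{%s}Grant" % S3_NS)
        grantee = ET.SubElement(grant, "{%s}Grantee" % S3_NS,
                                {"{%s}type" % XSI_NS: "CanonicalUser"})
        ET.SubElement(grantee, "{%s}ID" % S3_NS).text = user_id
        ET.SubElement(grant, "{%s}Permission" % S3_NS).text = permission
    return ET.tostring(root, encoding="unicode")
```

For example, `acl_policy_xml(double_secret_id, [(double_secret_id, "FULL_CONTROL"), (secret_id, "READ")])` leaves ownership with the Double Secret Credentials while letting the app servers read.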

If that's unacceptable, the best approach is to have Secret Credentials write uploaded data to an intermediate bucket, send a message to indicate there is new content and then have Double Secret Credentials MOVE that content into the production S3 bucket.
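S3 has no native move operation, so the MOVE step is a copy followed by a delete. The sketch below uses a plain dictionary as a hypothetical stand-in for S3 so it runs anywhere, and assumes a JSON message body naming the staged bucket and key; with boto 2.x you would, I believe, replace the two dictionary operations with `Bucket.copy_key(...)` on the production bucket and `Bucket.delete_key(...)` on the staging bucket, and read the message from SQS.

```python
import json


def make_upload_message(staging_bucket, key):
    """Body the app server (Secret Credentials) would enqueue after an upload."""
    return json.dumps({"bucket": staging_bucket, "key": key})


def promote_upload(store, message_body, staging_bucket, production_bucket):
    """Run under Double Secret Credentials: 'move' one object by copying it
    into the production bucket and only then deleting the staged original.

    store is a stand-in for S3: {bucket_name: {key_name: data}}."""
    msg = json.loads(message_body)
    key = msg["key"]
    # The message comes from the less-trusted Secret Credentials, so only
    # accept keys from the expected staging bucket...
    if msg["bucket"] != staging_bucket:
        raise ValueError("unexpected source bucket %r" % msg["bucket"])
    production = store.setdefault(production_bucket, {})
    # ...and never let it overwrite existing production content.
    if key in production:
        raise ValueError("refusing to overwrite production key %r" % key)
    production[key] = store[staging_bucket][key]  # copy first...
    del store[staging_bucket][key]                # ...then delete
    return key
```

Copying before deleting means a crash mid-move leaves a duplicate rather than losing data; deleting the SQS message only after `promote_upload` succeeds means a failed move is simply retried.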

For SimpleDB, you have even fewer options. There is currently no ACL capability in SimpleDB. So, if Secret Credentials needs any access to SimpleDB, it must own the SimpleDB domain, and no other account can have direct access to it. The best approach I've found is to keep a reasonably current backup of the data in the Secret Credentials' SimpleDB domain(s), either by periodically copying the data to a domain owned by Double Secret Credentials or by dumping the data to an S3 bucket. Obviously, as the amount of SimpleDB data grows, this approach becomes less and less manageable. I am hoping that the flexible ACL mechanism recently introduced in SQS will eventually make it to SimpleDB, but for now you have to resort to brute-force safety measures.
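The "dump to S3" option can be as simple as serializing every item to JSON and storing the result as an object in a bucket owned by the Double Secret Credentials. This is a minimal sketch with hypothetical helper names; it assumes items arrive as dictionaries in which a single-valued attribute is a string and a multi-valued one is a list (which, as I understand it, is how boto 2.x's SimpleDB items behave). Storing the string could then be done with something like boto's `Key.set_contents_from_string`.

```python
import json


def _as_list(value):
    """SimpleDB attributes are multi-valued; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def dump_domain(items):
    """Serialize {item_name: {attr: value_or_values}} to a JSON string
    suitable for storing as an S3 object."""
    normalized = {
        name: {attr: sorted(_as_list(values)) for attr, values in attrs.items()}
        for name, attrs in items.items()
    }
    return json.dumps(normalized, sort_keys=True)


def load_domain(dump):
    """Inverse of dump_domain: rebuild the item dictionary from a backup."""
    return json.loads(dump)
```

Values are sorted because SimpleDB does not preserve the order of an attribute's values, which also makes successive dumps easy to diff.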

Breathing Easier?
If we follow this basic strategy, most of the scorched earth scenarios described above can be avoided. There are still risks of unauthorized access to data and, depending on the data you are storing, you may need to consider further safeguards such as encryption of the data at rest as well as in flight. That introduces another level of complexity, especially around key management, but in some cases it's the only responsible approach.

The details of your security approach will depend on your application and the type of data you are storing but hopefully this installment provides a basic strategy for minimizing some of the risks described above. A future installment will describe some software tools that can be used to further automate and secure your application within AWS.

Managing Your AWS Credentials (Part 1)

Anyone who has deployed a production system in the Amazon Web Services infrastructure has grappled with the challenge of securing the application. The majority of the issues you face in an AWS deployment are the same issues you would face in deploying your application in any other environment, e.g.:
  • Hardening your servers by making sure you have the latest security patches for your OS and all relevant applications
  • Making sure your servers are running only the services required by your application
  • Reviewing file and directory ownership and permissions to minimize access to critical system files as much as possible
  • Configuring SSH to use non-standard ports and accept only public key authentication
  • Configuring firewall rules to limit access to the smallest possible number of ports and CIDR blocks
That certainly isn't a comprehensive list but you can find plenty of information on securing servers and fortunately, since the servers you are running in the EC2 environment are standard servers, all of that information can be applied directly to securing your instances in AWS.

In fact, some of the tools provided by AWS, such as the distributed firewall in EC2, can actually make the process even more secure because everything can be controlled via APIs. For example, you can shut down SSH traffic completely at the EC2 firewall and write scripts that automatically enable SSH access for your current IP and then shut that port down as soon as you have closed your SSH session with the instance.
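Here is a minimal sketch of such a script as a Python context manager: it opens the SSH port in a security group for a single address and revokes the rule when the block exits, even if something fails. The authorize/revoke callables are injected; the keyword names are intended to match boto 2.x's `EC2Connection.authorize_security_group` and `revoke_security_group`, but treat that as an assumption, and discovering your current public IP is left out.

```python
from contextlib import contextmanager


@contextmanager
def temporary_ssh_access(authorize, revoke, group_name, my_ip, port=22):
    """Open `port` (default 22; pass your non-standard SSH port if you use
    one) to a single /32 address and always revoke it on exit."""
    rule = dict(group_name=group_name, ip_protocol="tcp",
                from_port=port, to_port=port, cidr_ip="%s/32" % my_ip)
    authorize(**rule)
    try:
        yield rule
    finally:
        # Runs whether the SSH session ended normally or raised.
        revoke(**rule)
```

Typical use would be `with temporary_ssh_access(conn.authorize_security_group, conn.revoke_security_group, "webservers", my_ip):` wrapping a call that runs your `ssh` session, so the port is only open for as long as you are connected.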

This series of articles focuses on an aspect of security that is very specific to AWS: managing your AWS credentials. In this first installment, let's start by reviewing the various credentials associated with your AWS account because this can be confusing for new users (and sometimes even for old users!).

AWS Account Credentials
These are the credentials you use to log into the AWS web portal and the AWS Management Console. They consist of an email address and a password. Since these credentials control access to all of the other credentials discussed below, it is very important to choose a strong password for this account and to age (rotate) the password aggressively. I recommend using a service like random.org to generate random strings of 10-12 characters (longer is even better). Securing access to the portal should be your primary security goal.
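If you would rather not rely on a web service, a local cryptographically secure generator works just as well. This sketch swaps in Python's standard `secrets` module (Python 3.6+) for random.org; the helper name and the alphanumeric alphabet are my own choices. Each character drawn from the 62 letters and digits adds about 5.95 bits, so 12 characters give roughly 71 bits and the 16-character default roughly 95 bits.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def random_password(length=16):
    """Generate a random password from a cryptographically secure source."""
    if length < 10:
        raise ValueError("use at least 10 characters; longer is better")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```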

AWS Account Number
This is the unique 12-digit number associated with your AWS account. Unlike the other credentials we will discuss, this one is not a secret. The easiest way to find your account number is to look in the upper-right corner of the web page after you have logged into the AWS portal.



The Account Number is a public identifier and is used mainly for sharing resources within AWS. For example, if I create an AMI in EC2 and want to share that AMI with a specific user without making it public, I would need to add that user's Account Number to the list of user IDs associated with the AMI (see this for details). One source of confusion here is that even though the Account Number is displayed with hyphens separating the three groups of four digits, the hyphens must be removed when it is used via the API.
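A tiny helper avoids the hyphen mix-up: it converts the number as displayed in the portal into the bare 12-digit form used in API calls (for example, when adding a launch permission for a shared AMI). The function name is hypothetical.

```python
import re


def api_account_id(displayed):
    """Convert an account number as shown in the portal (1234-5678-9012)
    into the 12-digit form the API expects (123456789012)."""
    digits = displayed.replace("-", "").strip()
    if not re.fullmatch(r"\d{12}", digits):
        raise ValueError("expected a 12-digit AWS account number: %r" % displayed)
    return digits
```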

Once you are logged into the AWS portal, you will see a page titled "Access Identifiers". There are really two types of Access Identifiers.

AccessKeyID and SecretAccessKey
These Access Identifiers are at the heart of all API access in AWS. Virtually every REST or Query API request made to every AWS service requires you to pass your AccessKeyID as part of the request to identify who you are. Then, to prove that you really are who you say you are, the APIs also require you to compute and include a Signature in the request.

The Signature is calculated by concatenating a number of elements of the request (e.g. timestamp, request name, parameters, etc.) into a StringToSign and then creating a Signature by computing an HMAC of the StringToSign using your SecretAccessKey as the key (see this for more details on request signing).

When AWS receives the request, the service builds the same StringToSign from the request and computes the HMAC using the SecretAccessKey that AWS has associated with the AccessKeyID sent in the request. If the computed value matches the Signature in the request, the request is authenticated. If not, it is rejected.
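To make the sign-and-verify round trip concrete, here is a minimal sketch of Signature Version 2 as used by Query APIs such as SimpleDB, SQS and EC2: parameters are sorted by name and RFC 3986 percent-encoded, joined with the HTTP verb, host and path into the StringToSign, and signed with HMAC-SHA256. S3's REST API uses a different StringToSign (and HMAC-SHA1), so treat this as an illustration of the idea rather than a drop-in signer. The `verify` function plays the role of the AWS side; all keys and parameter values are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _encode(value):
    # RFC 3986 percent-encoding: only A-Z a-z 0-9 - _ . ~ are left as-is.
    return quote(str(value), safe="-_.~")


def string_to_sign(verb, host, path, params):
    """Build a Signature Version 2 StringToSign (params exclude Signature)."""
    canonical = "&".join(
        "%s=%s" % (_encode(k), _encode(v)) for k, v in sorted(params.items())
    )
    return "\n".join([verb.upper(), host.lower(), path, canonical])


def sign(secret_access_key, verb, host, path, params):
    """Client side: base64-encoded HMAC-SHA256 of the StringToSign."""
    digest = hmac.new(secret_access_key.encode("utf-8"),
                      string_to_sign(verb, host, path, params).encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(secret_access_key, verb, host, path, params, signature):
    """Service side: recompute with the secret associated with the request's
    AccessKeyID and compare in constant time."""
    expected = sign(secret_access_key, verb, host, path, params)
    return hmac.compare_digest(expected, signature)
```

The client sends the result as the `Signature` parameter alongside `AWSAccessKeyId`, `SignatureVersion=2` and `SignatureMethod=HmacSHA256`; because only someone holding the SecretAccessKey can produce a matching value, the key itself never has to travel with the request.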

The AccessKeyID associated with an account cannot be changed, but the SecretAccessKey can be regenerated at any time using the AWS portal. Because the SecretAccessKey is the shared secret upon which the entire authentication mechanism is based, you should regenerate it if there is any risk that it has been compromised. In fact, it would be good practice to age your SecretAccessKey in the same way you age the password for your AWS Account Credentials. Just remember that once you change the SecretAccessKey, any applications that are making API calls will stop working until their credentials are updated.

X.509 Certificate
The other Access Identifier associated with your account is the X.509 Certificate. You can provide your own certificate or have AWS generate one for you. This certificate can be used for authenticating requests when using the SOAP versions of the AWS APIs, and it is also used when creating your own AMIs in EC2. Essentially, the files that are created when bundling an AMI are cryptographically signed using the X.509 cert associated with your account, so if anyone were to try to tamper with the bundled AMI, the signature would be broken and the tampering easily detected.

When using the SOAP APIs, the X.509 certificate is just as critical from a security point of view as the SecretAccessKey discussed above and should be managed just as carefully. Remember, even if you don't use SOAP, a hacker could!

SSH Keys
The final credential we need to discuss is the public/private keypair used for SSH access to an EC2 instance. By default, an EC2 instance will allow SSH access only via public key authentication. I strongly recommend that you stick to this policy in any AMIs that you create for your own use. SSH keypairs can be generated via the AWS Console and API. You should create a keypair for each individual in your organization who will need access to running instances, and guard those SSH keys carefully.

In fact, what I recommend is storing all of these critical credentials in an encrypted form on a USB memory stick and only on that device (and a backup copy of it to be safe, of course). You can either use a device that incorporates the encryption natively (e.g. IronKey, etc.) or you can create an encrypted disk image and store that on the USB device. Alternatively, you could just store the encrypted disk image itself on your laptop but never store these critical credentials in the clear on any computer or memory stick and definitely do not email them around or exchange them via IM, etc.

In Part 2 of this series (coming tomorrow!), I'll discuss a strategy for managing these important credentials in a production environment. Stay tuned!