Sunday, June 13, 2010

Using Reduced Redundancy Storage (RRS) in S3

This is just a quick blog post to provide a few examples of using the new Reduced Redundancy Storage (RRS) feature of S3 in boto.  This new storage class in S3 gives you the option to trade off redundancy for cost.  The normal S3 service (and corresponding pricing) is based on an 11-nines (yes, that's 99.999999999% - thanks to Jeff Barr for the correction in the comments below) level of durability.  In order to achieve this extremely high level of reliability, the S3 service must incorporate a high level of redundancy.  In other words, it keeps many copies of your data in many different locations so that even if multiple locations encounter failures, your data will still be safe.

That's a great feature but not everyone needs that level of redundancy.  If you already have copies of your data locally and are just using S3 as a convenient place to store data that is actively being accessed by services within the AWS infrastructure, RRS may be for you.  It provides a much lower level of durability (99.99%) at a significantly lower cost.  If that fits the bill for you, the next three code snippets will provide you with the basics you need to start using RRS in boto.

Create a New S3 Key Using the RRS Storage Class
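Something along these lines should do the trick.  This is just a minimal sketch: the bucket, key, and file names are hypothetical, and the `bucket` argument is assumed to be a boto Bucket object you already have (e.g. from `boto.connect_s3().get_bucket(...)`).

```python
def store_with_rrs(bucket, key_name, filename):
    # Bucket.new_key creates a new Key object; passing
    # reduced_redundancy=True to set_contents_from_filename tells S3
    # to store the data using the RRS storage class.
    key = bucket.new_key(key_name)
    key.set_contents_from_filename(filename, reduced_redundancy=True)
    return key
```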

Convert An Existing S3 Key from Standard Storage Class to RRS
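For a key that's already stored in S3 under the standard storage class, boto's `change_storage_class` method will rewrite it in place.  Again, a minimal sketch with a hypothetical key name:

```python
def convert_to_rrs(bucket, key_name):
    # Bucket.lookup returns the existing Key, or None if it doesn't exist;
    # Key.change_storage_class copies the object onto itself with the
    # new storage class.
    key = bucket.lookup(key_name)
    if key is None:
        raise ValueError('no such key: %s' % key_name)
    key.change_storage_class('REDUCED_REDUNDANCY')
    return key
```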

Create a Copy of an Existing S3 Key Using RRS
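If you'd rather leave the original alone, you can make a server-side copy that uses RRS.  The destination bucket and key names below are, of course, hypothetical:

```python
def copy_as_rrs(bucket, key_name, dst_bucket_name, dst_key_name):
    # Key.copy performs a server-side copy; reduced_redundancy=True stores
    # the new copy with the RRS storage class while the source key keeps
    # its original storage class.
    key = bucket.lookup(key_name)
    return key.copy(dst_bucket_name, dst_key_name, reduced_redundancy=True)
```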

Friday, June 4, 2010

AWS By The Numbers

I recently gave a short talk about Amazon Web Services at GlueCon 2010.  It was part of a panel discussion called "Major Platform Providers" and included similar short talks from others about Azure and vCloud.  It's very hard (i.e. impossible) to give a meaningful technical overview of AWS in 10 minutes so I struggled a bit trying to decide what to talk about.  In the end, I decided to try to come up with some quantitative data to describe Amazon Web Services.  My goal was to try to show that AWS is:

  • A first mover - AWS introduced their first web services in 2005
  • A broad offering - 13 services currently available
  • Popular - details of how I measure that described below
  • Prolific - the pace of innovation from AWS is impressive

After the conference, I was going to post my slides but I realized they didn't really work that well on their own so I decided instead to turn the slides into a blog post.  That gives me the opportunity to explain the data and resulting graphs in more detail and also allows me to provide the graphs in a more interactive form.

Data?  What data?

The first challenge in trying to do a data-heavy talk about AWS is actually finding some data.  Most of the data that I would really like to have (e.g. # users,  # requests, etc.) is not available.  So, I needed to find some publicly available data that could provide some useful insight.  Here's what I came up with:

  • Forum data - I scraped the AWS developer forums and grabbed lots of useful info.  I use things like forum views, number of messages and threads, etc. to act as a proxy for service popularity.  It's not perfect by any means, but it's the best I could come up with.
  • AWS press releases - I analyzed press releases from 2005 to the present day and used them to populate a spreadsheet of significant service and feature releases.
  • API WSDL's - I parsed the WSDL for each of the services to gather data about API complexity.

With that background, let's get on to the data.

Service Introduction and Popularity

This first graph uses data scraped from the forums.  Each line in the graph represents one service and the Y axis is the total number of messages in that service's forum for the given month.  The idea is that the volume of messages on a forum should have some relationship to the number of people using the service and, therefore, the popularity of the service.  Following the timeline across also shows the date of introduction for each of the services.
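The aggregation behind the graph is straightforward.  Here's a rough sketch of the idea (the data format is hypothetical; the real scraper obviously deals with messier HTML):

```python
from collections import defaultdict

def messages_per_month(posts):
    # posts: iterable of (service_name, post_datetime) pairs scraped from
    # the forums; returns {service: {(year, month): message_count}}.
    counts = defaultdict(lambda: defaultdict(int))
    for service, when in posts:
        counts[service][(when.year, when.month)] += 1
    return counts
```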

Note: If you have trouble loading the following graph, try going directly to the Google Docs spreadsheet which I have shared.

The following graph shows another, simpler view of the forum data.  This view plots the normalized average number of views on the forum for each service.

API Complexity

Another piece of publicly available data for AWS is the WSDL for each service.  The WSDL is an XML document that describes the operations supported by the service and the data types used by the operations.  The following graph shows the API Complexity (measured as the number of operations) for each of the services.
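Counting operations from a WSDL only takes a few lines of Python.  This is a simplified sketch using the standard library's ElementTree; it just tallies the distinct operation names declared in the portType sections:

```python
import xml.etree.ElementTree as ET

WSDL_NS = '{http://schemas.xmlsoap.org/wsdl/}'

def count_operations(wsdl_text):
    # Count the distinct <operation> elements declared in the WSDL's
    # portType sections -- a rough proxy for the size of the API.
    root = ET.fromstring(wsdl_text)
    names = set()
    for port_type in root.iter(WSDL_NS + 'portType'):
        for op in port_type.iter(WSDL_NS + 'operation'):
            names.add(op.get('name'))
    return len(names)
```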


Finally, I wanted to try to measure the pace of innovation by AWS.  To do this, I used the spreadsheet I created that tracked all significant service and feature announcements by AWS.  I then counted the number of events per quarter for AWS and used that to compute an agile-style velocity.
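The velocity calculation itself is simple: bucket each announcement date into a (year, quarter) pair and count.  A minimal sketch (the announcement dates here are illustrative, not the real spreadsheet data):

```python
from collections import Counter

def quarterly_velocity(announcement_dates):
    # Map each announcement date to a (year, quarter) bucket and count
    # how many land in each -- an agile-style velocity per quarter.
    return Counter(
        (d.year, (d.month - 1) // 3 + 1) for d in announcement_dates
    )
```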


Hopefully these graphs are interesting and help to prove the points that I outlined at the beginning of the talk.  I actually have a lot more data available from the forum scraping and may try to mine that in different ways later.

While this data was all about AWS, I think the bigger point is that the level of interest and innovation in Amazon's services is really just an indicator of a trend across the cloud computing market.