Elastician: 2011

Friday, December 16, 2011

Looking at Clouds from Both Sides Now

I'll apologize up front for that horrible pun in the title. No excuse, really.

After 18 months at Eucalyptus, the best private cloud vendor out there, I have decided to see what things are like on the public cloud side. As of Monday, December 19, I will be a senior engineer at Amazon Web Services.

I was very reluctant to leave Eucalyptus. It is a great company, full of great people and with a corporate culture that absolutely cannot be beat. And, while a lot of people's attention has been focused on shiny new things over the past year, Eucalyptus has quietly and steadily built amazing sales, support, marketing and professional services teams to match their already awesome engineering team. 2012 is going to be another kick-ass year for Eucalyptus and I really hate to miss that.

But the idea of seeing how the sausage is made at the biggest public cloud is an opportunity I couldn't pass up. In my new job, I will still be focusing on software tools and how to make it easier for developers to use cloud infrastructures, both public and private. I will still be doing a lot of Python stuff and will definitely keep making sure that boto stays a popular, useful and independent open source project, just as I did while I was at Eucalyptus.

It should be fun!

Wednesday, December 7, 2011

Don't reboot me, bro!

If you are an AWS user with EC2 instances running, you may have already gotten an email from AWS informing you that your instance(s) will be rebooted in the near future. I'm not exactly sure what is prompting this massive rebooting binge, but the good folks at AWS have actually provided a new EC2 API request just so you can find out about upcoming maintenance events planned for your instances.

We just committed code to boto that adds support for the new DescribeInstanceStatus request. Using this, you can programmatically query the status of any or all of your EC2 instances and find out if there is a reboot in their future and, if so, when to expect it.

Here's an example of using the new method and accessing the data returned by it.
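A minimal sketch, assuming boto 2.x, where DescribeInstanceStatus is exposed as `get_all_instance_status()` on an EC2 connection and each result carries `id`, `zone` and an `events` list (or `None`) whose entries have `code`, `description`, `not_before` and `not_after`. The names `scheduled_events` and `fetch_statuses` are hypothetical helpers, and the default region is an assumption; the parsing is kept separate from the network call so it works on any objects of that shape.

```python
# Sketch: report scheduled maintenance events (e.g. reboots) for EC2 instances.
# Assumes boto 2.x, where DescribeInstanceStatus is
# EC2Connection.get_all_instance_status(). Helper names are hypothetical.


def scheduled_events(statuses):
    """Flatten instance status results into one tuple per scheduled event.

    Each status needs .id, .zone and .events (None or a list of objects with
    .code, .description, .not_before and .not_after).
    """
    found = []
    for status in statuses:
        # events is None when nothing is scheduled for the instance
        for event in status.events or []:
            found.append((status.id, status.zone, event.code,
                          event.description, event.not_before, event.not_after))
    return found


def fetch_statuses(region_name="us-east-1"):
    """Call DescribeInstanceStatus for the instances in a region (needs credentials)."""
    import boto.ec2  # imported here so scheduled_events() has no dependencies

    conn = boto.ec2.connect_to_region(region_name)
    return conn.get_all_instance_status()
```

With credentials in your boto config, looping over `scheduled_events(fetch_statuses())` and printing each tuple would give one line per upcoming event, including the window (`not_before` to `not_after`) in which to expect it.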

Sunday, November 13, 2011

Mapping Requests to EC2 API Versions

I recently did some analysis of the EC2 API. I wanted to look at the API over time so I could remember which API requests were added in each of the 23 separate versions of the API over the past 5 years. The results were kind of interesting and I thought it would be worthwhile to share them here.

The following image shows a graph of the number of requests over time. If you click on the image, you will see a high-res PNG version that lets you zoom in for much greater detail. The reddish section of each bar contains the names of the individual requests added in that version, but those names are really only readable in the high-res version of the graphic.

Note that this analysis is only looking at the request level. I'm not diving deeper to look at the individual parameters in each request which, in some cases, have also changed over time. I may do that analysis at some point, but it's a huge amount of work and I doubt that I'll find the time.

The raw JSON data behind this can be found in the missingcloud github repo.
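The layout of the JSON in the missingcloud repo isn't described here, so the sketch below assumes a hypothetical shape: an object mapping each EC2 API version string (these are dates in YYYY-MM-DD form) to the list of request names first added in that version. Under that assumption, the running total is what a "requests over time" graph would plot, and the reverse mapping answers "which version added this request?". All function names are hypothetical.

```python
import json


def load_mapping(path):
    """Load a hypothetical {version: [requests added in that version]} JSON file."""
    with open(path) as f:
        return json.load(f)


def cumulative_counts(added_by_version):
    """Return (version, number_added, running_total) rows in release order."""
    rows = []
    total = 0
    # Versions are YYYY-MM-DD strings, so lexical order is chronological order.
    for version in sorted(added_by_version):
        added = len(added_by_version[version])
        total += added
        rows.append((version, added, total))
    return rows


def version_introduced(added_by_version):
    """Map each request name to the earliest version that lists it."""
    first = {}
    for version in sorted(added_by_version):
        for request in added_by_version[version]:
            first.setdefault(request, version)
    return first
```

If the real file is organized differently (for example, a full request list per version rather than only the additions), the per-version count would become a set difference against the previous version instead of a simple `len`.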

Monday, October 31, 2011

Python and AWS Cookbook Available

I recently completed a short book for O'Reilly called "Python and AWS Cookbook". It's a collection of recipes for solving common problems in Amazon Web Services. The solutions are all in Python and, of course, use boto heavily. The focus of this book is EC2 and S3 although there are a couple of quick detours into IAM and SNS. Many of the examples also work with Eucalyptus so I have included some information about using boto with Eucalyptus as well as with Google Cloud Storage for some of the S3-related recipes.

You can get a hardcopy of the book but if you buy the e-book, you get free updates and I am expecting to do quite a few updates. Many of the recipes came from problems people have posted on the boto users group or on the boto IRC channel but I'm sure there are lots of other areas where additional example code would be useful. If you have specific requests, let me know. Depending on the response, I might also do additional cookbooks that focus on other services.

The bird on the cover is a Sand Grouse. I lobbied heavily for a Honey Badger but to no avail.

Friday, October 14, 2011

Does Python Scale?

I wonder how many times I've been asked that question over the years. Often, it's not even in the form of a question (Sorry, Mr. Trebek) but rather stated emphatically: "Python doesn't scale." This can be the start of long, heated discussions involving Global Interpreter Locks, interpreters vs. compilers, dynamic vs. static typing, etc. These discussions rarely end satisfactorily for any of the parties involved. And rarely are any opinions changed as a result.

So, does Python scale?

Well, YouTube is written mostly in Python. Dropbox is written almost entirely in Python. Reddit. Quora. Disqus. FriendFeed. These are huge sites, handling gazillions of hits a day. They are written in Python. Therefore, Python scales.

Yeah, but what about that web app I wrote that one time. Hosted on a cheapo, oversubscribed VPS, running straight CGI talking to a remote MySQL database running in a virtual machine on my Macbook Air. That thing fell over like a drunken sailor when I invited a few of my friends to go check it out. So, yeah. Forget what I said before. Obviously Python doesn't scale.

The truth is, it's the wrong question. The stuff that allows Dropbox to store a million files every 15 minutes has little to do with Python just as the things that caused my feeble web app to fail had little to do with Python. It has to do with the overall architecture of the application. How databases are sharded, how loosely or tightly components have been coupled, how you monitor, and how you react to the data your monitoring is providing you. And lots of other stuff. But you have to deal with those issues no matter what language you write the system in.

No reasonable choice of computer language is going to guarantee your success or your failure. So pick the one you are most productive in and focus on properly architecting your app. That scales.

Thursday, October 13, 2011

Accessing the Eucalyptus Community Cloud with boto

The Eucalyptus Community Cloud (ECC) is a great resource that allows you to try out a real cloud computing system without installing any software or incurring any costs. It's a sandbox environment that is maintained by Eucalyptus Systems to allow people to test-drive Eucalyptus software and experiment with cloud computing.

To access the ECC, you need to sign up following the instructions here. Once you are signed up, you will be able to download a zip file containing the necessary credentials for accessing the ECC. If you unzip that file somewhere on your local filesystem, you will find, among other things, a file called eucarc. The contents of that file will look something like this:

To get things to work seamlessly in boto, you need to copy a few pieces of information from the eucarc file to your boto config file, which is normally found in ~/.boto. Here's the info you need to add. The actual values, of course, should be the ones from your own eucarc file.

Notice that the values needed for eucalyptus_host and walrus_host are just the hostname or IP address of the server as specified in the EC2_HOST and S3_HOST variables. You don't have to include the port number or the http prefix. Having edited your boto config file, you can now easily access the ECC services in boto.
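Stripping a URL down to its bare host is easy to get wrong by hand, so here is a small sketch that does it with the standard library and emits the corresponding boto config text. It takes the EC2 and S3 URLs from your eucarc as input. The `eucalyptus_host` and `walrus_host` names come from this post; placing them in a `[Boto]` section and storing the keys as `euca_access_key_id` / `euca_secret_access_key` under `[Credentials]` reflects how I understand boto 2.x's `connect_euca()` and `connect_walrus()` to read them, so treat those as assumptions and check your boto version. The helper names are hypothetical.

```python
# Sketch: derive boto config entries for the ECC from eucarc-style URLs.
# Section/option names assume boto 2.x connect_euca()/connect_walrus().
import configparser
import io
from urllib.parse import urlparse


def hostname_from_url(url):
    """Drop the scheme, port and path, keeping only the host name or IP address."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError("no host found in %r" % url)
    return host


def ecc_boto_config(ec2_url, s3_url, access_key, secret_key):
    """Return boto config text holding the Eucalyptus credentials and hosts."""
    config = configparser.ConfigParser(interpolation=None)
    config["Credentials"] = {
        "euca_access_key_id": access_key,
        "euca_secret_access_key": secret_key,
    }
    config["Boto"] = {
        # Only the bare host: boto supplies the port and service path itself.
        "eucalyptus_host": hostname_from_url(ec2_url),
        "walrus_host": hostname_from_url(s3_url),
    }
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

The port and path can be left out because, as far as I know, boto 2.x's Eucalyptus helpers default to port 8773 and the standard service paths; if your installation uses different ones, pass them explicitly when connecting.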

This example assumes you are using the latest version of boto from github or the release candidate for version 2.1 of boto.
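A sketch of that access pattern, assuming boto 2.x: `boto.connect_euca()` returns an EC2-compatible connection and `boto.connect_walrus()` an S3-compatible one, each picking up its host from the boto config when called without arguments. `connect_ecc` and `summarize` are hypothetical helpers; the listing calls (`get_all_images()`, `get_all_buckets()`) are the ordinary boto EC2 and S3 methods.

```python
# Sketch: use the ECC through boto 2.x once ~/.boto holds the settings above.


def connect_ecc():
    """Return (ec2_conn, walrus_conn) for the hosts named in the boto config."""
    import boto  # imported here so summarize() can be used without boto

    return boto.connect_euca(), boto.connect_walrus()


def summarize(ec2_conn, walrus_conn):
    """List image IDs via the EC2 API and bucket names via Walrus (the S3 API)."""
    image_ids = sorted(image.id for image in ec2_conn.get_all_images())
    bucket_names = sorted(bucket.name for bucket in walrus_conn.get_all_buckets())
    return image_ids, bucket_names
```

Calling `summarize(*connect_ecc())` would return the images and buckets you can see in the ECC; because the same code talks to AWS through `boto.connect_ec2()` and `boto.connect_s3()`, only the connection step changes between clouds.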

Tuesday, February 22, 2011

Accessing the Internet Archive with boto

A recent tweet from Pete Skomoroch twigged me to the fact that the Internet Archive provides an S3-like API. Cool! The Internet Archive is a great resource which provides, in their words:

...a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public.

Since boto supports S3, I wondered if it would be possible to access the Internet Archive's API with boto. Turns out, it's quite simple. To make it even simpler, I've added a new "connect_ia" method. Before you can use this, you need to get API credentials from the Internet Archive, but fortunately that's really easy. Just sign up for an account (if you don't already have one) and then go to this link to generate the API keys.

Once you have your credentials, the easiest thing to do is to add the credentials to your boto config file. They need to go in the Credentials section like this:
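The end result is two entries under `[Credentials]`. The option names `ia_access_key_id` and `ia_secret_access_key` are, as I understand it, the ones boto 2.x's `connect_ia()` reads; treat them as an assumption for your version. Editing the file by hand works just as well; the hypothetical helper below only shows where the keys land, using the standard library's `configparser`.

```python
# Sketch: add Internet Archive keys to the [Credentials] section of a boto config.
# Option names assume boto 2.x connect_ia(); add_ia_credentials is hypothetical.
import configparser
import os


def add_ia_credentials(access_key, secret_key, path=None):
    """Store the IA keys in the boto config file, creating the section if needed.

    Caution: configparser rewrites the whole file, so comments are not kept.
    """
    path = path or os.path.expanduser("~/.boto")
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file simply yields an empty config
    if not config.has_section("Credentials"):
        config.add_section("Credentials")
    config.set("Credentials", "ia_access_key_id", access_key)
    config.set("Credentials", "ia_secret_access_key", secret_key)
    with open(path, "w") as f:
        config.write(f)
```

Existing options in the file, such as your AWS keys, are preserved; only the two IA entries are added or overwritten.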

Then, you can create a connection to the Internet Archive like this:
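A minimal sketch, assuming boto 2.x's `boto.connect_ia()`, which returns an S3-style connection and reads the IA keys from the boto config when called with no arguments. It also assumes the Internet Archive exposes each archive item as a bucket whose keys are the item's files; `connect_internet_archive`, `list_item_files` and any item identifier you pass are hypothetical.

```python
# Sketch: connect to the Internet Archive's S3-like API with boto 2.x.


def connect_internet_archive():
    """Return an S3-style connection aimed at the Internet Archive."""
    import boto  # imported here so list_item_files() works without boto

    return boto.connect_ia()


def list_item_files(conn, identifier):
    """List the key (file) names in one bucket, assumed to be one archive item."""
    bucket = conn.get_bucket(identifier)
    return sorted(key.name for key in bucket.list())
```

From there the usual boto S3 calls (`get_bucket`, `list`, `get_key`) should behave as they do against S3 itself, within whatever subset of the API the Internet Archive supports.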

I've only tested this a bit so if you run into any problems with it, post a message to boto-users or create an issue.

Thursday, February 17, 2011

All Your Website Are Belong to S3

One of the most commonly requested features for S3 has been the ability to have it act more like a web server. In other words, to be able to put an index.html file into a bucket and then point someone to that bucket and they see your website. I found requests for this on the S3 forum dating back to June 2006. I'm pretty sure if you search around in the forums long enough you will see posts from me predicting S3 would never have this feature.

Well, as is so often the case, I have been proven wrong. AWS has just announced a new feature of S3 that lets you easily host static websites entirely on S3. The features are pretty simple to use. The basic process is:

Create a bucket to hold your website (or use an existing one)
Make sure the bucket is readable by the world
Upload your website content, including the default page (usually index.html) and an optional page to display in case of errors
Configure your bucket for use as a website (using a new API call)
Access your website via the new hostname S3 provides for website viewing. You can also create CNAME aliases, etc. to map the bucket name to your own domain name

The following Python code provides an example of all of the above steps.
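A minimal sketch of those five steps, assuming boto 2.x's S3 API: `create_bucket(name, location=...)`, `Bucket.set_acl("public-read")`, `set_contents_from_string(..., headers=..., policy=...)`, `Bucket.configure_website(index, error)` and `Bucket.get_website_endpoint()`, which returns the website hostname. `publish_site` and `website_url` are hypothetical helpers, and the page contents are yours to supply; `website_url` simply reproduces the hostname format of the example link in this post.

```python
# Sketch: host a static site on S3 with boto 2.x-style calls.


def website_url(bucket_name, region):
    """Build a website URL in the format of the example link in this post."""
    return "http://%s.s3-website-%s.amazonaws.com/" % (bucket_name, region)


def publish_site(conn, bucket_name, pages, index="index.html",
                 error="error.html", location=""):
    """Create and configure a website bucket; return its website hostname.

    conn is an S3Connection (e.g. boto.connect_s3()); pages maps key names
    to HTML strings and should include the index and error pages.
    """
    # 1. Create the bucket (location "" means the default US region)
    bucket = conn.create_bucket(bucket_name, location=location)
    # 2. Make the bucket readable by the world
    bucket.set_acl("public-read")
    # 3. Upload the content, each object public and served as HTML
    for name, html in pages.items():
        key = bucket.new_key(name)
        key.set_contents_from_string(html, headers={"Content-Type": "text/html"},
                                     policy="public-read")
    # 4. Turn on website hosting with the index and error documents
    bucket.configure_website(index, error)
    # 5. The hostname S3 serves the site from
    return bucket.get_website_endpoint()
```

Passing `location="us-west-1"` would match the region in the example link; the objects get their own `public-read` policy because a public bucket ACL alone does not make the objects inside it readable.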

I could now access my website using the following special link:

http://garnaat-website-2.s3-website-us-west-1.amazonaws.com/

I could also use the CNAME aliasing features of S3 to map my S3 website to my own domain (which is probably what most people will want to do). It's a great new feature for S3 and something that should prove useful to a lot of people.

Thursday, January 27, 2011

It Takes a Village...

There are a lot of reasons someone might want to start an open source software project. You may be motivated by idealistic notions like freedom (as in both "free speech" and "free beer"), contribution to a community, and the desire for higher quality software due to the many eyes that scrutinize the code. Or, you could be motivated by more base desires like reputation, influence and the potential for paid work. Whatever the motivation though, it's clear that to truly achieve any of these goals you need people to take notice. You need users. You need a community.

Another thing that has become clear to me over the past six months or so is that one of the best ways to build a community for an open source project is to host it on github.com. I'm not really sure why this is true. The underlying git DVCS is certainly very powerful and efficient, but it also can be cryptic and unintuitive at times. It could be the very distributed nature of git and github, but there are other DVCS out there with hosted, centralized master repos. They are all good, but they don't seem to be as good at motivating people to participate as github. Maybe github has hit on just the right combination of power, flexibility and gee-whiz GUI. Whatever the reason, the results for the boto project have been pretty amazing so far.

In the six months we have been on github.com (our first commit there was on July 12th, 2010), we have:

290 people watching the boto repository.
66 people who have forked, or copied, the repository to allow them to experiment on their own.
61 pull requests, which are the culmination of those experiments in forked repositories. They are basically people asking to have their local modifications merged with the main boto repository. Thus far 50 have been closed, 11 are still open.
340 commits to the repository by 35 different contributors. That's about 1.7 commits per day. Commits have ranged from single-line typo fixes to entire new boto modules.
Major contributions from Google in support of their Google Storage service.
11495 downloads of packaged boto releases from our Google code project page.
35978 downloads of just the 2.0b3 packaged release from pypi.python.org, the Python package index
42,420 visits (124,278 page views) by 10,734 unique visitors to our Google project page and 13,620 views of our github project page in the last 90 days.
I received three, count them three, unsolicited contributions for a boto module to support the new Simple Email Service from AWS within 24 hours of the service's announcement.

I'm not suggesting that all of this is due to github. It certainly helps that boto provides an interface to a very popular set of cloud-based services in a very popular programming language. But github has clearly been a factor in building the community and increasing contributions.

So, thanks github and thanks to the boto community. I also want to take this opportunity to thank my colleagues and the management team at Eucalyptus Systems for supporting me in my efforts to support the boto community. It really underscores their commitment to open source software.