Wednesday, September 30, 2009

Stupid Boto Tricks #1 - Cross-Region Scripting

Years ago, when I was working on DocuShare at Xerox, I used to set aside an hour or two once a week and, in the tradition of David Letterman's Stupid Pet Tricks, come up with a Stupid DocuShare Trick and email it around to colleagues. The rules were simple.
  • The trick couldn't take more than an hour to actually implement
  • The trick had to demonstrate some unexpected capability of the system
  • The trick had to at least point in the direction of some genuinely useful capability, even if the trick in its current form wasn't tremendously useful
It was actually quite fun and a couple of the tricks eventually evolved into truly useful features. So, I thought I would start something similar with boto.

The inspiration behind this Stupid Boto Trick was a thread on the SimpleDB forum. A user asked whether it was possible to get a listing of all SimpleDB domains across all AWS regions. The answer is "no", or at least "no, not directly". Each region has its own set of service endpoints and you have to connect to a specific endpoint to issue requests. So, you would have to ask each SimpleDB endpoint for a list of domains and then combine the lists on the client side, as sketched below.
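
Here's a minimal sketch of that manual approach, using the boto.sdb.regions helper that also appears in the code below (the exact endpoints you get back will vary with your boto version):

import boto.sdb

# Ask each SimpleDB region for its domains and merge the results client-side.
all_domains = []
for region in boto.sdb.regions():
    conn = region.connect()
    all_domains.extend(conn.get_all_domains())
print(all_domains)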

To address this, I created a new Python class called a ServiceSet. A ServiceSet represents all of the endpoints for a particular service and, when you access an attribute or method on the ServiceSet, it actually performs the action on each of the endpoints of the service and then assembles the results for you. Here's the quick and dirty code. Hey, like I said, I only get an hour at the most!

class ServiceSet(list):
    """A list of connections, one per region, for a given AWS service."""

    def __init__(self, service, **kwargs):
        self.service = service
        self.regions = None
        # Look up the region list for the requested service.
        if self.service == 'ec2':
            import boto.ec2
            self.regions = boto.ec2.regions(**kwargs)
        elif self.service == 'sdb':
            import boto.sdb
            self.regions = boto.sdb.regions(**kwargs)
        elif self.service == 'sqs':
            import boto.sqs
            self.regions = boto.sqs.regions(**kwargs)
        # Open a connection to each region's endpoint.
        for region in self.regions:
            self.append(region.connect(**kwargs))

    def __getattr__(self, name):
        # Look up the attribute on each connection.  If it turns out to be
        # a method, hand back our map method so the call gets fanned out to
        # every connection; otherwise just return the list of values.
        results = []
        is_callable = False
        for conn in self:
            try:
                val = getattr(conn, name)
                if callable(val):
                    is_callable = True
                results.append(val)
            except:
                # If one endpoint doesn't have the attribute, record None.
                results.append(None)
        if is_callable:
            self.map_list = results
            return self.map
        return results

    def map(self, *args):
        # Call each of the bound methods gathered in __getattr__ and
        # collect the per-region results into a single list.
        results = []
        for fn in self.map_list:
            results.append(fn(*args))
        return results

This implementation of the ServiceSet understands EC2, SQS and SimpleDB. S3 handles its regions differently than the other services, so we will leave that one out for now. Let's take it for a little spin around the block. First, let's create a ServiceSet for SimpleDB:

>>> from serviceset import ServiceSet
>>> s = ServiceSet('sdb')
>>> s
[SDBConnection:sdb.amazonaws.com, SDBConnection:sdb.eu-west-1.amazonaws.com]
>>>
So, we now have a ServiceSet called s that contains connections to both endpoints for the SimpleDB service. Let's get a list of all of our domains, across both regions:

>>> s.get_all_domains()
[[Domain:foo,
Domain:test1248881005],
[Domain:bar]]
>>>
The results are returned as a list of lists, although a slight modification to the ServiceSet code (sketched below) would allow for a single concatenated list. The nice thing is that each Domain object within the lists knows about its own SDBConnection and will therefore always route Domain-specific methods to the right endpoint.
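
For example, one hypothetical variant of the map method would flatten the per-region lists into one combined list:

    def map(self, *args):
        # Flattening variant: extend with per-region lists instead of
        # appending them, so callers get one combined list back.
        results = []
        for fn in self.map_list:
            val = fn(*args)
            if isinstance(val, list):
                results.extend(val)
            else:
                results.append(val)
        return results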

In addition to listing domains, you can also do other things. In fact, any method available on an SDBConnection object can also be invoked on the ServiceSet, which will, in turn, invoke the appropriate method on each of its connections. Here's a transcript showing a bit of playing around with the ServiceSet object:

>>> s.create_domain('serverset')
[Domain:serverset, Domain:serverset]
>>> s.put_attributes('serverset', 'testitem', {'foo' : 'bar', 'fie' : 'baz'})
[True, True]
>>> s.select('serverset', 'select * from serverset')
[[{u'fie': u'baz', u'foo': u'bar'}], [{u'fie': u'baz', u'foo': u'bar'}]]
>>> s.delete_domain('serverset')
[True, True]

So, there we have it. The first Stupid Boto Trick. I've created a Mercurial repo on bitbucket.org just to collect these tricks. You can access it at http://bitbucket.org/mitch/stupidbototricks/. You also need to be running at least r1306 of boto.

Monday, September 28, 2009

The Complexities of Simple

Back in the early, halcyon days of Cloud Computing there was really only one game in town: Amazon Web Services. Whether by luck or cunning, Amazon got a big, hairy head start on everyone else and so if you wanted Cloud-based storage, queues or computation you used AWS and life was, well, simple.

But now that we clearly have a full-blown trend on our hands, there are more choices. The good folks from Rackspace picked up on the whole Cloud thing early on and have leveraged their expertise in more traditional colo and managed servers to bring some very compelling offerings to market. Google, after their initial knee-jerk reaction of trying to give everything away, has decided that what they have might be worth paying for and is actually charging people. And Microsoft, always a late riser, has finally rubbed the sleep dirt out of their eyes, finished their second cup of coffee, and is getting serious about this cloud stuff. It's clear that this is going to be a big market and there will be lots of competitors.

So, we have choices. Which is good. But it also makes things more complicated. Several efforts are now under way to bring simplicity back in the form of unifying APIs or REST interfaces that promise a Rosetta Stone-like ability to let your applications speak to all of the different services out there without having to learn all of those different dialects. Sounds good, right?

Well, it turns out that making things simple is far more complicated than most people realize. For one thing, the sheer number of things that need to be unified is still growing rapidly. Just over the past year or so, Amazon alone has introduced:
  • Static IP addresses (via EIP)
  • Persistent block storage (EBS)
  • Load balancing
  • Auto scaling
  • Monitoring
  • Virtual Private Clouds
And that's just one offering from one company. It's clear that we have not yet fully identified the complete taxonomy of Cloud Computing. Trying to identify a unifying abstraction layer on top of this rapidly shifting sand is an exercise in futility.

But even if we look at an area within this world that seems simpler and more mature, e.g. storage, the task of simplifying is actually still quite complex. As an exercise, let's compare two quite similar services: S3 from AWS and Cloud Files from Rackspace.

S3 has buckets and keys. Cloud Files has containers and objects. Both services support objects up to 5GB in size. So far, so good. S3, however, has a fairly robust ACL mechanism that allows you to grant certain permissions to certain users or groups. At the moment, Cloud Files does not support ACLs.
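
To give a flavor of the S3 side, here's a rough sketch with boto; the bucket name, key name, and account id below are made-up placeholders:

import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('my-example-bucket')   # hypothetical bucket name
key = bucket.get_key('some-object')           # hypothetical key name
key.set_acl('public-read')                    # apply a canned ACL
key.add_user_grant('READ', '123456789012')    # grant READ to a specific AWS account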

Even more interesting is that when you perform a GET on a container in Cloud Files, the response includes the content-type for each object within the container. However, when you perform a GET on a bucket in S3, the response does not contain the content-type of each key. You need to do a GET on the key itself to get that metadata.
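
In boto terms, the S3 side of that difference looks roughly like this (continuing the sketch above): listing the bucket is one request, but each key's content-type costs a further request.

# Reusing the s3 connection and bucket from the previous sketch:
for key in bucket.list():                 # one GET lists all keys in the bucket...
    full_key = bucket.get_key(key.name)   # ...but each content-type costs another request
    print(full_key.name, full_key.content_type)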

So, if you are designing an API to unify these two similar services you will face some challenges and will probably end up with a least common denominator approach. As a user of the unifying API, you will also face challenges. Should I rely on the least common denominator capabilities or should I actually leverage the full capabilities of the underlying service? Should the API hide differences in implementations (e.g. the content-type mentioned above) even if it creates inefficiencies? Or should it expose those differences and let the developer decide? But if it does that, how is it really helping?

I understand the motivation behind most of the unification efforts. People are worried about lock-in. And there are precedents within the technology world where unifying APIs have been genuinely useful, e.g. JDBC, LDAP, etc. The difference is, I think, timing. The underlying technologies were mature and lots of sorting out had already occurred in the industry. We are not yet at that point in this technology cycle, and I think these unification efforts are premature and will prove largely ineffective.

Thursday, September 24, 2009

Support for Shared Snapshots added to boto

The AWS juggernaut continues. In the wee hours of the morning, a new Shared Snapshot feature was announced. This handy new feature allows you to share EBS snapshots in the same way you can already share an AMI. So, I can now give any other AWS user permission to create a new EBS volume from one of my existing EBS snapshots. And, just as I can make one of my AMIs public (allowing anyone to launch it), I can also choose to make one of my EBS snapshots public, allowing any AWS user to create a volume from it.

Jeff Barr's blog entry describes some use cases for this new capability and Shlomo Swidler provides quite a few more. Just the ability to share EBS snapshots across dev/test/production environments makes this a killer feature for me, but there are many, many more use cases. The first step, though, is to add support for the new features in boto, and I'm pleased to announce that as of r1298 the latest subversion code does just that. I'll be packaging up a new release soon, but until then I encourage you to give the subversion HEAD a try.

Here are a few examples to get you started.

First, let's create a new snapshot of an existing EBS volume:


>>> import boto
>>> ec2 = boto.connect_ec2()
>>> rs = ec2.get_all_volumes()
>>>


At this point, the variable rs is a list of all of my current EBS volumes. For this example, I'm just going to use the first volume in that list:


>>> vol = rs[0]
>>> snap = vol.create_snapshot('This is my test snapshot for the blog')
>>>


The first thing to notice here is that AWS has snuck in another nice feature: you can now provide a description for a snapshot. Very nice! That could definitely be handy in helping to organize and identify the snapshot you are looking for. Having created the snapshot, let's now share it with another AWS user:


>>> snap.share(user_ids=['963068290131'])
True
>>>


I could also share this with everyone:


>>> snap.share(groups=['all'])
True
>>>


I could also decide that I no longer want the snapshot shared with everyone and remove that permission:


>>> snap.unshare(groups=['all'])
True
>>>


And to find out what the current permissions are for a snapshot, I can do this:


>>> snap.get_permissions()
{'user_ids': [u'963068290131']}
>>>
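
And, on the receiving end, the whole point is that the other account can now create a volume from the shared snapshot. Something along these lines should do it (the size, snapshot id, and availability zone here are just placeholders):

>>> import boto
>>> ec2 = boto.connect_ec2()   # connect with the credentials of the account the snapshot was shared with
>>> vol = ec2.create_volume(size=10, zone='us-east-1a', snapshot='snap-12345678')
>>>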

That should be enough to get you started. The API documentation has been updated although a few more updates are needed. BTW, if you haven't checked out the online docs recently you should. Patrick Altman converted all of the existing docs over to use Sphinx and the results are very, very nice. Thanks, Patrick.

Friday, September 4, 2009

Looking for a few good Boto developers

Hi -

One of the main goals of the boto project is to support all Amazon services and to keep that support current as AWS releases new versions of the services. As the number of services grows and the pace of development at AWS increases, that becomes a challenge: at some point I will simply be unable to keep up! To meet that challenge, I would like to solicit help from the boto community.

I'm interested in finding people who would be willing to take ownership of specific boto modules (e.g. S3, SQS, ELB, etc.). There are two possible scenarios:

  • You could take responsibility for an existing boto module. This would mean addressing issues in the module as well as improving the module. In particular, boto 2.0 will be a major upgrade and may involve significant, even incompatible, changes in existing modules. As the owner of a module, you would be responsible for proposing changes, responding to comments, building consensus and ultimately implementing the changes. In practice, I think that a prerequisite for taking ownership of a module would be that you are a heavy user of the module.
  • You could express interest in developing new boto modules. We have a strong relationship with AWS and are usually briefed on upcoming services prior to their public announcement. Participating in AWS private alphas and betas is a fun experience and gives you direct input into the services and APIs. Participating in this way would require you to sign and, more importantly, to honor a very strict confidentiality agreement with AWS. We can help facilitate this process with AWS.

In addition to these two scenarios, I'm also interested in establishing a community of core developers and contributors to boto. As I mentioned before, the 2.0 release will be a major release and everything is on the table. I have a lot of ideas involving refactoring of existing code and also support for services beyond AWS. I would love to get more feedback and more ideas from the community around this release.

If you are interested in getting more involved in boto, please contact me directly: mitch.garnaat at gmail dot com.

Thanks,

Mitch