Wednesday, September 30, 2009

Stupid Boto Tricks #1 - Cross-Region Scripting

Years ago, when I was working on DocuShare at Xerox, I used to set aside an hour or two once a week and, in the tradition of David Letterman's Stupid Pet Tricks, come up with a Stupid DocuShare Trick and email it around to colleagues. The rules were simple:
  • The trick couldn't take more than an hour to actually implement
  • The trick had to demonstrate some unexpected capability of the system
  • The trick had to at least point in the direction of some actually useful capability, even though the trick in its current form might not be tremendously useful
It was actually quite fun, and a couple of the tricks eventually evolved into truly useful features. So, I thought I would start something similar with boto.

The inspiration behind this Stupid Boto Trick was a thread on the SimpleDB forum. A user asked whether it was possible to get a listing of all SimpleDB domains across all AWS regions. The answer is "no", or at least "no, not directly". Each region has its own set of service endpoints, and you have to connect to a specific endpoint to issue requests. So, you would have to ask each SimpleDB endpoint for its list of domains and then combine the lists on the client side.
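Done by hand, that client-side merge is just a loop over one connection per region. Here is a minimal sketch; it assumes only that each connection object has a `get_all_domains()` method, as boto's SDBConnection does. `list_all_domains` and `FakeConnection` are hypothetical names, and `FakeConnection` exists only so the example runs without AWS credentials.

```python
def list_all_domains(connections):
    """Ask each regional endpoint for its domains and merge them client-side.

    ``connections`` is any iterable of objects exposing ``get_all_domains()``,
    e.g. one boto SDBConnection per region (an assumption of this sketch).
    """
    merged = []
    for conn in connections:
        # Each call is a separate request to a single regional endpoint.
        merged.extend(conn.get_all_domains())
    return merged


class FakeConnection:
    """Hypothetical offline stand-in for an SDBConnection."""

    def __init__(self, domains):
        self.domains = domains

    def get_all_domains(self):
        return list(self.domains)


if __name__ == "__main__":
    us = FakeConnection(["foo", "test1248881005"])
    eu = FakeConnection(["bar"])
    print(list_all_domains([us, eu]))
```

The ServiceSet below hides exactly this loop behind ordinary attribute access.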

To address this, I created a new Python class called ServiceSet. A ServiceSet represents all of the endpoints for a particular service and, when you access an attribute or call a method on the ServiceSet, it actually performs the action on each of the service's endpoints and then assembles the results for you. Here's the quick and dirty code. Hey, like I said, I only get an hour at the most!

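class ServiceSet(list):
    """A list of connections, one per region, for a single AWS service."""

    def __init__(self, service, **kwargs):
        self.service = service
        self.regions = None
        if self.service == 'ec2':
            import boto.ec2
            self.regions = boto.ec2.regions(**kwargs)
        elif self.service == 'sdb':
            import boto.sdb
            self.regions = boto.sdb.regions(**kwargs)
        elif self.service == 'sqs':
            import boto.sqs
            self.regions = boto.sqs.regions(**kwargs)
        else:
            raise ValueError('unsupported service: %s' % service)
        # One connection per regional endpoint, in the same order as self.regions.
        for region in self.regions:
            self.append(region.connect(**kwargs))

    def __getattr__(self, name):
        # Python only calls __getattr__ for names not found on the ServiceSet
        # itself, so look the name up on every connection instead.
        if name.startswith('__'):
            raise AttributeError(name)
        results = []
        is_callable = False
        for conn in self:
            try:
                val = getattr(conn, name)
            except AttributeError:
                val = None  # this endpoint lacks the attribute
            if callable(val):
                is_callable = True
            results.append(val)
        if is_callable:
            def call_all(*args, **kwargs):
                # Call each endpoint's method with the same arguments and
                # return the results as a list, in endpoint order.
                return [fn(*args, **kwargs) if callable(fn) else None
                        for fn in results]
            return call_all
        return results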
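The if/elif chain in `__init__` hardcodes three services, but each branch does the same thing: import `boto.<service>` and call its module-level `regions()` function. A minimal sketch of a generic lookup follows; `load_regions` is a hypothetical helper, and it assumes (as the chain above does for ec2, sdb and sqs) that the service module exposes `regions(**kwargs)`.

```python
import importlib


def load_regions(service, package="boto", **kwargs):
    """Hypothetical helper: return the regions for ``<package>.<service>``.

    Assumes the submodule (e.g. boto.sdb) defines a module-level
    ``regions(**kwargs)`` function.
    """
    module = importlib.import_module("%s.%s" % (package, service))
    regions_fn = getattr(module, "regions", None)
    if not callable(regions_fn):
        # A module without regions() cannot be fanned out this way.
        raise ValueError("%s.%s has no regions() function" % (package, service))
    return regions_fn(**kwargs)
```

With this in place, the three branches would collapse to `self.regions = load_regions(service, **kwargs)`. A nonexistent service module raises ImportError, and a module with no `regions()` function raises ValueError.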

This implementation of the ServiceSet understands EC2, SQS and SimpleDB. S3 handles its regions differently from the other services, so we will leave that one out for now. Let's take it for a little spin around the block. First, let's create a ServiceSet for SimpleDB:

>>> from serverset import ServiceSet
>>> s = ServiceSet('sdb')
>>> s
[SDBConnection:sdb.amazonaws.com, SDBConnection:sdb.eu-west-1.amazonaws.com]
>>>
So, we now have a ServiceSet called s that contains connections to both SimpleDB endpoints. Let's get a list of all of our domains across both regions:

>>> s.get_all_domains()
[[Domain:foo,
Domain:test1248881005],
[Domain:bar]]
>>>
The results are returned as a list of lists, one per endpoint, although a slight modification to the ServiceSet code would return a single concatenated list instead. The nice thing is that each Domain object within those lists knows about its own SDBConnection and will therefore always route Domain-specific methods to the right endpoint.
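One way to get that concatenated list is a small standalone helper applied to the ServiceSet's output, rather than a change to the class itself. This is a sketch; `concatenate` is a hypothetical name. It skips `None` entries, which the ServiceSet's `__getattr__` produces for endpoints where the attribute lookup failed.

```python
from itertools import chain


def concatenate(per_endpoint_results):
    """Flatten ServiceSet-style results (one list per endpoint) into one list."""
    # Drop None placeholders, then chain the remaining lists end to end.
    return list(chain.from_iterable(r for r in per_endpoint_results if r is not None))
```

For example, `concatenate(s.get_all_domains())` would yield one flat list of Domain objects. Because each Domain keeps its own connection, flattening loses nothing needed for routing; it only discards which region each item came from.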

In addition to listing domains, you can do other things. In fact, any method available on an SDBConnection object can also be invoked on the ServiceSet, which will, in turn, invoke that method on each of its connections. Here's a transcript showing a bit of playing around with the ServiceSet object:

>>> s.create_domain('serverset')
[Domain:serverset, Domain:serverset]
>>> s.put_attributes('serverset', 'testitem', {'foo' : 'bar', 'fie' : 'baz'})
[True, True]
>>> s.select('serverset', 'select * from serverset')
[[{u'fie': u'baz', u'foo': u'bar'}], [{u'fie': u'baz', u'foo': u'bar'}]]
>>> s.delete_domain('serverset')
[True, True]
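Results like `[True, True]` are positional: the nth entry comes from the nth connection, and the ServiceSet creates its connections in the same order as `self.regions`. A minimal sketch of labeling each result with its region follows; `by_region` is a hypothetical helper, and it assumes only that each region object has a `name` attribute, as boto's RegionInfo objects do.

```python
def by_region(regions, results):
    """Map each region's name to the result returned by that region's endpoint.

    Relies on results being in the same order as ``regions``.
    """
    if len(regions) != len(results):
        raise ValueError("expected exactly one result per region")
    return {region.name: result for region, result in zip(regions, results)}
```

For example, `by_region(s.regions, s.get_all_domains())` would show which domains live in which region.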

So, there we have it. The first Stupid Boto Trick. I've created a Mercurial repo on bitbucket.org just to collect these tricks. You can access it at http://bitbucket.org/mitch/stupidbototricks/. You also need to be running at least r1306 of boto.

2 comments:

  1. That is good stuff. Similar to something I wrote recently in Java. It looks easier in Python, and I don't even know Python! :)

    I found that the next thing I needed to add (for my own use) was the ability to associate credentials with each server in the set. This allowed me to extend the set to multiple aws accounts as well as M/DB endpoints.

  2. Supporting different credentials per endpoint would be a nice addition. I'll try to find a little time to make that change. Thanks, Mocky!
