Tuesday, April 20, 2010

Failure as a Feature

One need only peruse the EC2 forums a bit to realize that EC2 instances fail.  Shock.  Horror.  Servers failing?  What kind of crappy service is this, anyway?  The truth, of course, is that all servers can and eventually will fail.  EC2 instances, Rackspace CloudServers, GoGrid servers, Terremark virtual machines, even that trusty Sun box sitting in your colo.  They all can fail and therefore they all will fail eventually.

What's wonderful and transformative about running your applications in public clouds like EC2 and CloudServers is not that the servers never fail but that when they do fail you can actually do something about it.  Quickly.  And programmatically.  From an operations point of view, the killer feature of the cloud is the API.  Using the APIs, I can not only detect that there is a problem with a server but actually correct it.  As easily as I can start a server, I can stop one and replace it with a new one.
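
For example, here's a rough sketch of what that can look like with boto.  The AMI id, key pair, instance type and the notion of "dead" are all placeholders; the point is just that detecting and replacing a failed server is a few API calls rather than a trip to the data center.

import boto

ec2 = boto.connect_ec2()

# Placeholder values -- substitute your own AMI, key pair and instance type.
AMI_ID = 'ami-12345678'
KEY_NAME = 'my-keypair'
INSTANCE_TYPE = 'm1.small'

def replace_if_dead(instance):
    """If the instance is no longer running, terminate it and launch a replacement."""
    instance.update()  # refresh the instance state from the EC2 API
    if instance.state not in ('pending', 'running'):
        instance.terminate()
        reservation = ec2.run_instances(AMI_ID, key_name=KEY_NAME,
                                        instance_type=INSTANCE_TYPE)
        return reservation.instances[0]
    return instance

# Check every instance we own and replace any that have died.
for reservation in ec2.get_all_instances():
    for instance in reservation.instances:
        replace_if_dead(instance)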

Now, to do this effectively I really need to think about my application and my deployment differently.  When you have physical servers in a colo, failure of a server is, well, failure.  It's something to be dreaded.  Something that you worry about.  Something that usually requires money and trips to the data center to fix.

But for apps deployed on the cloud, failure is a feature.  Seriously.  Knowing that any server can fail at any time, and that I can detect and correct that failure programmatically, actually allows me to design better apps.  More reliable apps.  More resilient and robust apps.  Apps that are designed to keep running with nary a blip when an individual server goes belly up.

Trust me.  Failure is a feature.  Embrace it.  If you don't understand that, you don't understand the cloud.

Monday, April 19, 2010

Subscribing an SQS queue to an SNS topic

The new Simple Notification Service from AWS offers a simple, scalable publish/subscribe service for notifications.  The basic idea behind SNS is straightforward.  You create a topic.  Then, you subscribe any number of subscribers to that topic.  Finally, you publish data to the topic and each subscriber is notified about the new data.
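
In boto, that lifecycle looks roughly like this.  The topic name and message are made up, and publish here is just a thin wrapper over the corresponding SNS API action, so treat this as a sketch.  The topic's ARN is pulled out of the dict that create_topic returns (you can see the full shape of that dict in the example further down).

>>> import boto
>>> sns = boto.connect_sns()
>>> resp = sns.create_topic('TestTopic')
>>> topic_arn = resp['CreateTopicResponse']['CreateTopicResult']['TopicArn']
>>> sns.publish(topic_arn, 'Something interesting just happened', subject='Test notification')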

Currently, the notification mechanism supports email, HTTP(S) and SQS.  The SQS support is attractive because it means you can subscribe an existing SQS queue to a topic in SNS and every time information is published to that topic, a new message will be posted to the queue.  That allows you to easily persist the notifications so they can be logged or processed at a later time.
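
Once the notifications are landing in a queue, that later processing is just ordinary SQS polling.  Here's a minimal sketch; the queue name matches the example below, and I'm assuming the JSON body SNS delivers includes fields like Subject and Message.

import json
import boto

sqs = boto.connect_sqs()
queue = sqs.lookup('TestSNSNotification')  # the queue subscribed to the topic

# Read a batch of notifications, log them and remove them from the queue.
for msg in queue.get_messages(num_messages=10):
    notification = json.loads(msg.get_body())
    print notification.get('Subject'), notification.get('Message')
    queue.delete_message(msg)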

Subscribing via the email protocol is very straightforward.  You just provide an email address and SNS will send an email message to that address each time information is published to the topic (actually, there is a confirmation step that happens first, also via email).  Subscribing via HTTP(S) is also easy: you just provide the URL you want SNS to use and then, each time information is published to the topic, SNS will POST a JSON payload containing the new information to your URL.
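
Both of those look something like this in boto; the email address and URL are placeholders, and subscribe simply takes the topic ARN, a protocol and an endpoint, mirroring the SNS Subscribe action.

>>> import boto
>>> sns = boto.connect_sns()
>>> topic_arn = 'arn:aws:sns:us-east-1:963068290131:TestSQSTopic'
>>> sns.subscribe(topic_arn, 'email', 'me@example.com')    # triggers the confirmation email
>>> sns.subscribe(topic_arn, 'https', 'https://example.com/sns-endpoint')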

Subscribing an SQS queue, however, is a bit trickier.  First, you have to be able to construct the ARN (Amazon Resource Name) of the SQS queue.  Second, after subscribing the queue you have to set the access policy of the queue to allow SNS to send messages to it.

To make this easier, I added a new convenience method to the boto SNS module called subscribe_sqs_queue.  You pass it the ARN of the SNS topic and the boto Queue object representing the queue, and it does all of the hard work for you.  You would call the method like this:

>>> import boto
>>> sns = boto.connect_sns()
>>> sqs = boto.connect_sqs()
>>> queue = sqs.lookup('TestSNSNotification')
>>> resp = sns.create_topic('TestSQSTopic')
>>> print resp

{u'CreateTopicResponse': {u'CreateTopicResult': {u'TopicArn': u'arn:aws:sns:us-east-1:963068290131:TestSQSTopic'},
                          u'ResponseMetadata': {u'RequestId': u'1b0462af-4c24-11df-85e6-1f98aa81cd11'}}}
>>> sns.subscribe_sqs_queue('arn:aws:sns:us-east-1:963068290131:TestSQSTopic', queue)


That should be all you have to do to subscribe your SQS queue to an SNS topic.  The basic operations performed, sketched in code after the list, are:

  1. Construct the ARN for the SQS queue.  In our example, the URL for the queue is https://queue.amazonaws.com/963068290131/TestSNSNotification but the ARN would be "arn:aws:sqs:us-east-1:963068290131:TestSNSNotification".
  2. Subscribe the SQS queue to the SNS topic
  3. Construct a JSON policy that grants permission to SNS to perform a SendMessage operation on the queue.   See below for an example of the JSON policy.
  4. Associate the new policy with the SQS queue by calling the set_attribute method of the Queue object with an attribute name of "Policy" and the attribute value being the JSON policy.
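
If you wanted to perform those steps by hand instead of calling subscribe_sqs_queue, the code would look roughly like this.  The account id, queue name and topic ARN come from the example above; the queue ARN is built directly from them, and the policy dict mirrors the JSON shown below.  This is a sketch of the same steps, not necessarily the method's exact implementation.

import json
import uuid

import boto

sns = boto.connect_sns()
sqs = boto.connect_sqs()
queue = sqs.lookup('TestSNSNotification')

topic_arn = 'arn:aws:sns:us-east-1:963068290131:TestSQSTopic'

# Step 1: build the queue's ARN from the region, account id and queue name.
queue_arn = 'arn:aws:sqs:us-east-1:963068290131:TestSNSNotification'

# Step 2: subscribe the queue to the topic using the 'sqs' protocol.
sns.subscribe(topic_arn, 'sqs', queue_arn)

# Steps 3 and 4: grant SNS permission to send messages to the queue
# by attaching a policy to the queue.
policy = {
    'Version': '2008-10-17',
    'Statement': [{
        'Sid': str(uuid.uuid4()),
        'Effect': 'Allow',
        'Principal': {'AWS': '*'},
        'Action': 'SQS:SendMessage',
        'Resource': queue_arn,
        'Condition': {'StringLike': {'aws:SourceArn': topic_arn}},
    }],
}
queue.set_attribute('Policy', json.dumps(policy))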

The actual policy looks like this:

{"Version": "2008-10-17", "Statement": [{"Resource": "arn:aws:sqs:us-east-1:963068290131:TestSNSNotification", "Effect": "Allow", "Sid": "ad279892-1597-46f8-922c-eb2b545a14a8", "Action": "SQS:SendMessage", "Condition": {"StringLike": {"aws:SourceArn": "arn:aws:sns:us-east-1:963068290131:TestSQSTopic"}}, "Principal": {"AWS": "*"}}]}


The new subscribe_sqs_queue method is available in the current SVN trunk.  Check it out and let me know if you run into any problems or have any questions.