Wednesday, February 24, 2010

Pick Your SimpleDB Flavor: AP or CP?

Back around 2000, a fellow named Eric Brewer posited something called the CAP theorem.  The basic tenets of this theorem are that in the world of shared-data distributed computing there are three basic properties: data consistency, system availability, and tolerance to network partitioning, and only 2 of the 3 properties can be achieved at any given time (see Werner Vogels's article or this paper for more details on CAP).

SimpleDB is a great service from AWS that provides a fast, scalable metadata store that I find useful in many different systems and applications.  When viewed through the prism of the CAP theorem, SimpleDB provides system availability (A) and tolerance to network partitioning (P) at the expense of consistency (C).  So, as an AP system, it requires users to understand and deal with the lack of immediate consistency, or "eventual consistency".  For many types of systems this is not a problem, and given that the vast majority of writes to SimpleDB become consistent within a short period of time (most in less than a second), it's not a big deal.

But what happens if you really do need consistency?  For example, let's say you want to store a user's session state in SimpleDB.  Each time the user makes another request on your web site you will want to pull their saved session data from the database.  But if that state is not guaranteed to be the most current data written, it will cause problems for your user.  Or you may have a requirement to implement an incrementing counter.  Without consistency, such a requirement would be impossible to meet, which would mean that using SimpleDB for those types of applications would be out of the question.  Until now...

Pick Your Flavor


SimpleDB now provides a new set of API requests that let you perform reads and writes in a consistent manner (see this for details).  For example, I can now look up an item in SimpleDB or perform a search and specify that I want the results to be consistent.  When you specify a consistent flag in these requests, SimpleDB guarantees that the results returned will be consistent with all write operations received by SimpleDB prior to the read or query request.
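To make the difference between the two kinds of read concrete, here is a minimal sketch using a toy in-memory store.  Everything in it (the ToyReplicatedStore class, its replicate method, the "session-42" item) is made up for illustration and is not how SimpleDB is implemented; it only models the idea that a normal read may miss a recent write while a consistent read waits until all earlier writes have been applied.

```python
class ToyReplicatedStore:
    """In-memory stand-in for an eventually consistent store (hypothetical, not SimpleDB)."""

    def __init__(self):
        self._replica = {}   # the copy that reads are served from
        self._pending = []   # accepted writes not yet applied to the replica

    def put(self, item_name, attrs):
        # The write is accepted immediately but applied to the replica later.
        self._pending.append((item_name, dict(attrs)))

    def replicate(self):
        # Apply every outstanding write, in the order it was received.
        for item_name, attrs in self._pending:
            self._replica.setdefault(item_name, {}).update(attrs)
        self._pending.clear()

    def get(self, item_name, consistent_read=False):
        if consistent_read:
            # Wait for all earlier writes to land before answering.
            self.replicate()
        return dict(self._replica.get(item_name, {}))


store = ToyReplicatedStore()
store.put("session-42", {"cart": "3 items"})
stale = store.get("session-42")                        # may miss the write above
fresh = store.get("session-42", consistent_read=True)  # reflects every earlier write
```

In this toy model the first read comes back empty because the write has not been applied yet, while the consistent read pays the cost of catching up before it answers.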

Similarly, you can create or update a value of an object in SimpleDB and include in the request what you expect the current value of that object to be.  If your expected value differs from the actual value currently stored in SimpleDB, an exception will be raised and the value will not be updated.
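A conditional write like this is what makes the incrementing counter mentioned earlier possible.  The sketch below shows the read, compute, conditional-write, retry loop against a toy in-memory store.  The names ToyItemStore, ConditionalCheckFailed, increment and the expected argument are all assumptions for the sake of the example, not the actual SimpleDB or boto interface.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored value does not match what the caller expected."""


class ToyItemStore:
    """Hypothetical in-memory store with a conditional put (not SimpleDB)."""

    def __init__(self):
        self._items = {}

    def get(self, item_name):
        return dict(self._items.get(item_name, {}))

    def put(self, item_name, attrs, expected=None):
        # expected is an optional (attribute_name, value) pair; a value of
        # None means "this attribute must not exist yet".
        current = self._items.get(item_name, {})
        if expected is not None:
            name, value = expected
            if current.get(name) != value:
                raise ConditionalCheckFailed(
                    "expected %r=%r, found %r" % (name, value, current.get(name)))
        self._items.setdefault(item_name, {}).update(attrs)


def increment(store, item_name, attr="count", max_retries=5):
    """Add 1 to a counter, retrying if another writer got there first."""
    for _ in range(max_retries):
        old = store.get(item_name).get(attr)   # None if the counter is new
        new = str(int(old or "0") + 1)         # values kept as strings, as SimpleDB does
        try:
            store.put(item_name, {attr: new}, expected=(attr, old))
            return int(new)
        except ConditionalCheckFailed:
            continue  # value changed underneath us; re-read and try again
    raise RuntimeError("too many conflicting updates")


counters = ToyItemStore()
first = increment(counters, "page-hits")
second = increment(counters, "page-hits")
```

The key design point is that the write succeeds only if nobody else changed the value between the read and the write, so two clients can never both turn the same old value into the same new one.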

Of course, nothing is free.  By insisting on Consistency, the CAP theorem says that we must be giving up on one of the other properties.  In this case, what we are giving up is Availability.  Basically, if we want the system to give us consistent data then it simply won't be able to respond as quickly as before.  It will have to wait until it knows the state is consistent, and while it is waiting, the system is unavailable to your application.  That's exactly how every relational database you have ever used works, so it should be no surprise.  But if performance and availability are your main goals, you should use these Consistency features sparingly.

Give It A Try

The boto subversion repository has already been updated with code that supports these new consistency features.  The API changes are actually quite small: a new, optional consistent_read parameter to methods like get_attributes and select, and a new, optional expected_values parameter to methods like put_attributes and delete_attributes.  I'll be posting some example code here soon.

7 comments:

  1. What about making this be able to pull from the boto.cfg instead of having to add the parameter to all of these calls? If you could just turn on "consistent mode" in the config that may simplify things greatly.

  2. Do you really want to run in consistent mode all the time? I guess we could provide a default value that can be overridden either by passing in a param to the constructor or by querying the value of an attribute in the boto config. It would be easy to add that.

  3. Did you see this post?

    http://www.elastician.com/2010/02/stupid-boto-tricks-2-reliable-counters.html

  4. When will the boto library with the new API be released?

  5. The code is available in the subversion repository and has been for quite a while. I will try to put a new release out within a week.

  6. This is a great feature. I'm looking forward to the new boto library release!
