Tuesday, April 28, 2009

Cloud Lock-In. Not your father's lock-in.

There seems to be a lot of angst about the risks of lock-in with cloud computing. I think there are some real issues to be concerned about, but most of the discussion seems to center on APIs, and I think that's the wrong focus.

First, let's define what we mean by lock-in. This description from Wikipedia provides a good, workable definition:

In economics, vendor lock-in, also known as proprietary lock-in, or customer lock-in, makes a customer dependent on a vendor for products and services, unable to use another vendor without substantial switching costs. Lock-in costs which create barriers to market entry may result in antitrust action against a monopoly.

In the world of software development and IT, examples of things that have caused lock-in headaches would be:
  • Using proprietary operating system features
  • Using proprietary database features
  • Using hardware which can only run a proprietary OS
So, a typical scenario might be that you have a requirement to develop and deploy some software for internal use within your company. You do the due diligence and make your choices in terms of the hardware you will buy, the OS you will use, the database, etc., and then you build and deploy your application.

But in the process of doing that, you have used features in the operating system or database or other component that are unique to that vendor's product. There may have been good reasons at the time to do so (e.g. better performance, better integration with development tools, better manageability, etc.) but because of those decisions the cost of changing any of the significant components of that software becomes too high to be practical. You are locked-in.

In that scenario, the root cause of the lock-in problem seems to be the use of proprietary APIs in the development of the application, so it kind of makes sense that the focus of concern in Cloud Computing Lock-In would also be the APIs. Here's why I think the Cloud Service case is different:
  • Sunk Cost - While the use of proprietary APIs in the above example represents one barrier to change, a more significant barrier is actually the sunk cost of the current solution. To deploy that solution internally, a lot of money had to be spent up front (e.g. hardware, OS server and client licenses, database licenses). Moving the solution off the locked-in platform not only involves a considerable rewrite of the software (OpEx costs) but also new CapEx spending and a potential write-down of current capital. In the case of a Cloud Service, these sunk costs aren't a factor. The hardware and even the software licensing costs can be paid by the hour. When you turn that server off, your costs go to zero.
  • Tight Coupling vs. Loose Coupling - Even if you focus only on the APIs and the rework necessary to move the solution to a different platform, the fact that Cloud Computing services favor REST and other HTTP-based APIs dramatically changes the scope of the rework compared to moving from one low-level, tightly-coupled API to another. By definition, your code that interacts with Cloud Services will be more abstracted and loosely coupled, which will make it much easier to get it working with another vendor's Cloud Service.
To see what the real lock-in concern is with Cloud Services, think about where the real pain would be in migrating a large application or service from one vendor to another. For most people, that pain will center on the data associated with the application or service. Rather than sitting in your data center, it now sits in that vendor's cloud service, and moving it, especially in large quantities, will present a real barrier.

So, how do you mitigate that concern? Well, you could try to keep local backups of all data stored in a service like S3, but for large quantities of data that becomes impractical and diminishes the value proposition of a data storage service in the first place. The best approach is to demand that your Cloud Service vendors provide mechanisms to get large quantities of data in and out of their services via some sort of bulk load service.
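For modest amounts of data, the local-backup approach is still feasible. Here's a rough sketch of what it might look like with boto (the bucket name and backup directory are placeholders, and the S3 calls are from boto's classic API, so double-check them against the version you're running):

```python
import os

def local_path(key_name, backup_dir):
    # Map an S3 key name like "photos/cat.jpg" onto a filesystem
    # path underneath backup_dir.
    return os.path.join(backup_dir, *key_name.split('/'))

def backup_bucket(bucket_name, backup_dir):
    # The network-dependent part; boto is imported here so the helper
    # above can be used (and tested) without boto installed.
    import boto
    conn = boto.connect_s3()            # credentials come from the environment
    bucket = conn.get_bucket(bucket_name)
    for key in bucket.list():           # iterates over every key in the bucket
        path = local_path(key.name, backup_dir)
        dirname = os.path.dirname(path)
        if dirname and not os.path.isdir(dirname):
            os.makedirs(dirname)
        key.get_contents_to_filename(path)

if __name__ == '__main__':
    backup_bucket('my-bucket', '/backups/s3')   # placeholder names
```

Of course, for a bucket of any real size this is exactly the impractical path described above: every byte comes back over the wire, which is why a bulk load/unload service is the better answer.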

Amazon doesn't yet offer such a service but I was encouraged by this thread on their S3 forum, which suggests that AWS is at least thinking about the possibility of such a service. I encourage them and other Cloud Services vendors like Rackspace/Mosso to make it as easy as possible to get data in AND out of their services. That's the best way to minimize concerns about vendor lock-in.

Sunday, April 26, 2009

Buying EC2 Reserved Instances with Boto

One of the great things about Amazon's EC2 service is the ability to scale up and scale down quickly. This elasticity really brings a whole new dimension to computing, but one of the common criticisms on the forums early on was that if you didn't really need that elasticity, EC2 pricing seemed a bit high compared to some of the alternatives.

The new reserved instance feature in EC2 is a great way to save money on EC2 instances that you know you will be running most of the time. Basically, you pay some money up front and then get a much cheaper per-hour charge on that instance. If you leave a server up and running 24x7x365, the savings can be substantial.
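To put rough numbers on it, take the one-year m1.small offering from the session transcript below ($325 up front plus $0.03 per hour) and compare it against an assumed 2009-era on-demand rate of $0.10 per hour for an m1.small (that rate is from memory, so treat it as illustrative):

```python
HOURS_PER_YEAR = 24 * 365    # 8760

on_demand_rate = 0.10        # assumed on-demand m1.small $/hour (illustrative)
reserved_fixed = 325.00      # one-year m1.small reservation, from the transcript
reserved_rate = 0.03         # per-hour rate once the reservation is in place

on_demand_cost = on_demand_rate * HOURS_PER_YEAR                  # ~ $876.00
reserved_cost = reserved_fixed + reserved_rate * HOURS_PER_YEAR   # ~ $587.80
savings = on_demand_cost - reserved_cost                          # ~ $288.20
```

Under those assumptions the break-even point is around 4,650 hours, so a server that runs even a bit more than half the year already comes out ahead.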

Buying a reserved instance is a little strange. Rather than using an explicit transaction where you supply your credit card, etc., AWS chose to create a new API call in the EC2 service that purchases the reserved instance. For code monkeys like me, that's fine, but some boto users were asking for a little wrapper around the process that would make the selection easier and reduce the risk of buying the wrong reservation.

So, I created a little Python script that guides you through the process and gives you the opportunity to review what you are about to buy and bail out if you make a mistake. At each step in the script, the available choices are presented to you in a simple command line menu. Once you make your choice, the script moves on to the next selection, etc. until all of the information is gathered.
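The selection step is really just a numbered command line menu repeated for each choice. A stripped-down sketch of that pattern (not boto's actual code, and written so the input source can be injected for testing) looks something like this:

```python
def menu_choice(prompt, options, get_input=input):
    # Print a numbered menu and return the option the user picks.
    # get_input is injectable so the function can be exercised
    # without a real terminal.
    for i, option in enumerate(options, start=1):
        print('[%d] %s' % (i, option))
    while True:
        answer = get_input('%s [1-%d]: ' % (prompt, len(options)))
        try:
            n = int(answer)
        except ValueError:
            continue                    # not a number; ask again
        if 1 <= n <= len(options):
            return options[n - 1]
```

So a call like `menu_choice('EC2 Availability Zone', ['us-east-1a', 'us-east-1b', 'us-east-1c'])` produces a prompt much like the ones in the transcript below.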

The script is called buyreservation.py and it lives in the ec2 package of the boto library. It's included in the newest 1.7a release. To use the script, just "cd" to the boto/boto/ec2 directory, run the script, and follow the prompts. A transcript of a session is shown below:

jobs:ec2 mitch$ python buyreservation.py
[1] RegionInfo:eu-west-1
[2] RegionInfo:us-east-1
EC2 Region [1-2]: 2
[1] m1.small
[2] m1.large
[3] m1.xlarge
[4] c1.medium
[5] c1.xlarge
Instance Type [1-5]: 1
[1] Zone:us-east-1a
[2] Zone:us-east-1b
[3] Zone:us-east-1c
EC2 Availability Zone [1-3]: 3
Number of Instances: 1

The following Reserved Instances Offerings are available:

ID=248e7b75-0799-4a55-a0cb-f8d28eb11921
Instance Type=m1.small
Zone=us-east-1c
Duration=94608000
Fixed Price=500.0
Usage Price=0.03
Description=Linux/UNIX
ID=4b2293b4-1e6c-4eb3-ab74-4493c0e57987
Instance Type=m1.small
Zone=us-east-1c
Duration=31536000
Fixed Price=325.0
Usage Price=0.03
Description=Linux/UNIX
[1] ReservedInstanceOffering:248e7b75-0799-4a55-a0cb-f8d28eb11921
[2] ReservedInstanceOffering:4b2293b4-1e6c-4eb3-ab74-4493c0e57987
Offering [1-2]: 2

You have chosen this offering:
ID=4b2293b4-1e6c-4eb3-ab74-4493c0e57987
Instance Type=m1.small
Zone=us-east-1c
Duration=31536000
Fixed Price=325.0
Usage Price=0.03
Description=Linux/UNIX
!!! You are about to purchase 1 of these offerings for a total of $325.00 !!!
Are you sure you want to do this? If so, enter YES:


If, at that point, you enter "YES", boto will go ahead and submit the API request to purchase the reserved instance(s); otherwise it will bail out. Hopefully the script provides enough guidance, hand-holding and confirmation to take some of the fear out of the process, so go out there and save some money!
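Under the hood, all the interactive hand-holding boils down to a single EC2 API call. The non-interactive equivalent looks roughly like this (the method name follows boto's EC2 connection interface, but verify it against the release you have installed):

```python
def purchase_offering(offering_id, instance_count=1):
    # boto is imported here so the module loads even without boto installed.
    import boto
    conn = boto.connect_ec2()   # AWS credentials come from the environment
    # This call actually spends money -- the same caveat as the
    # script's YES confirmation applies.
    return conn.purchase_reserved_instance_offering(offering_id, instance_count)
```

There's no confirmation step here, which is exactly why the wrapper script exists.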