Wednesday, December 16, 2009

Private and Streaming Distributions in CloudFront

Boto has supported the CloudFront content delivery service since it's initial launch in November of 2008. CloudFront has recently launched a couple of great new features:
  • Distributing Private Content
  • Streaming Distributions

While adding support for these features to boto, I also took the opportunity to (hopefully) improve the overall boto support for CloudFront. In this article, I'll take a quick tour of the new CloudFront features and in the process cover the improved support for CloudFront in boto.

First, a little refresher. The main abstraction in CloudFront is a Distribution and in CloudFront all Distributions are backed by an S3 bucket, referred to as the Origin. Until recently, all content distributed by CloudFront had to be public content because there was no mechanism to control access to the content.

To create a new Distribution for public content, let's assume that we already have an S3 bucket called my-origin that we want to use as the Origin:

>>> import boto
>>> c = boto.connect_cloudfront()
>>> d = c.create_distribution(origin='', enabled=True, caller_reference='My Distribution')
>>> d.domain_name

So, d now points to my new CloudFront Distribution, backed by my S3 bucket called my-origin. Boto makes it easy to add content objects to my new Distribution. For example, let's assume that I have a JPEG image on my local computer that I want to place in my new Distribution:

>>> fp = open('/home/mitch/mycoolimage.jpg')
>>> obj = d.add_object('mycoolimage.jpg', fp)

Not only does the add_object method copy the content to the correct S3 bucket, it also makes sure the S3 ACL is set correctly for the type of Distribution. In this case, since it is a public Distribution the content object will be publicly readable.

You can also list all objects currently in the Distribution (or rather it's underlying bucket) by calling the get_objects method and you can also get the CloudFront URL for any object by using it's url method:

>>> d.get_objects()
[] >>> obj.url() 

Don't Cross the Streams

The recently announced streaming feature of CloudFront will be of interest to anyone that needs to server audio or video. The nice thing about streaming is that only the content that the user actually watches or listens to is downloaded so if you have users with short attention spans, you can potentially save a lot of bandwidth costs. Plus, the streaming protocols support the ability to serve different quality media based on the user's available bandwidth.

To take advantage of these cool features, all you have to do is store streamable media files (e.g. FLV, MP3, MP4) in your origin bucket and then CloudFront will make those files available via RTMP, RTMPT, RTMPE or RTMPTE protocol using Adobe's Flash Media Server (see the CloudFront Developer's Guide for details).

The process for creating a new Streaming Distribution is almost identical to the above process.

>>> sd = c.create_streaming_distribution('', True, 'My Streaming Distribution')
>>> fp = open('/home/mitch/embarrassingvideo.flv')
>>> strmobj = sd.add_object('embarrassingvideo.flv', fp)
>>> strmobj.url()

Note that the url method still returns the correct URL to embed in your media player to access the streaming content.

My Own Private Idaho

Another new feature in CloudFront is the ability to distribute private content across the CloudFront content delivery network. This is really a two-part process:

  • Secure the content in S3 so only you and CloudFront have access to it
  • Create signed URL's pointing to the secure content that can be distributed to whoever you want to be able to access the content

I'm only going to cover the first part of the process here. The CloudFront Developer's Guide provides detailed instructions for creating the signed URL's. Eventually, I'd like to be able to create the signed URL's directly in boto but doing so requires some non-standard Python libraries to handle the RSA-SHA1 signing and that is something I try to avoid in boto.

Let's say that we want to take the public Distribution I created above and turn it into a private Distribution. The first thing we need to do is create an Origin Access Identity (OAI). The OAI is a kind of virtual AWS account. By granting the OAI (and only the OAI) read access to your private content it allows you to keep the content private but allow the CloudFront service to access it.

Let's create a new Origin Access Identity and associate it with our Distribution:

>>> oai = c.create_origin_access_identity('my_oai', 'An OAI for testing')
>>> d.update(origin_access_identity=oai)

If there is an Origin Access Identity associated with a Distribution then the add_object method will ensure that the ACL for any objects added to the distribution is set so that the OAI has READ access to the object. In addition, by default it will also configure the ACL so that all other grants are removed so only the owner and the OAI have access. You can override this behavior by passing replace=False to the add_object call.

Finally, boto makes it easy to add trusted signers to your private Distribution. A trusted signer is another AWS account that has been authorized to create signed URL's for your private Distribution. To enable another AWS account, you need that accounts AWS Account ID (see this for an explanation about the Account ID).

>>> from boto.cloudfront.signers import TrustedSigners
>>> ts = TrustedSigners()
>>> ts.append('084307701560')
>>> d.update(trusted_signers=ts)

As I said earlier, I'm not going to go into the process of actually creating the signed URL's in this blog post. The CloudFront docs do a good job of explaining this and until I come up with a way to support the signing process in boto, I don't really have anything to add.


  1. Cool stuff! Slightly off topic, how do you store Key/Secret Key info securely? Problem is, the login credintials need to be stored somewhere on the local machine running the python script, but don't want that info accessible. As I understand, you can't encrypt info in python, so how can we keep that sensitive data secure in case someone steals or breaks into the computer running the script? It's dangerous cuz they could potentially delete everything in the bucket!


  2. Thanks Mitch for the post. It was the first one that made it clear to me that I didn't have to first upload to S3 but that boto takes care of all of that for you.

    By the way, the way you use the API is now, as you know, outdated and I had to use a mix of reading the API docs and reading the source code.

    Keep up the great work!

  3. Could you expand on the "outdated" comment? I'm not sure I understand.


  4. Think he means this:
    >>> dist.update(origin_access_identity=ident)
    Traceback (most recent call last):
    File "", line 1, in
    TypeError: update() got an unexpected keyword argument 'origin_access_identity'