Tuesday, February 22, 2011

Accessing the Internet Archive with boto

A recent tweet from Pete Skomoroch twigged me to the fact that the Internet Archive provides an S3-like API.   Cool!  The Internet Archive is a great resource that provides, in their words:
...a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public.
Since boto supports S3, I wondered if it would be possible to access the Internet Archive's API with boto.  Turns out, it's quite simple.  To make it even simpler, I've added a new "connect_ia" method.  Before you can use it, you need to get API credentials from the Internet Archive, but fortunately that's really easy.  Just sign up for an account (if you don't already have one) and then go to this link to generate the API keys.

Once you have your credentials, the easiest thing to do is to add the credentials to your boto config file.  They need to go in the Credentials section like this:
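For example (the ia_* key names below are my reading of what connect_ia looks for; substitute your own values):

```ini
[Credentials]
# Your regular AWS keys (if any) can stay in this section, too.
# The ia_* entries hold your Internet Archive keys.
ia_access_key_id = YOUR_IA_ACCESS_KEY
ia_secret_access_key = YOUR_IA_SECRET_KEY
```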



Then, you can create a connection to the Internet Archive like this:
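A minimal sketch (it assumes your IA credentials are in your boto config file and requires network access; the bucket name is hypothetical):

```python
import boto

# connect_ia picks up ia_access_key_id / ia_secret_access_key from the
# [Credentials] section of your boto config file.
ia = boto.connect_ia()

# The connection behaves like an S3 connection, so the familiar bucket
# and key operations work against the Internet Archive.
bucket = ia.lookup('my-ia-item')  # hypothetical item name
if bucket:
    for key in bucket:
        print(key.name)
```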



I've only tested this a bit so if you run into any problems with it, post a message to boto-users or create an issue.

Thursday, February 17, 2011

All Your Website Are Belong to S3

One of the most commonly requested features for S3 has been the ability to have it act more like a web server.  In other words, to be able to put an index.html file into a bucket, point someone at that bucket, and have them see your website.  I found requests for this on the S3 forum dating back to June 2006.  I'm pretty sure if you search around in the forums long enough you will see posts from me predicting S3 would never have this feature.

Well, as is so often the case, I have been proven wrong.  AWS has just announced a new feature of S3 that lets you easily host static websites entirely on S3.  The feature is pretty simple to use.  The basic process is:

  • Create a bucket to hold your website (or use an existing one)
  • Make sure the bucket is readable by the world
  • Upload your website content, including the default page (usually index.html) and an optional page to display in case of errors
  • Configure your bucket for use as a website (using a new API call)
  • Access your website via the new hostname S3 provides for website viewing.  You can also create CNAME aliases, etc. to map the bucket name to your own domain name
The following Python code provides an example of all of the above steps.
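A sketch along those lines (it requires live AWS credentials; the page contents here are placeholders, and configure_website is the new API call):

```python
import boto

conn = boto.connect_s3()

# Create a bucket to hold the website, in us-west-1 to match the
# website endpoint shown below (or use an existing bucket).
bucket = conn.create_bucket('garnaat-website-2', location='us-west-1')

# Make sure the bucket is readable by the world.
bucket.set_acl('public-read')

# Upload the default page and an optional error page, also world-readable.
index_key = bucket.new_key('index.html')
index_key.set_contents_from_string('<html><body>Hello</body></html>',
                                   headers={'Content-Type': 'text/html'},
                                   policy='public-read')

error_key = bucket.new_key('error.html')
error_key.set_contents_from_string('<html><body>Oops</body></html>',
                                   headers={'Content-Type': 'text/html'},
                                   policy='public-read')

# Configure the bucket for use as a website.
bucket.configure_website(suffix='index.html', error_key='error.html')
```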


I could now access my website using the following special link:

http://garnaat-website-2.s3-website-us-west-1.amazonaws.com/

I could also use the CNAME aliasing features of S3 to map my S3 website to my own domain (which is probably what most people will want to do).  It's a great new feature for S3 and something that should prove useful to a lot of people.