Tuesday, February 22, 2011

Accessing the Internet Archive with boto

A recent tweet from Pete Skomoroch twigged me to the fact that the Internet Archive provides an S3-like API. Cool! The Internet Archive is a great resource that provides, in their words:
...a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public.
Since boto supports S3, I wondered if it would be possible to access the Internet Archive's API with boto. Turns out, it's quite simple. To make it even simpler, I've added a new "connect_ia" method. Before you can use it, you need to get API credentials from the Internet Archive, but fortunately that's really easy. Just sign up for an account (if you don't already have one) and then go to this link to generate the API keys.

Once you have your credentials, the easiest thing to do is to add the credentials to your boto config file.  They need to go in the Credentials section like this:
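
As a rough sketch, the section can be generated with Python's standard configparser module. The option names ia_access_key_id and ia_secret_access_key are an assumption here (they follow boto's usual naming, so check them against your boto version), the helper build_ia_credentials is hypothetical, and the key values are placeholders:

```python
import configparser
import io


def build_ia_credentials(access_key, secret_key):
    """Return boto-config text holding Internet Archive credentials."""
    # interpolation=None so keys containing '%' are stored literally.
    config = configparser.ConfigParser(interpolation=None)
    # Assumed option names; verify them against your boto version.
    config["Credentials"] = {
        "ia_access_key_id": access_key,
        "ia_secret_access_key": secret_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values, not real keys.
    print(build_ia_credentials("YOUR_IA_ACCESS_KEY", "YOUR_IA_SECRET_KEY"))
```

The printed text is what would go into the boto config file (typically ~/.boto). If the file already has a Credentials section, add just the two option lines to it.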



Then, you can create a connection to the Internet Archive like this:
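
A minimal sketch of that step, assuming boto is installed and that connect_ia, called with no arguments, picks up the keys from the Credentials section of the boto config file. The wrapper connect_to_internet_archive is a hypothetical name used only for illustration, and the bucket listing assumes the Internet Archive answers that S3 call for your account:

```python
def connect_to_internet_archive():
    """Open an S3-style connection to the Internet Archive via boto."""
    # Imported lazily so this module can be loaded without boto installed.
    import boto

    # Assumption: with no arguments, the keys come from the boto config file.
    return boto.connect_ia()


if __name__ == "__main__":
    conn = connect_to_internet_archive()
    # List the buckets (Internet Archive items) visible to this account.
    for bucket in conn.get_all_buckets():
        print(bucket.name)
```

The returned object should behave like a regular boto S3 connection, so the usual bucket and key calls are the natural things to try next.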



I've only tested this a bit, so if you run into any problems with it, post a message to boto-users or create an issue.

2 comments:

  1. Hi, boto is great...
    I'm now working on a Facebook application and I need to implement Amazon Payments. Could you help me, please? I don't know how to do it. Could you give a simple code example or a good link? I would be very grateful.
  2. Searching the internet archives with Boto is easy. No wonder you came up with this, being once a part of DocuShare's team.