Thursday, May 21, 2009

Using EC2 CloudWatch in Boto

The new CloudWatch service from AWS provides some interesting ways to monitor EC2 instances and LoadBalancers. The code to support this new service has just been checked into the subversion repository for boto. It still needs some hardening before it will be incorporated into a new boto release, but if you are interested in experimenting with CloudWatch, check out the latest boto code and let me know what you think. This post should provide just about enough to get you started.

The 5 Minute How-To Guide

First, make sure you have something to monitor. You can either create a LoadBalancer or enable monitoring on an existing EC2 instance. To enable monitoring on an existing instance, you can do something like this:

>>> import boto
>>> c = boto.connect_ec2()
>>> c.monitor_instance('i-12345678')


Here, "i-12345678" is the ID of your existing instance. It takes a while for the monitoring data to start accumulating, but once it does, you can do this:

>>> import boto
>>> c = boto.connect_cloudwatch()
>>> metrics = c.list_metrics()
>>> metrics
[Metric:NetworkIn,
Metric:NetworkOut,
Metric:NetworkOut(InstanceType,m1.small),
Metric:NetworkIn(InstanceId,i-e573e68c),
Metric:CPUUtilization(InstanceId,i-e573e68c),
Metric:DiskWriteBytes(InstanceType,m1.small),
Metric:DiskWriteBytes(ImageId,ami-a1ffb63),
Metric:NetworkOut(ImageId,ami-a1ffb63),
Metric:DiskWriteOps(InstanceType,m1.small),
Metric:DiskReadBytes(InstanceType,m1.small),
Metric:DiskReadOps(ImageId,ami-a1ffb63),
Metric:CPUUtilization(InstanceType,m1.small),
Metric:NetworkIn(ImageId,ami-a1ffb63),
Metric:DiskReadOps(InstanceType,m1.small),
Metric:DiskReadBytes,
Metric:CPUUtilization,
Metric:DiskWriteBytes(InstanceId,i-e573e68c),
Metric:DiskWriteOps(InstanceId,i-e573e68c),
Metric:DiskWriteOps,
Metric:DiskReadOps,
Metric:CPUUtilization(ImageId,ami-a1ffb63),
Metric:DiskReadOps(InstanceId,i-e573e68c),
Metric:NetworkOut(InstanceId,i-e573e68c),
Metric:DiskReadBytes(ImageId,ami-a1ffb63),
Metric:DiskReadBytes(InstanceId,i-e573e68c),
Metric:DiskWriteBytes,
Metric:NetworkIn(InstanceType,m1.small),
Metric:DiskWriteOps(ImageId,ami-a1ffb63)]

The list_metrics call returns a list of all of the available metrics that you can query against. Each entry in the list is a Metric object. As you can see from the list above, some of the metrics are generic and some have Dimensions associated with them (e.g. InstanceType=m1.small). A Dimension can be used to refine your query. So, for example, I could query Metric:CPUUtilization, which would compute the desired statistic by aggregating CPU utilization data across all available sources, or I could refine that by querying Metric:CPUUtilization(InstanceId,i-e573e68c), which would use only the data associated with the instance identified by the instance ID i-e573e68c.

Because I'm only monitoring a single instance for this example, the set of metrics available to me is fairly limited. If I were monitoring many instances, using many different instance types and AMIs, as well as several load balancers, the list of available metrics would grow considerably.

Once you have the list of available metrics, you can query the CloudWatch service for the data associated with any one of them. Let's choose the CPU utilization metric for our instance.

>>> m = metrics[5]
>>> m
Metric:CPUUtilization(InstanceId,i-e573e68c)
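
Indexing into the list like this is a bit fragile, since the order of the metrics isn't guaranteed. As a sketch of a more robust alternative, assuming the Metric objects expose the name and dimensions attributes suggested by the repr strings above (with dimensions as a simple {name: value} dict), you could pick the metric out explicitly:

>>> # select the instance-specific CPUUtilization metric by name and
>>> # dimension rather than by its position in the list
>>> cpu_metrics = [mt for mt in metrics
...                if mt.name == 'CPUUtilization'
...                and mt.dimensions == {'InstanceId': 'i-e573e68c'}]
>>> m = cpu_metrics[0]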

The Metric object has a query method that lets us actually perform the query against the collected data in CloudWatch. To call that, we need a start time and end time to control the time span of data that we are interested in. For this example, let's say we want the data for the previous hour:

>>> import datetime
>>> # CloudWatch timestamps are in UTC, so use utcnow() rather than now()
>>> end = datetime.datetime.utcnow()
>>> start = end - datetime.timedelta(hours=1)

We also need to supply the Statistic that we want reported and the Units to use for the results. The Statistic must be one of these values:

['Minimum', 'Maximum', 'Sum', 'Average', 'Samples']

And Units must be one of the following:

['Seconds', 'Percent', 'Bytes', 'Bits', 'Count', 'Bytes/Second', 'Bits/Second', 'Count/Second']

The query method also takes an optional parameter, period. This parameter controls the granularity (in seconds) of the data returned. The smallest period is 60 seconds, the value must be a multiple of 60 seconds, and 60 seconds is also the default. So, let's ask for the average as a percent:

>>> datapoints = m.query(start, end, 'Average', 'Percent')
>>> len(datapoints)
60

Our period was 60 seconds and our duration was one hour, so we should get 60 data points back, and we can see that we did. Each element in the datapoints list is a Datapoint object, which is a simple subclass of a Python dict. Each Datapoint object contains all of the information available about that particular data point.

>>> d = datapoints[0]
>>> d
{u'Average': 0.0,
u'Samples': 1.0,
u'Timestamp': u'2009-05-21T19:55:00Z',
u'Unit': u'Percent'}

My server obviously isn't very busy right now!
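
You can also use the optional period parameter described above to get a coarser view of the same data. The sketch below asks for five-minute buckets and then sorts the results, since the order of the returned data points isn't guaranteed:

>>> # same query, but with 5-minute buckets instead of the default 60 seconds
>>> datapoints = m.query(start, end, 'Average', 'Percent', period=300)
>>> # sort by timestamp before using the results
>>> datapoints.sort(key=lambda d: d['Timestamp'])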

That gives you a quick look at the CloudWatch service and how to access the service in boto. These features are still under development and feedback is welcome so give it a try and let me know what you think.

12 comments:

  1. No module named CloudWatch. I tried to install the boto trunk over the original boto I had installed. Would this not work?

  2. It might work, depending on how you did it. You are grabbing the latest code from subversion, right? Can you include the actual error you are getting?

  3. I cannot seem to paste it. I originally installed boto without CloudWatch, then pulled from the trunk. I believe I am getting a conflict from that. Any idea how I can solve it?

  4. I figured it out. ./setup.py install does not work. If you look at the egg file, it does not include cloudwatch.

  5. Thanks for the article. One question that comes to mind for me is: how do I get metrics by instance ID?

  6. Mitch, thanks for the write-up. I do still have one question, however.

    Is it possible to set the Dimension without getting the original list of metrics?

    I am just trying to make a simplified call to get metrics by instance ID.

  7. The call to ListMetrics doesn't accept any filters, so you will always get back the full list of available metrics. Perhaps I could add some filtering to the list of Metric objects returned that would make it easier to find only the metrics you are interested in. I'll try that out and add it in if it makes sense.

    Thanks for the comments.
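
    In the meantime, it's easy enough to filter on the client side. Roughly something like this (I'm assuming here that the dimensions attribute is a plain {name: value} dict):

    def metrics_for_instance(conn, instance_id):
        # keep only the metrics whose InstanceId dimension matches
        return [m for m in conn.list_metrics()
                if m.dimensions and
                m.dimensions.get('InstanceId') == instance_id]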

  8. I guess the question is, how complicated would it be to add that functionality?

    This way the request could look something like:

    metrics = c.list_metrics("Instanceid, i-e574bncc")

    I would love to chat with you more about why I want to do this. Are you on IRC?

  9. @rinex, another option for monitoring a single instance (or a single autoscaling group) is Scout's CloudWatch graphs, trends and alerts plugin.

  10. How do you specify the region the instance is in? By default it takes us-east-1.

  11. There are a couple of ways. Probably the easiest is this:

    import boto.ec2.cloudwatch

    c = boto.ec2.cloudwatch.connect_to_region('eu-west-1')

    You can supply any region string. You can also do:

    regions = boto.ec2.cloudwatch.regions()

    and then pick a region out of the list of RegionInfo objects returned and call the connect method of that object to get a connection object.
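
    For example, picking the region by its name attribute and calling its connect method (just a quick sketch):

    regions = boto.ec2.cloudwatch.regions()
    eu_west = [r for r in regions if r.name == 'eu-west-1'][0]
    c = eu_west.connect()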

  12. Sorry, this is not the best of examples
    (Using EC2 CloudWatch in Boto).

    -> m = metrics[5]
    You want to select by name and id

    CloudWatch uses UTC.
