Alerting with Loggly and PagerDuty
I recently wrote a blog post about triggering Woot lights on my desk with some simple Python code and Loggly. While hooking up Loggly to an Arduino with Woot lights can be somewhat interesting and exciting, I really didn’t present a practical solution for monitoring and alerting on events such as exceptions or errors generated by your applications.
In this example I’ll do just that with about 7 lines of Python code and a call to the mighty fine PagerDuty service.
The Setup
This solution uses the Python Hoover library, which is available via the Cheese Shop. Install Hoover with the following commands; depending on your system, you may or may not need to install setuptools first.
[php]
sudo apt-get install python-setuptools
sudo easy_install hoover
[/php]
Once you have Hoover installed, you’ll need to figure out what you want to alert on. For this example I’m going to use my lame blog which is hosted on code Bret Taylor wrote for Google AppEngine and Tornado. I use my async logging library for logging out of AppEngine to Loggly.
I put a method in my blog which throws an error when you hit this page. This is the result of the exception when viewed in the Loggly shell:
The Code
Now that we can see the error and know what to search on to find it, we need to write some simple code that runs a single-bucket facet search, returns a numeric count of results over a given period of time, and triggers a PagerDuty alert if it finds anything. In this example I constrain my search to NOW -6MINUTES to NOW -1MINUTE to ensure the events have been forwarded and indexed by Loggly. Here’s the code:
[php]
import httplib2, simplejson, hoover
from hoover import utils

# Log in to Loggly and look up the input the blog logs to
hoover.authorize('geekceo', 'kordless', 'password')
geekceo = hoover.utils.get_input_by_name('geekceo_http')

# Single-bucket facet search: one count for the whole time window
num_results = geekceo.facets(q='thegeekceo AND exception', starttime='NOW-6MINUTES', endtime='NOW-1MINUTE', buckets=1)['data'].items()[0][1]

if num_results > 0:
    url = "https://events.pagerduty.com/generic/2010-04-15/create_event.json"
    # Serialize with simplejson so the request body is valid JSON
    data = simplejson.dumps({'service_key': '111ba18038aa012e4faa12311d009e57', 'incident_key': 'failftw', 'event_type': 'trigger', 'description': 'loggly reporting failftw on thegeekceo'})
    duty = httplib2.Http()
    response, content = duty.request(url, "POST", data)
[/php]
Ok, so I lied. It’s 10 lines of Python. It’s also one line in your crontab file:
*/5 * * * * /usr/bin/python blogalert.py
In practice you’d want to take the date stamp of the bucket result and use it for PD’s incident key. This would keep any overlap in searches from triggering a double or false alert.
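As a rough sketch of that idea, the snippet below builds the PagerDuty payload with an incident key derived from the bucket's date stamp. It assumes the facet result's 'data' entry is a dictionary mapping a bucket date stamp to a count, which is what the indexing in the script above suggests. The function name build_trigger_event, the sample date stamp, and the placeholder service key are all made up for illustration, and it uses the standard library json module in place of simplejson.

```python
import json


def build_trigger_event(facet_data, service_key, description):
    """Return a JSON trigger payload, or None when the bucket is empty.

    facet_data is assumed to look like {bucket_date_stamp: count}.
    """
    if not facet_data:
        return None

    # With buckets=1 there is a single entry; sorting just makes the
    # choice deterministic if more than one bucket ever comes back.
    bucket_stamp, num_results = sorted(facet_data.items())[0]
    if num_results <= 0:
        return None

    event = {
        "service_key": service_key,
        # Same bucket -> same key, so a repeated search re-reports the
        # same incident instead of opening a new one.
        "incident_key": "failftw-%s" % bucket_stamp,
        "event_type": "trigger",
        "description": description,
    }
    return json.dumps(event)


# Hypothetical facet result and placeholder service key
sample = {"2011-03-01T12:00:00Z": 3}
payload = build_trigger_event(
    sample, "YOUR_SERVICE_KEY", "loggly reporting failftw on thegeekceo"
)
```

The resulting payload string would be posted to the same create_event URL as before. Two searches that land on the same bucket produce the same incident key, so they map to one incident rather than two.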
BTW, on a related note my wife keeps calling PagerDuty ‘the girlfriend’, because she texts and calls me at all hours of the night and I have to scamper off to acknowledge her advances. My suggestion to Alex the other day was for PD to implement sexy personas that I could pick that would whisper sweet nothings in my ear at 2AM such as, “Hey baby, web head 12 is down again. Would you like to resolve?” 😛
Happy alerting!
Hoover J. Beaver