AMQPcat, a netcat-like tool for messaging fun

If you have read @ripienaar's excellent series of articles on common messaging patterns you probably noticed a handy CLI tool for working with STOMP queues called stompcat. I looked around for something similar for AMQP brokers but couldn't find anything quite the same. There is amqp-utils, but I had some issues with it and the tools didn't work quite the way I was hoping. So I wrote amqpcat with the idea of providing a similar tool to stompcat.

Available on github and rubygems.org: https://github.com/joemiller/amqpcat

Sensu and Graphite

It's been pretty exciting to see the number of folks getting involved with Sensu lately, judging by the increased activity on the #sensu channel on Freenode. One of the most common questions is how to integrate Sensu and Graphite. In this article I'll cover two approaches for pushing metrics from Sensu to Graphite.

Remember: think of Sensu as the "monitoring router". While we are going to show how to push metrics to Graphite, it is just as easy to push metrics to any other system – Librato, Cube, OpenTSDB, etc. In fact, it would not be difficult at all to push metrics to multiple graphing backends in a fanout manner.
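
For a rough idea of what ends up on the wire, Graphite's plaintext listener (TCP port 2003 by default) accepts one "path value timestamp" line per data point. The sketch below only formats such a line. The graphite_line helper and the metric name are made up for illustration and are not taken from any Sensu handler.

```shell
#!/usr/bin/env bash
# Sketch: format one data point for Graphite's plaintext protocol.
# graphite_line is a hypothetical helper, not part of Sensu or Graphite.
graphite_line() {
    local path="$1" value="$2"
    # Default to the current Unix time when no timestamp is given.
    local ts="${3:-$(date +%s)}"
    printf '%s %s %s\n' "$path" "$value" "$ts"
}

# Made-up metric name, value and timestamp.
graphite_line "servers.web01.load.one" "0.42" "1325376000"
```

Piping that output to something like nc graphite.example.com 2003 (the hostname is an assumption) would deliver it to a Graphite server. Sending the same line to several backends is the fanout idea mentioned above.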

Re-use Nagios plugins in Sensu for quick profit

In my previous article I mentioned a key strength of Sensu is the ability to re-use existing Nagios plugins. This is a powerful feature of Sensu. Nagios has been around for at least 1000 years according to most recent archaeological discoveries, which means a vast amount of human effort (and capital) has gone into creating Nagios plugins. Being able to leverage this prior effort is a huge win. In this article I’ll demonstrate creating a Sensu check with the check_http Nagios plugin.

Getting started with the Sensu monitoring framework

I’m excited about Sensu, a new open source monitoring framework, and I’d like to help others get started with it as well. So, after observing the frequent questions from new visitors to #sensu on Freenode I thought perhaps the best way to do that is to write a blog article to help folks get started. If you still have questions after reading this, feel free to come by #sensu on Freenode.

In this article I will provide a brief overview of Sensu with some background, walk through a client and server install, and then I will show you how to add a check and a handler. This should lay the groundwork for future articles with more examples on how to get the most value out of Sensu in your infrastructure.

Before we start, I owe a huge thanks to @jeremy_carroll for the many hours of work he put into building RPMs for Sensu. His work on packaging will undoubtedly save many folks quite a bit of time.

What is Sensu?

Sensu is the creation of @portertech and his colleagues at sonian.com. They have graciously open-sourced the project and made it available to all of us searching for a modern monitoring platform (or anyone searching for an alternative to Nagios.)

Sensu is often described as the “monitoring router”. Put another way, Sensu connects the output from “check” scripts run across many nodes with “handler” scripts run on Sensu servers. Messages are passed via RabbitMQ. Checks are used, for example, to determine if Apache is up or down. Checks can also be used to collect metrics such as MySQL statistics. The output of checks is routed to one or more handlers. Handlers determine what to do with the results of checks. Handlers currently exist for sending alerts to Pagerduty, IRC, Twitter, etc. Handlers can also feed metrics into Graphite, Librato, etc. Writing checks and handlers is quite simple and can be done in any language.

Key details:

  • Ruby 1.8.7+ (EventMachine, Sinatra, AMQP), RabbitMQ, Redis
  • Excellent test coverage with continuous integration (travis-ci)
  • Messaging oriented architecture. Messages are JSON objects.
  • Ability to re-use existing Nagios plugins
  • Plugins and handlers (think notifications) can be written in any language
  • Supports sending metrics into various backends (Graphite, Librato, etc)
  • Designed with modern configuration management systems such as Chef or Puppet in mind
  • Designed for cloud environments
  • Lightweight, less than 1200 lines of code
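
To make the "checks can be written in any language" point concrete, here is a minimal sketch of a check in bash. It assumes the Nagios plugin convention that Sensu re-uses: one line of output plus an exit status of 0 (OK), 1 (WARNING) or 2 (CRITICAL). The check_threshold function and its thresholds are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a check following the Nagios plugin convention:
# print one status line and return 0 (OK), 1 (WARNING) or 2 (CRITICAL).
# check_threshold is a hypothetical helper that compares integers.
check_threshold() {
    local value="$1" warn="$2" crit="$3"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL: value is $value (threshold $crit)"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING: value is $value (threshold $warn)"
        return 1
    fi
    echo "OK: value is $value"
    return 0
}

# Demo with a made-up reading of 85 against warn=80, crit=90.
# A real check script would finish with: exit "$status"
status=0
check_threshold 85 80 90 || status=$?
echo "exit status: $status"
```

The status line is what a handler would receive as the check output, and the exit status is what tells Sensu whether anything is wrong.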

Correlating Puppet changes to events in your infrastructure using graphite

Sometimes it is pretty obvious when Puppet changes something in your infrastructure and bad things happen in a big dramatic way. Other times it’s not so obvious. It can be invaluable to be able to correlate changes made by Puppet to other events happening in your infrastructure.

For example, in this diagram we have plotted the load average from a group of servers. Blue vertical lines mark points in time when puppet modified a resource on a host in the group. We can see that immediately following a puppet change the load spiked on one of the servers.

Code available on github:

Staying DRY with Bash indirect references

I should start this post by saying I don’t recommend this method for all situations due to potential security issues, as well as some readability tradeoffs. If that didn’t scare you off, keep reading.

I was faced recently with a situation where I needed to update a MySQL clone script written in bash to pull data from multiple MySQL servers and write to a single server. We often do this to quickly copy data from one environment to another for testing. The original script was written with the assumption of only one source server and looked roughly like this:

source_mysql_host="not.the.best.idea"
source_mysql_user="xxx"
source_mysql_passwd="yyy"

echo ; echo " ==>  Cloning databases. Source: $source_mysql_host, Destination: $dest_mysql_host"

# list databases, skipping the system ones; the parentheses make ^ and $
# apply to every alternative instead of only the first and last
DATABASES=`mysql -u${source_mysql_user} "-p${source_mysql_passwd}" -h${source_mysql_host} \
            -NBe 'show databases' | grep -Ev '^(information_schema|performance_schema|mysql|test|backups)$'`
for db in $DATABASES; do
    echo "   ==> Cloning database '$db' .. "

    $mysqldump_cmd -u${source_mysql_user} "-p${source_mysql_passwd}" -h${source_mysql_host} --add-drop-database --databases ${db} | \
    mysql -u${dest_mysql_user} "-p${dest_mysql_passwd}" -h${dest_mysql_host}
done

(The full block of code was 26 lines. Much of it has been left out since it is not relevant to this article.)

In order to update this script to clone data from multiple servers we could simply duplicate the entire 26-line block of code and change the variables so that the first block uses $OLAP_mysql_host and the other uses $OLTP_mysql_host, and so on, but that would not be very DRY.

What I ended up doing was wrapping the block of code in a new for loop and using indirect references in bash (done here with eval) to switch between the source MySQL servers. Note the ugly indirect references. It works and we stay DRY. Was it worth it? Maybe, maybe not. =)

OLTP_source_mysql_host="not.the.best.idea"
OLTP_source_mysql_user="xxx"
OLTP_source_mysql_passwd="yyy"
 
OLAP_source_mysql_host="not.the.best.idea.2"
OLAP_source_mysql_user="xxx"
OLAP_source_mysql_passwd="yyy"
 
for type in "OLTP" "OLAP"; do
 
    # use bash's indirect references. very ugly, but helps us stay DRY
    _mysql_source_host=$(eval "echo \$$(echo ${type}_source_mysql_host)")
    _mysql_source_user=$(eval "echo \$$(echo ${type}_source_mysql_user)")
    _mysql_source_passwd=$(eval "echo \$$(echo ${type}_source_mysql_passwd)")
    _schema_only_tables=$(eval "echo \$$(echo ${type}_SCHEMA_ONLY_TABLES)")
 
    echo ; echo " ==> Cloning ${type} databases. Source: $_mysql_source_host, Destination: $dest_mysql_host"
 
    # get a list of databases, excluding the mysql system db's (ie: mysql, test, ...)
    # the parentheses make ^ and $ apply to every alternative
    DATABASES=`mysql -u${_mysql_source_user} "-p${_mysql_source_passwd}" -h${_mysql_source_host} \
                -NBe 'show databases' | grep -Ev '^(information_schema|performance_schema|mysql|test|backups)$'`
    for db in $DATABASES; do
        echo "   ==> Cloning database '$db' .. "

        $mysqldump_cmd -u${_mysql_source_user} "-p${_mysql_source_passwd}" -h${_mysql_source_host} --add-drop-database --databases ${db} | \
        mysql -u${dest_mysql_user} "-p${dest_mysql_passwd}" -h${dest_mysql_host}
    done
    done
done
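
As an aside, bash (version 2 and later) has a built-in indirect expansion, ${!name}, that gives the same lookup without eval. The sketch below is an alternative to the approach above, not what the original script used. The lookup helper is hypothetical, and the hostnames are the same placeholders as in the post.

```shell
#!/usr/bin/env bash
# Sketch: bash indirect expansion (${!name}) instead of eval.
# Hostnames and users are the placeholder values from the post.
OLTP_source_mysql_host="not.the.best.idea"
OLTP_source_mysql_user="xxx"
OLAP_source_mysql_host="not.the.best.idea.2"
OLAP_source_mysql_user="xxx"

# lookup is a hypothetical helper: it prints the value of the variable
# whose name is "<prefix>_<suffix>".
lookup() {
    local name="${1}_${2}"
    printf '%s\n' "${!name}"
}

for type in OLTP OLAP; do
    _mysql_source_host=$(lookup "$type" source_mysql_host)
    _mysql_source_user=$(lookup "$type" source_mysql_user)
    echo " ==> ${type} source: ${_mysql_source_user}@${_mysql_source_host}"
done
```

Because no string is ever re-evaluated as code, this avoids most of the security concern mentioned at the top of the post, although it only works in bash and not in a plain POSIX sh.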

List of statsd server implementations

Statsd is a simple client/server mechanism from the folks at Etsy that allows operations and development teams to easily feed a variety of metrics into a Graphite system. For more info on statsd, read Etsy’s seminal blog article “Measure Anything, Measure Everything”.

As would be expected, there are statsd clients in many languages. But there are also many implementations of the statsd server. This is nice because each organization can pick the one that fits it best. For example, a python shop might prefer to deploy a python based statsd instead of Etsy’s original node.js implementation. Also, there are some statsd implementations that diverge from the original design and provide additional features.

I could not find a single resource that listed all of the different implementations, so I figured I would try to start one here.

  • Etsy’s statsd: node.js. The Original
  • petef-statsd: ruby. Supports AMQP.
  • statsd_rb: ruby.
  • quasor/statsd: ruby. can send data to graphite or mongoDB
  • py-statsd: python (including python client code).
  • zbx-statsd: python, based on py-statsd. Sends data to Zabbix instead of graphite.
  • statsd.scala: scala. Sends data to Ganglia instead of Graphite. Different messaging protocol, uses JSON.
  • txStatsD: python + twisted, from the folks @ Canonical
  • statsd-librato: node.js. Fork of etsy’s statsd for sending data to Librato instead of graphite from the folks @ Engine Yard.
  • estatsd: erlang. From the folks @ Opscode
  • statsite: python
  • metricsd: scala. Should be drop-in compatible with etsy’s statsd, but with support for additional metric types (eg: meter, gauge, histogram)
  • statsd-c: C. compatible with original etsy statsd
  • statsd (librato): node.js.  Librato’s officially maintained fork of statsd based on the changes from Engine Yard. Supports multiple graphing services including Librato Metrics
  • bucky: python. A unique spin on statsd that supports collecting data from statsd clients, collectd, and metricsd, with output to graphite. The ability to translate collectd plugin names to be more graphite-friendly is very compelling.

Please leave a comment if you have an implementation that should be listed here. Feedback on any of the above implementations would be helpful too.

Network Link Conditioner in Xcode 4.1, Lion

Previously, I wrote a post about using the ‘dummynet’ functionality in Mac OS X’s ipfw(8) firewall to simulate a variety of network conditions, such as limited bandwidth, packet loss, and latency (delay). This is a great feature for testing software, but it can be a little tough to use unless you’re comfortable at the command line or, even better, have Unix scripting skills, since multiple commands are required to create even simple scenarios.
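
To give a feel for the "multiple commands", here is a dry-run sketch that only prints the ipfw commands for one scenario rather than running them (the real commands need root and a system that still ships ipfw). The dummynet_commands helper, rule number 100 and pipe number 1 are arbitrary choices made for this illustration.

```shell
#!/usr/bin/env bash
# Dry run: print the ipfw/dummynet commands for one scenario instead of
# executing them. dummynet_commands is a hypothetical helper; rule 100
# and pipe 1 are arbitrary numbers picked for this example.
dummynet_commands() {
    local bw="$1" delay="$2" plr="$3"
    # Send all IP traffic through pipe 1 ...
    echo "ipfw add 100 pipe 1 ip from any to any"
    # ... then shape the pipe: bandwidth, delay, packet loss rate (0 to 1).
    echo "ipfw pipe 1 config bw ${bw} delay ${delay} plr ${plr}"
}

# 256 Kbit/s, 200 ms delay, 5% packet loss
dummynet_commands 256Kbit/s 200ms 0.05
```

Running ipfw delete 100 as root would remove the rule again once testing is done.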

Then, today I noticed that Apple now includes a new prefPane in Xcode 4.1 and Lion called “Network Link Conditioner” that simplifies all of this, and even includes a few profiles to get you started (e.g. “Wifi, Average case” and “3G, Lossy Network”). It’s a pretty cool feature, and especially useful for iOS developers.

  • Install: find and run /Developer/Applications/Utilities/Network Link Conditioner/Network Link Conditioner.prefPane

 

groovy-statsdclient

In an attempt to learn some Groovy and Gradle I wrote an implementation of a Statsd client in groovy.  It’s similar to other statsd clients in other languages and supports the typical increment(), decrement(), and timing() methods.
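
For readers new to statsd, those three calls boil down to very small plain-text messages sent over UDP (port 8125 by default). The sketch below shows that wire format using bash helpers. The function and metric names are made up for illustration and are not the Groovy client’s API.

```shell
#!/usr/bin/env bash
# Sketch of the statsd wire format: "<name>:<value>|<type>".
# These helper names are hypothetical; they only format the message.
statsd_increment() { printf '%s:1|c\n' "$1"; }          # counter +1
statsd_decrement() { printf '%s:-1|c\n' "$1"; }         # counter -1
statsd_timing()    { printf '%s:%s|ms\n' "$1" "$2"; }   # timer in ms

statsd_increment "app.logins"
statsd_decrement "app.active_sessions"
statsd_timing "app.page_render" 320
```

In bash, redirecting one of these to /dev/udp/127.0.0.1/8125 would send it to a statsd server listening on the local machine (the address is an assumption).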

It’s not available on Maven Central or via Grape at this time, which will be a future learning exercise. In the meantime you can download a pre-built .jar or the source code from github:

For more background on Statsd, check out this blog article from Etsy:  Measure Anything, Measure Everything

 

#monitoringsucks – but it doesn’t have to

On May 26, 2011, John Vincent, aka @lusis, started a conversation with a simple tweet that led to an hour-plus chat on Freenode IRC between some very smart folks.  The IRC channel was supposed to go away after the chat was over, but instead it has survived and is attracting more and more attention.

The genesis of it all is the general staleness of the open source monitoring landscape.  There have been numerous and rapid improvements in operations tools in the last 10 years but when it comes to monitoring we’re still pretty much stuck on the same tools we’ve been using since the late-90s.  It would be understandable if these tools were “as good as it gets”, but they’re not, and there’s definitely room for improvement and sharing of ideas and experiences.  If nothing else, at least there’s a central place for people to discuss monitoring and share ideas.  There’s hardly a day that goes by where someone doesn’t stop into ##monitoringsucks on IRC and announce a new tool or concept they’re hacking away on.

Since the initial chat, a set of github.com repos has popped up to collect links to various tools and blog posts on the topic of monitoring:

  • irc-logs – IRC log of the initial chat that started it all on 5/26/2011 is available here
  • tools-repo – a great resource for many monitoring tools
  • blog-posts – a growing list of interesting monitoring related blog posts

If you’re interested in helping make the art of monitoring better, consider stopping by IRC Freenode or contributing to the github repos.  I’m hoping to learn quite a bit out of this little spark that John created.
