Search this site

[prev]  Page 3 of 54  [next]





Smart logging hacks in ruby

Ruby has Logger. It is good, but strings suck. In a world where more and more people are using log data for inputs and analysis, structured data is good. I want to log structured data.

This lead to me subclassing Logger and additionally providing my own logger format class. The code for this is in logstash, logging.rb.

What did I add? Two main goals: First, improve context. Second, log structured data (objects). This is achieved by style changes (log objects, not strings), adding awesome_print support, adding code context to each log (line/file/method), etc.

To support the first goal (context), if the loglevel is 'debug' I will inspect the call stack and include the file and line of code that is logging. I also set the 'progname' to the name of the program by default. To support the second goal, log objects and improve how objects are formatted into strings with Object#inspect (or awesome_inspect, if available).

Some examples:

>> logger =
I, [2010-11-12T15:19:48.388469 #18782]  INFO -- : ["Failed: require 'ap' (aka awesome_print); some logging features may be disabled", #<LoadError: no such file to load -- ap>]
# This is an example of what javascript folks call 'progressive enhancement'
# - we still function if awesome_print is not available.

>> logger.level = Logger::WARN
>> logger.warn("Hello")
W, [2010-11-12T15:20:05.465705 #18782]  WARN -- irb: Hello

>> logger.warn(["rejecting bad client", { :client => "" }])
W, [2010-11-12T15:21:04.639404 #18782]  WARN -- irb: ["rejecting bad client", {:client=>""}]

>> logger.level = Logger::DEBUG
>> logger.warn("Hello")
W, [2010-11-12T15:21:57.754874 #18782]  WARN -- (irb):12#irb_binding: Hello
# Notice the context (file, line, method)       ^^^^^^^^^^^^^^^^^^^^
# When DEBUG level is set only due to performance and verbosity concerns.
The main benefit personally is logging objects instead of strings, which you can do, today, with the standard Logger. However, standard logger doesn't make nice with awesome_print or add file/line/method context. Anyway, logging objects lets you later hook a smarter error handling tool up to your logging that can inspect the structured data rather than having to regex your way through a single string.

If you have awesome_print available, the object output by my formatter gets even more useful for human viewing:

Why log structured? Easier to parse and query later, like in a logstash query.

Puppet Trick - Exported Resource Expiration

I've finally taken the plunge with puppet's exported resources.

"Exported resources" is a feature of puppet that allows your nodes to export resources to other nodes. You can read more about this feature on puppet's exported resources documentation. Covering how to setup exported resources or storeconfigs is out of scope, but if you need help read the docs and come to #puppet on freenode IRC.

Exported resources are pretty cool, but they lack one important feature - expiration/purging. The storeconfigs database has no idea about nodes that you have decommissioned or repurposed, so it's very possible to leave exported resources orphaned in your database.

I worked around this by making my resources such that I can expire them. This is done by making a custom define that has a 'timestamp' field that defaults to now, when registering each time. If a node has not checked in (and updated its resources) recently, I will consider that resource expired and will purge it.

I made a demo of this and put the code on github: jordansissel/puppet-examples/exported-expiration. More details (and example output of multiple runs with expiration) are available in the README.

The demo is runnable by itself (standalone, no puppet master), so you can test it without needing to mess with your own puppet installations.

new grok version available (1.20101030)

Another grok release is available. Major changes include:
  • Pattern discovery as described here.
  • Doxygen (C) and RDoc (Ruby) docs now available.
  • Much improved ruby support (gem install jls-grok).

Hop on over to the grok project page and download the new version.

Changes since last announced release:

  - Add 'make package-debian' to produce a .deb build of grok.

  - API docs via doxygen
  - rdoc for the Ruby module
  - Add Grok::Pile to Ruby

  - Fix tests
  - Add a ruby example for pattern discovery
  - Add grok-web example (runs grok in ruby via sinatra to show pattern discovery)
  - Add more time formats (US, EU, ISO8601)
  - Fix bug that prevented multiple patterns with the same complexity from being
    used in discovery.

  - Add pattern discovery through grok_discover (C) and Grok#discover (Ruby)
    Idea for this feature documented here:
  - The ruby gem is now called 'jls-grok' since someone already had the 'grok'
    gem name on gemcutter.
  - Fix some pattern errors found in the test suite.
  - New version numbering to match my other tools.

Partials in HAML with Sinatra

HAML documentation mentions "partials" twice, in passing only. Doesn't actually say how to use them, best as I can tell. Lots of the documentation seems to be heavily Rails-focused (mentions of link_to and other methods). Frankly, if I didn't already have lots of Rails experience, I think some of the haml docs would be pretty confusing.

In Sinatra, you invoke haml by simply calling 'haml' with your template name.

  get "/" do
    haml :index
On that note, Sinatra (or HAML, I haven't checked) will let you use layouts, too, something I found hardly documented. Layouts are a way to basically wrap your entire template in another template called 'layout.haml' by default - useful if you want every page to have the same structure. You can also define different layouts.

What I wanted was a simple layout with a header in a separate template. Here's what I did:


    %title Hello!
      =haml :header, :layout => false
Thie 'yield' will invoke the original template called as 'haml :index', helping keep each page using the same layout. Additionally, I render the header by calling a separate haml template; calling haml with ':layout => false' tells it to ignore any layout files. This lets us invoke another haml template.

Also, since the 'haml' method only takes symbols, if you want to put your haml templates in different directories, you'll have to use this awkward syntax:

  haml :"some/path/to/template"
    #  ^ Note the ':' before the string. You can also probably the
    #  String#to_sym method.

New keynav release available (0.20101014)

The keynav project gets a upgrade today. Some small feature additions and bug fixes.

Download: keynav-0.20101014.3067.tar.gz

Changes from the previous release:

  - Added 'restart' command. Makes keynav restart. Useful for binding a key to
    reload the config.
  - Added 'loadconfig' command. This lets you include additional config files
    to load on the command line or in one of the default keynavrc files.
    (requested by Axel Beckert)
  - keynav will now restart if it receives SIGHUP or SIGUSR1
  - Map 'Enter' by default to 'warp,click 1,end' (requested by Axel Beckert)
    by Axel Beckert)
  - Fix a bug causing the point under the mouse cursor to not click through the
    keynav window in certain conditions. Reported via mailing list by Eric Van
    Dewoestine and Krister Svanlund.

xdotool 2.20101012 release

New xdotool for your enjoyment. Lots of features and bug fixes. Huge thanks to folks who sent in bug reports, patches, or ideas. This version should be ABI compatible to the previous release. All tests are passing (270 tests, 2838 assertions).


Major changes:

  • New commands: behave_screen_edge, windowminimize, exec
  • Add static build ('make xdotool.static')
  • Add 'restore' feature to 'mousemove'.
  • Improvements to xdotool 'script' behavior. Patch from Henning Bekel.

Cool tool: With the new screen edge behavior, added examples/ This takes a selected window (by click) and hides it until the mouse hits the left-side of the screen. The window will hide again when the mouse leaves the window; the window will be full-height and slide in from the left. When hiding it, it will slide left, off of the screen. If you run this and want to restore the window, just ctrl+c the script, rerun it, and hit ctrl+c again before clicking on a new window.

As usual, if you find problems or have feature requests, please file bugs or send an email to the list.

Changelist since previous announcement:

  - New command: 'windowminimize' for minimizing windows.
  - New command: 'behave_screen_edge' to attach commands to edge and corner activities.
    For example, to lock your screen when the mouse hits the top-right corner
    for more than 2 seconds:
      xdotool behave_screen_edge --delay 2000 top-right exec xscreensaver-command -lock
    'xscreensaver-command' comes with the xscreensaver package.
  - New command: 'exec' for executing commands. Useful with behave_screen_edge.
  - Bug fix: Catch pointer grab failures in selectwindow (xdo_window_select)
    and abort if we can't grab it and report an error message.
  - Add  --delay and --repeat flags to click. This allows you to easily
    double-click (or more) in a single command. Requested by meenakshi sundaram
    via the mailing list.
  - Add new option to mouse move 'restore'. This will restore the mouse position
    to the last known position prior to the most recent move. That is:
      xdotool mousemove 0 0 mousemove restore
    the above will move the mouse to 0,0 then move the mouse back to the
    position it was before. Requested by luc le rumeur and suffian amjad on the
    mailing list.
  - Add 'static' and 'install-static' targets. Requested by Anthony
    Thyssen on the mailing list.
  - Fix bug in xdotool getmouselocation that didn't search for the client window
  - New function: xdo_mousewindow, which lets you get the window the mouse is
    currently over.
  - New function: xdo_mouselocation2, which is an extended version of
    xdo_mouselocation and includes the window the mouse is currently over.
  - New function: xdo_window_minimize, which minimizes a window.
  - Add patch from Henning Bekel that greatly improves xdotool's script
    feature. Scripts now no longer invoke /bin/sh for each line, but instead
    build a command chain to execute. Further, positional arguments ($1, $2,
    etc) and environment variables ($FOO, $WINDOWID, etc) work as expected
    despite the removal of /bin/sh.

Puppet Camp San Francisco 2010

Another puppet camp has come and gone, and I'm certainly glad I went. Puppet, the surrounding ecosystem, and its community has grown quickly since last year.

The conference was the same format as last year. The morning was single-track presentations from various puppet users, and the afternoon was openspace/barcamp-style break out sessions. It was good to see some old faces and also to finally put faces to IRC and twitter names.

One of the bigger announcements was that mcollective would join the Puppet project. Other announcements included new employees and other good news. Beyond that, I picked up a few tricks and learned more about the puppet roadmap.

In no particular order - some thoughts and notes.

Facter 2.0 will be good. Take lessons learned from Facter 1.x and improve things - Make the DSL for making facts simpler, add structured data, add caching, etc.

Puppet supports a "config_version" option that specifies a script that will override how the version of a given catalog is determined. Useful for tagging based on revision control or deployment versions.

Scoped defaults such as 'File { owner => root }' apply downwards in all cases, something I hadn't considered before. That is, if you are class 'foo' and define a default and also include class 'bar', the default in foo will apply to bar as well. This was new to me, and I will be cleaning up some of my manifests as a result (I use defaults in some classes but not others). Best practice here is to either use no class-specific defaults or use class-specific defaults in every class.

Twitter operations (John Adams) gave a talk covering their automation/puppet stuff. John talked about problems with sysadmins trying to hack around puppet by using chattr +i to prevent puppet from modifying certain files - a practice they heavily discouraged. He also mentioned problems with poor cron scheduling and presented the usual sleep $(($RANDOM % 600))-style solution. I didn't get around to sharing my cron practices (sysadvent) with John before the end of the con. He also mentioned having problems with home directory syncing using puppet, which was another solution I'd covered that here and better solved previously on sysadvent.

During some downtime at the conference, I started working on an ssh key authorization module for mcollective. The ruby ssh key code is here and the mcollective fork with the sshkey security plugin is here. It works pretty well:

snack(~) % sudo grep security /etc/mcollective/{server,client}.cfg
/etc/mcollective/server.cfg:securityprovider = sshkey
/etc/mcollective/client.cfg:securityprovider = sshkey
snack(~) % mc-ping                                         
snack.home                               time=97.81 ms
The gist of the key signing pieces is that your ssh agent signing authenticates you as a user for requests, and for responses the server signs messages with its own ssh host key (like /etc/ssh/ssh_host_rsa_key). Validation of you as a user is done through your authorized_keys file, and validation for the reply uses your known_hosts file to verify the host signature.

It was a good conference, though I would've enjoyed a more hackathon-style atmosphere. We tried to do a facter hackathon, but there wasn't enough time, so instead we code reviewed some of the sillier parts of facter and talked about the future.

GNOME keyring daemon vs ssh-agent

A fellow sysadmin friend of mine has, from time to time, bitched about performance problems with gnome's ssh agent called "keyring." I don't use gnome, so I don't have his problems.

Yesterday, I wrote some ruby code that'll let you use your own ssh keys (and agent) to sign and verify arbitrary strings. See jordansissel/ruby-sshkeyauth on github.

Putting the above module to work, I can easily compare signing speeds of agents under various load conditions. The main complaint was that keyring falls apart under parallel signing demands - a very common situation for any sysadmin that sshes to more than one system at a time say, with a script, or capistrano, or another similar tool.

Under single-client signing requests (one at a time), gnome-keyring on ubuntu 10.04 signed 1000 "Hello world" strings in 12.76 seconds; ssh-agent cut that time by more than half achieving the same result in 5.05 seconds.

Under reasonable parallel load, ssh-agent's lead grew even further. On 4 cores, 5 'signing' processes, and 1000 signing requests each, the results are gnome-keyring signing all 1000 requests on each of 5 processes in parallel in 113.23 seconds, while ssh-agent signed the same in 30.61 seconds.

If you want to repeat my tests, you can use the 'samples/speed.rb' script from the above ruby-sshkeyauth project on github. Single-thread tests were done using ruby speed.rb "Hello world" while the 5-process test was done using seq 5 | xargs -P5 -n1 ruby speed.rb "Hello world".

So, if you're a regular user of ssh and ssh agents, you may want to stick with ssh-agent rather than gnome-keyring-daemon given the huge performance lead ssh-agent has.

Ubuntu's cron package silently ignores files

I was trying to debug why my cronjobs weren't running in ubuntu. It should've been simple, right? Put a file in /etc/cron.d, and cron goes to town.

Except on Ubuntu (and probably Debian). Ubuntu and Debian have a horrible habit of patching upstream sources, excessively, to the point where they are no longer quite the upstream package. It's practices like this that caused the great Debian SSL/SSH key vulnerability and cause projects like Ion and Firefox (I think?) to have license clauses that prevent downstream packagers from calling megapatched version by the same name as the upstream project.

Vixie cron is not protected from the megapatching practice.

I did the normal debugging. You know, strace, etc. I saw cron stat my cron file, which was reduced to "* * * * * root logger HELLO WORLD" just to try anything. Nothing.

So, I download the source package with "apt-get source cron" and see the following:

% gzip -dc cron_3.0pl1-106ubuntu5.diff.gz | wc -l            
7228 lines of patches. How big is cron?
nightfall(~/cron-3.0pl1) % wc -l *.c |tail -1
  4869 total
There are more lines of patches than there are lines of code. Pretty sure the README file says "Vixie Cron," but this kind of megapatching kind of takes the "Vixie" out of it.

The offending portion of the patch looks like this:

+static int valid_name (char *filename);


+               /* skipfile names with letters outside the set
+                * [A-Za-z0-9_-], like run-parts.
+                */
+               if (!valid_name(dp->d_name))
+                 continue;

Now I know why my cron jobs were being ignored: I typically name my cron files as 'something.cron' because it makes them more identifable in puppet and in svn.

I'm not sure what silly policy felt the need to mandate filenames in /etc/cron.d, but it needs to go. If it stays, it needs to actually be documented much more clearly. And no, I don't mean documenting it in some obscure manpage (like 'run-parts') that nobody reads or documenting it confusingly by saying "If using LSB, then this naming policy applies, if not using LSB, then this other naming policy applies"; I mean documenting it by having cron log "Ignoring <filename> due to <some policy>" - that's right, if you can't tell from the patch above, ubuntu's cron will silently ignore some files.

Explicit is better than implicit. Documentation doesn't always sync with code. Code is truth.

Further confusing the issue is the inconsistency in documenting this change. The crontab(1) manpage has a "DEBIAN SPECIFIC" section detailing this kind of thing (though I imagine it is quite incomplete). Update: There is no such section ('debian specific') in cron(8), though as commenters point out, this naming policy is documented in cron(8) but not clearly, in my opinion.

Sigh, what a waste of my time... Speaking of unexpected changes to decades-old system tools - FreeBSD knows what's up.

Finally, you'd think that out of all the 7228 lines of patches, one of them would fix this trivial bug still listed in crontab(1):

       Although cron requires that each entry in a crontab end  in  a  newline  character,
       neither  the  crontab  command nor the cron daemon will detect this error. Instead,
       the crontab will appear to load normally. However, the command will never run.  The
       best choice is to ensure that your crontab has a blank line at the end.
Nope. I guess megapatching isn't for fixing bugs but instead for applying major changes and random, poorly documented and poorly thought-out policy changes to upstream code without forking the projects.

RAID is not Redundant.

My year at Rocket Fuel has seen many unique system failures. One specific kind of failure I want to highlight is those of full RAID failures. I've talked before about how RAID is not a backup technology.

Tonight, we rebooted a machine that hung (presumably due to OOM or other funkiness) and it came back in the bios saying:

Foreign configuration(s) found on adapter
Our managed hosting support weren't sure what to make of this, so we decided to make a new home (from backups) for the services on this now-dead machine. Dell won't helping debug on this until tomorrow.

This is one of many total data losses I've observed on RAID sets in recent months - all due to RAID failures. Thankfully, We have backups that get shipped to HDFS. We monitor those backups. We also have puppet and other automation to help move and rebuild services on a new host. We're equipped to handle this kind of failure.

This leads me to a new conclusion: The 'R' in RAID is a lie. It is not redundant. Treating it that way can lead you to the raid-is-backup fallacy.

Wikipedia has this to say about Redundancy (engineering): "In engineering, redundancy is the duplication of critical components of a system with the intention of increasing reliability of the system, usually in the case of a backup or fail-safe."

Adding more parts (complexity) to a system doesn't often increase its reliability. Even taking into account the disk redundancy you might get with mirror or parity, you're still hedging that the RAID card doesn't die, which it will. Everything will fail (see MTBF) eventually, and MTTR on RAID failures is usually quite high, so weigh your risk.

Back to my conclusion that RAID is not redundant. RAID is not dead, I'm just done viewing RAID as a continuity-through-drive-failure technology. RAID has other benefits, though. It achieves more than just redundancy (when your card doesn't die).

RAID makes multiple drives present as a single drive device to the OS, right? Right. RAID allows you to aggregate disk IO performance to achive higher read/write rates than with a single disk alone. You can also aggregate disk space this way, too, if you didn't know.

It's almost 0100 now, I'd much rather be sleeping or playing TF2 than helping rebuild from backups.