Where: Loggly, Inc.
When: 08/2010-Current
Where: Rocket Fuel, Inc.
When: 07/2009-09/2010
Multirole position, mainly as lead systems administrator and tools specialist, but also filled in as a generalist in areas such as build engineering, software engineering, and architecture. I was the primary point of contact for managing systems and infrastructure, and managed technical operations with our hosting providers. Also carried a pager for 24/7 on-call for production systems, with pages coming from nagios; pager load was low.
Built Rocket Fuel's production infrastructure automation using tools such as puppet, hudson, and ruby on rails. Puppet and a small ruby on rails app drove configuration management; hudson handled builds, tests, packaging, and deployments. Deployments were scheduled and run by engineers, which greatly increased agility in testing, staging QA, and production deployments because ops was not a blocking gateway. Deployments did rolling upgrades with multiple sanity checks to ensure only one server was down at a time. Software and system configuration deployments were very common: multiple times per day across multiple projects, up to 90 deployments in a single month.
New systems took only a few minutes, unattended, to go from freshly imaged to active in production with a puppet-driven configuration. Backups of databases, logs, and other production state were copied to our internal HDFS for long-term storage; this HDFS cluster was managed by puppet just like any other service. I created a staging environment managed with the same tools as production. Puppet data was stored in subversion and deployed the same way as production applications.
Automated config generation was used for many services, based on server attributes such as site, cluster, and hardware configuration. Such services included nagios, mysql servers (master and slave), munin, quagga/ospfd, openvpn, apache, haproxy, dns, sudo, and more.
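The rolling upgrades with sanity checks mentioned above can be illustrated with a minimal Python sketch. This is not the original hudson/puppet tooling; the function name rolling_upgrade and the upgrade and is_healthy callbacks are hypothetical stand-ins for whatever deploy step and health check a real setup would use.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    servers: Iterable[str],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade servers one at a time, halting at the first failed check.

    Returns the servers that were upgraded and passed the post-upgrade check.
    """
    done: List[str] = []
    for server in servers:
        # Pre-check: do not touch a server that is already unhealthy,
        # since that may indicate a wider problem in the cluster.
        if not is_healthy(server):
            raise RuntimeError(f"{server} unhealthy before upgrade; aborting")
        upgrade(server)
        # Post-check: stop the rollout here so at most one server is down.
        if not is_healthy(server):
            raise RuntimeError(f"{server} failed sanity check after upgrade")
        done.append(server)
    return done
```

Because the loop raises as soon as a check fails, a bad release stops after the first affected server instead of spreading across the cluster.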
Wrote a display advertisement testing tool using headless X servers, firefox, WebDriver and Hadoop to test functionality of thousands of advertisements (flash and non-flash) verifying features such as click tracking and landing page pixels. This required extending WebDriver (both java and firefox extension) to include tracking network activity (FireBug-style) and querying browser history.
To help with various business needs and to improve application monitoring, I wrote web automation tests for business analytics, as well as systems and application monitoring checks, using WebDriver (Java) and mechanize (Ruby) for systems behind login portals.
Wrote a firefox extension for our ad operations team to provide insight into ad activity (impression, click, and other tracking). This tool removed the requirement that ad operations folks use firebug and other tools to inspect and verify advertisement activity.
Where: OnLive, Inc.
When: 04/2008-07/2009
Production systems administrator/engineer on a small team. Shared responsibility for all areas of production network and systems administration, including design, implementation, and maintenance of production environments. My primary duties required design and implementation of practices, tools, and automation for a production system needing to scale to tens of thousands of hosts. Assisted and worked closely with IT, engineering, QA, and other groups in solving inter-group technology problems. Participated in and helped build on-call rotation.
Used config management (svn, puppet, etc.) and a truth database to automate the deployment and maintenance of both Linux (CentOS 5) and Windows systems. Built a simple windows automation framework with powershell after failing to find an existing tool to perform the same tasks. Aided in operational and software architecture design and policy creation. Designed and implemented tools to increase developer agility in deployment and debugging in a production environment. Drove effort to educate and ensure operational best practices were followed in all stages of software development. Designed and led implementation of a cross-platform service monitoring system to allow quick black-box health and stats monitoring of all business-critical production services. Assisted with production security risk management through secure network and software design.
Project and technical lead of production windows system automation and deployment. Designed and built a system to automate configuration and maintenance of production Windows XP and Vista hosts; such maintenance includes device driver revisions, user profile configuration, automated application installation, and other system configuration. The automation included the full machine lifecycle including system imaging. This automation system increased reliability, repeatability, and performance of system maintenance and (re)installation and additionally effected a major boost in confidence of correctness in our windows systems which helped improve all stages of the product from development to QA to deployment.
Designed and maintained engineer and QA tools to help in accessing and debugging production and staging systems without requiring users to have deep knowledge of production networking environments. The primary feature was automatic ssh tunneling to support remote debugging and inspection using tools like VNC, JMX, remote gdb, etc. To further hide complexity, the tool would upgrade itself whenever a newer version was released. To ease use, it was primarily activated through a special URL handler registered with Windows, so it could be launched from ordinary web links. It was written in powershell and vbscript.
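As a rough illustration of the link-to-tunnel idea (the original was written in powershell and vbscript, not Python), the sketch below turns a custom link into an ssh port-forwarding command. The tunnel:// scheme, the local query parameter, the function name tunnel_command, and the gateway host are all assumptions made for this example.

```python
from typing import List
from urllib.parse import parse_qs, urlparse


def tunnel_command(url: str, gateway: str) -> List[str]:
    """Turn a link like 'tunnel://apphost:5900?local=15900' into an ssh command.

    The scheme and the 'local' parameter are hypothetical; 'gateway' is the
    ssh host that can reach the target inside the production network.
    """
    parsed = urlparse(url)
    if parsed.scheme != "tunnel" or not parsed.hostname or not parsed.port:
        raise ValueError(f"unsupported link: {url}")
    query = parse_qs(parsed.query)
    # Default to the same port number locally when none is requested.
    local_port = int(query.get("local", [str(parsed.port)])[0])
    forward = f"{local_port}:{parsed.hostname}:{parsed.port}"
    # -N: run no remote command; -L: forward a local port through the gateway.
    return ["ssh", "-N", "-L", forward, gateway]
```

The returned list could be handed to subprocess.run, after which a VNC or JMX client would connect to localhost on the chosen local port.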
Where: Google, Inc
When: 06/2006-04/2008
Cluster and scalable systems administrator for systems with multiple clients and usage patterns. Network and software troubleshooting and monitoring. Automation with shell and python for myself and coworkers, and via web interfaces for customers. Responsible for maintaining critical backend systems which power many Google products. Primary duties revolved around continued maintenance, monitoring, analysis, and improvement of multitenant clusters with sizes up to the tens of thousands, spread across many datacenters. Helped design capacity planning tools and documents to guide new bigtable users on performance and capacity planning. Worked with bigtable engineers on improvements such as monitoring, multitenancy, and backups.
Built a self-service web application to help internal customers fulfill some requests instead of filing tickets. Was the "go to" guy on my team regarding information for many pieces of Google infrastructure. Mentored several members of my team, including those senior to me. Developed several tools to aid in troubleshooting and debugging. Prototyped tools and presented many ideas for improving the service my group provided.
Skills used: Python, network diagnostics, shell scripting, perl, scaling services, on-call support. Clustered filesystems and databases (Bigtable, GFS, etc).
Where: Computer Science Department at Rochester Institute of Technology
When: 10/2005-05/2006
Configuration and deployment of Solaris-based servers with open source and commercial software. Troubleshot software bugs and inconsistencies. Assisted with faculty and student technical support. Skills used: Solaris, shell scripting, perl.
Where: College of Engineering, Rochester Institute of Technology
When: 06/2005-09/2005
Administrative support for parallel computation research on a 24-node Fedora/Linux cluster. Instructed co-workers on use of revision control, use of community documentation, development with MPI, and working with Unix environments. Developed faster algorithms for existing software applications to meet research project needs. Skills used: Linux, C, MPICH, Subversion, software debugging.
Where: ITS at Rochester Institute of Technology
When: 09/2004-03/2005
Development and maintenance of user and network management systems. Improved existing account management systems for campus-wide deployment of LDAP, Active Directory, DCE, and VMS services. Debugged and improved network management system integrating dhcp and dns with computer registration. Skills used: C, PHP, m4, make, cvs, Oracle, Pro*C
Where: Computer Science Department at Rochester Institute of Technology
When: 07/2004-08/2004
Design and deployment of GNOME-based Solaris desktop environment. Focused on security and ease of use. Supported faculty and students on Solaris 9 technical issues.
Where: OnlySecure.com (Livingston, MT)
When: 12/2002-09/2005
Developed an online store sales reporting and automation system from the ground up with an evolutionary development model. Provided service maintenance and support for mission-critical services across multiple systems. Skills used: Perl, C, PHP, MySQL
(Technical writing)
I'd followed the Perl Advent Calendar for a few years, and in late 2008 I decided that there needed to be a similar project with sysadmin topics; one article for each day of December up to Christmas. I gathered a few fellow sysadmin bloggers together to work on ideas and articles. The first year, I wrote 23 of the 25 articles. This project encompasses some of my best writing. URL: http://sysadvent.blogspot.com/
(Languages: Perl, C++, C, and Ruby. Regex: Perl, Boost Xpressive, and PCRE. Open Source)
Having grown tired of reading authentication and system logs looking for problems, I decided to make the process easier. If an event is predictable, then we can make a script handle it. This tool is a highly configurable, expert-system-like tool that allows you to define predicted patterns in log or program output as well as reactions to those patterns. It will also process text and generate patterns which match that text. The tool was later rewritten in C++/Boost, and finally ended up in C, using libpcre, with Ruby bindings.
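The "pattern plus reaction" idea can be sketched in a few lines of Python. This is only an illustration and not the tool's actual configuration format or API; the react function, the RULES list, and the sample sshd pattern are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Tuple

# A rule pairs a regular expression with a reaction called on each match.
Rule = Tuple[str, Callable[..., str]]


def react(lines: Iterable[str], rules: List[Rule]) -> List[str]:
    """Run every line through every rule and collect the reactions' output."""
    compiled = [(re.compile(pattern), reaction) for pattern, reaction in rules]
    results: List[str] = []
    for line in lines:
        for pattern, reaction in compiled:
            match = pattern.search(line)
            if match:
                results.append(reaction(match))
    return results


# Hypothetical rule: report failed ssh logins with the user and source address.
RULES: List[Rule] = [
    (
        r"Failed password for (?P<user>\S+) from (?P<ip>\S+)",
        lambda m: f"failed login for {m.group('user')} from {m.group('ip')}",
    ),
]
```

In a real deployment the reaction would more likely run a command or raise an alert than return a string; returning text just keeps the sketch easy to inspect.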
(C, X11, automation. Open Source.)
Wrote an Xlib automation tool to aid in scriptability of the X11 environment. Supports faking mouse and keyboard input and window management features. Has lots of users and is available in most unix package systems (ubuntu, fedora, freebsd, etc). New releases focus on stronger testing and responding to community feedback.
(C, libpcap, libnet, networking. Open Source.)
In an effort to be able to play Halo 2 with some out-of-state friends, I wrote an xbox system link proxy that essentially bridges only xbox network (broadcast) traffic across layer 3 networks using UDP. Written in C and uses libpcap and libnet. A later update added multicast support so Apple's Rendezvous (mdns) protocol could span network segments.
(JavaScript, XUL, CSS. Open Source)
To date I have written a few firefox extensions: one to let you search your open tabs for keywords, and another to let you trivially edit query parameters in the url bar. These extensions required the use of XUL (mozilla's toolkit language) and JavaScript.