December 25, 2013

Day 25 - Go at a glance

Written by: Kelsey Hightower (@kelseyhightower)
Edited by: Ben Cotton (@funnelfiasco)

2013 was a fantastic year for Go and system tools written in the language. Derek Collison, Founder & CEO of Apcera Inc, predicted in September 2012 that “Go will become the dominant language for systems work in IaaS, Orchestration, and PaaS in 24 months”. The release of the following projects, written in Go, should serve as proof that we are on track to see that happen:
  • Docker (Pack, ship and run any application as a lightweight container)
  • Packer (Automates the creation of any type of machine image)
  • Etcd (Highly-available key value store for shared configuration and service discovery)
  • SkyDNS (DNS service for service discovery)
  • Heka (Data collection and processing made easy)
  • Groupcache (Memcache replacement for caching and cache-filling)
I take this as a strong signal that the time to learn Go is now. The language is stable, has a large and active community, and Go is getting better with every release. The Go feature list is extensive, but when it comes to system administration the following items stand out for me:
  • Simplicity
  • Large Standard Library
  • Statically Compiled Binaries
  • Concurrency


Simplicity

Go has only 25 keywords. There are no classes, exceptions, or inheritance, which keeps the language small and approachable. Go doesn’t force a particular paradigm. You can get a lot done in Go writing structured code, yet still end up with an elegant solution.
Let’s review a short example to get a feel for the language:
// main.go
package main

import (
    "log"
)

func main() {
    log.Print("I'm using Go")
}
Every Go program starts with a main package.

package main

Next we import the log package from the standard library:

import (
    "log"
)

Finally we define the main entry point of the program. In the body of main we call the Print() function from the log package, which we use to log "I'm using Go" to the console (the log package writes to standard error by default):

func main() {
    log.Print("I'm using Go")
}

Unlike Ruby or Python, Go is a compiled language. We must compile our code before we can run it. You’ll find it refreshing to know Go makes the compilation process easy and fast. So fast that we can compile and run our code in a single step:

go run main.go
2013/12/22 19:38:40 I'm using Go

This is great during development, but when you’re ready for deployment you’ll want to create a static binary:

go build -o main main.go
./main
2013/12/22 19:38:40 I'm using Go

You can get a deeper dive into the language by taking A Tour of Go.

Large Standard Library

Go ships with a large standard library containing just about everything you need for system administration out of the box:
  • command line flag processing
  • interacting with the OS -- executing shell commands, reading and writing files, etc
  • working with various encodings including JSON, CSV, XML, and base64
Go also ships with a very performant HTTP server implementation for serving static files and building REST services.
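
As a rough illustration of the list above, here is a minimal sketch that parses a flag, runs a command, and prints the result as JSON using only the standard library. The -cmd flag, the result type, and the run helper are names made up for this example; they are not part of any existing tool.

```go
// stdlib.go: a small sketch combining flag, os/exec, and encoding/json.
package main

import (
	"encoding/json"
	"flag"
	"log"
	"os"
	"os/exec"
	"strings"
)

// result is a hypothetical record describing one command run.
type result struct {
	Command string `json:"command"`
	Output  string `json:"output"`
}

// run executes name with args and returns its trimmed combined output.
func run(name string, args ...string) (result, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	r := result{
		Command: strings.Join(append([]string{name}, args...), " "),
		Output:  strings.TrimSpace(string(out)),
	}
	return r, err
}

func main() {
	// -cmd is a hypothetical flag; it defaults to "hostname".
	cmd := flag.String("cmd", "hostname", "command to execute")
	flag.Parse()

	// Any remaining arguments are passed through to the command.
	r, err := run(*cmd, flag.Args()...)
	if err != nil {
		log.Fatal(err)
	}

	// Encode the result as JSON on standard output.
	if err := json.NewEncoder(os.Stdout).Encode(r); err != nil {
		log.Fatal(err)
	}
}
```

Running it with go run stdlib.go -cmd uptime should print a single JSON object containing the command and whatever it wrote to the terminal.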


Concurrency

Often the tools we build start out small. When it comes time to scale for performance we often need to turn to third-party frameworks or libraries. With Go you don’t have that problem. Concurrency is built in, and not only that, it’s easy to use.

Check out the excellent talk on Go concurrency patterns by Rob Pike, one of the original authors of Go.
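
To give a feel for what built-in concurrency looks like, here is a minimal sketch that runs a check against several hosts at once using goroutines and a channel. The host names, the status type, and the check function are placeholders invented for this example.

```go
// concurrency.go: fan out work to goroutines and collect results on a channel.
package main

import (
	"fmt"
	"strings"
)

// status pairs a host with the outcome of its check.
type status struct {
	host string
	ok   bool
}

// checkAll runs check for every host in its own goroutine and gathers
// the results into a map once all of them have reported back.
func checkAll(hosts []string, check func(string) bool) map[string]bool {
	ch := make(chan status)
	for _, h := range hosts {
		go func(host string) {
			ch <- status{host: host, ok: check(host)}
		}(h)
	}

	// Receive exactly one result per host.
	results := make(map[string]bool, len(hosts))
	for range hosts {
		s := <-ch
		results[s.host] = s.ok
	}
	return results
}

func main() {
	// Hypothetical hosts; the check is a stand-in for real work such as
	// an HTTP request or a TCP connection attempt.
	hosts := []string{"web1", "web2", "db1"}
	isWeb := func(host string) bool { return strings.HasPrefix(host, "web") }

	for host, ok := range checkAll(hosts, isWeb) {
		fmt.Printf("%s: %v\n", host, ok)
	}
}
```

The checks run concurrently, so the total time is roughly that of the slowest check rather than the sum of all of them.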

Statically-compiled Binaries

My favorite Go feature is the static binary. Running the go build command produces a self-contained binary which includes all package dependencies and the Go runtime. This means I don’t have to install Go on every target system or worry about conflicting dependencies at deploy time. This is by far the largest time saver I’ve come to appreciate when working with Go.

There is, however, a catch. Since Go compiles to machine code, you need to build binaries for each platform you want your code to run on. Linux binaries won't work on Windows. But just like everything else in Go, the process of building static binaries for other platforms is pretty straightforward. Go ships with all the bits necessary for cross-compiling your code, but you'll have to set up the proper toolchain first. Bootstrapping the environment for cross-compiling can be painful, especially for people new to the process, but there are a few tools to help you automate the entire process:

  • goxc (build tool focused on cross-compiling and packaging)
  • gox (simple, no frills Go cross compile tool)

Once the toolchain is in place you can simply override the GOARCH and GOOS environment variables to cross-compile your code. For example, if you wanted to target Windows running on the 64-bit architecture, run the following command:

GOARCH="amd64" GOOS="windows" go build

For more details about building static binaries for other platforms, check out Dave Cheney’s Introduction to cross compilation with Go.
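
One quick way to confirm what a binary was built for is to have it report its own target. The sketch below is only an illustration (the target helper is a made-up name); it uses the runtime package, which records the GOOS and GOARCH values the program was compiled with.

```go
// target.go: report the platform this binary was compiled for.
package main

import (
	"fmt"
	"runtime"
)

// target returns the operating system and architecture the binary targets,
// in the form "os/arch".
func target() string {
	return runtime.GOOS + "/" + runtime.GOARCH
}

func main() {
	fmt.Println("built for", target())
}
```

Built with the GOARCH and GOOS settings shown above, the resulting executable should report windows/amd64 when run on a Windows machine.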


I encourage you to consider Go for your next system administration task. If you do, you’ll quickly realize that the attention Go has been getting lately is more than hype. Go delivers a modern platform with classic language features that make it easy to get things done. In addition to everything you’d expect from a programming language, its collection of distinctive features, such as static binaries, fast compilation, and built-in concurrency, is sure to change the way you approach everyday problems.

December 24, 2013

Day 24 - Ars Longa, Vita Brevis

Written By: Mohit Chawla (@a1cy)
Edited By: Aleksey Tsalolikhin (@atsaloli)

A common, recurring observation about the field of operations is that it's quite young compared to other engineering disciplines. Becoming a good operations engineer benefits from, and requires, a multi- and interdisciplinary approach. The career, the culture, the actual technical practices and runbooks, and the tools that we design, build and use every day all borrow ideas from other, more established and better understood industries and professions, in order to progress and improve understanding of our own endeavours in the field.

One fundamental area of study in operations is the human-computer interface. From the design of our systems' peripherals, to the automation and monitoring technologies we use and develop, to the postmortems we carry out after failures and disasters, a comprehensive understanding and acknowledgement of the intricacies of this interaction is central, as emphasized effectively in the talks and writings of many respected members of the community.

Another area of research at these crossroads, though not directly related to system administration, is the digital humanities, which make use of, but are not solely characterized by, powerful tools for information retrieval, text analysis, visualization and statistics to enhance existing understanding and gain new insights in the humanities and social sciences.

To complement existing ongoing projects and efforts aiming to formalize, develop and solidify knowledge in operations, such as Ops School, SABOK, and other scattered forms of dissemination such as articles, podcasts, weeklies, books, talks and mailing lists, it could be beneficial to use some of the techniques employed by digital humanists, as an additional aid, an extra lens to view our field from.

Particularly, Topic Modeling is an indispensable technique used in digital humanities.

'Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help us develop new ways to search, browse and summarize large archives of texts.'

The above definition is taken from the page of one of the pioneers of the field, David M. Blei, who's also one of the original developers of the Latent Dirichlet Allocation algorithm - which is the dominant topic modeling algorithm used in digital humanities.

By doing a comparative analysis of data corpora from other engineering disciplines and of the existing literature and understanding of system administration, we can possibly develop newer ways of inference.

An oversimplified version of doing this process could be stated in the following steps:
1) Selection and gathering of literature from different engineering and scientific disciplines.
2) Gathering literature about system administration.
3) Application of LDA to these collections, followed by comparative analysis between the disciplines.

All three tasks pose various practical problems: procurement of literature limited by economic and personal resources, non-overlapping or conflicting vocabularies and terminologies in these disciplines, the relative lack of formally published literature in system administration, and the tuning of the algorithms themselves. Perhaps by the next installment of SysAdvent, I'll have ironed out some of these problems. Meanwhile, if other members of the community are interested in the idea, do get in touch.

1) John Allspaw's Blog
2) Jordan Sissel's entry from SysAdvent 2011 on Learning From Other Industries
3) Slides by Lindsay Holmwood on Alert Design
4) Accessible introduction to Topic Modeling
5) David M. Blei's Topic Modeling page
6) gensim, a python library for topic modeling
7) Mark Burgess's amazon page
8) SABOK
9) Ops School

December 23, 2013

Day 23 - The profession of operations and why it's the coolest thing ever

Written by: John Vincent (@lusis)
Edited by: Ben Cotton (@funnelfiasco)

I own a lot of t-shirts. I'm addicted to them. I have the one in this picture and it's one of my favorite ones. It's a favorite because it is the most accurate description of what a sysadmin does: everything.
This is why operations is one of the coolest careers on the planet.

The IT organization

IT organizations range anywhere from flatter-than-kansas to complex hierarchies where each physical cable on the network has its own org chart. Regardless of how they're structured, there's usually one person who has the most comprehensive view of every moving piece and has most likely performed every single role at some point in time. That person is probably the sysadmin.

It's worth noting that every role in IT is important. This isn't an attempt to denigrate those who are in those roles. Quite the contrary. One of the best things about being a sysadmin is that you get to do them all!


One thing that has always fascinated me about systems administration and operations in general is that it is largely a self-taught field. Even today there are, according to Wikipedia, very few actual sysadmin degrees. It's largely an uncodified discipline (though initiatives like Ops School are trying to change that), and yet it is the core and foundation of a company's IT. You can take courses, get degrees and be certified on everything that a sysadmin does and yet there's no official "sysadmin" certification (that is worth anything, anyway).

There are many theories for this but the one I tend to stick with is that to do your job you simply have to understand every single aspect. In fact, it's expected of you.

The sound of one hand clapping

If a server isn't connected to a network is it still a server? Right out of the gate, you're dealing with networking. But what good is a server if it's not serving? Let's throw some database software on there.

Now you've got performance issues. Looks like there are I/O problems. Now you're a storage admin figuring out iSCSI issues (which also happens to be networking). Now you're looking at what your database software is doing to treat your poor disks that way. You look at tablespaces and hit ratios. You're running explains and tuning bufferpool sizes. When did you become a DBA?
Almost everyone I've ever talked to has come to a specialization in IT by way of being a sysadmin (though that could be biased based on my typical peers). The rare exception is development which has always been a more formal discipline.

Called to service

One of the things that has always attracted me to being a sysadmin outside of getting to touch pretty much every aspect of technology is that of service. You run servers. Servers "serve". You serve others.

You keep the lights on. This is a powerful ideal. Permit me a bit of hyperbole if you will: When your servers go down, your company isn't making money. When companies don't make money, people could lose jobs or worse everyone could lose their job.

Even if your primary industry isn't technology, having backend systems like financials offline can lead to lost revenue, delayed payroll and the like. Even if you outsource all of that and have minimal infrastructure, there's still a sysadmin somewhere in the mix keeping things running. It's systems administrators all the way down.

And let's not forget the users OUTSIDE of your company as well. Think about how people have used tools like Twitter to organize the wholesale replacement of corrupt governments. People have used Facebook to help find loved ones after natural disasters.

To the team of system administrators who kept the Neil Gaiman Yahoo! group running 11 years ago - thank you. Without that I would have never met my wife.

Even if you aren't working in operations at some major internet company, there's a person in your company who didn't have to fight technology for a day and it made them happy.

There's nothing wrong with being a sysadmin

Many people think of being a sysadmin as just a stepping stone. A way in the door. You do your time working the pager and maybe at some point you move to another team. There's nothing wrong with that.

But there's nothing wrong with being a systems administrator. Not only is it one of the most important parts of the bigger picture but quite frankly it's one of the most fun.

You should be PROUD of being in operations. Be proud of being a sysadmin. You don't need a new title.

Trust me. I get it. People can be assholes.
"Oh you're a sysadmin? Well, I'M a devops!".
Avoid those people. They're very unhappy people anyway.

Not a sysadmin?

That's okay. You're important too.

The fact is, we're all important. The people who write the software we use. The people who keep the pipes fast. The people who keep the storage going. We're all in this together and having fun. We cross-pollinate. We share. We learn. We thrive.

Sysadmins are still the best though. =)

December 22, 2013

Day 22 - Getting Started Testing Your Puppet Modules

Written By: Christopher Webber (@cwebber)

So, you read the great article by Paul Czarkowski (@pczarkowski) on December 11th, The Lazy SysAdmin's Guide to Test Driven Chef Cookbooks, but felt left out because you run Puppet... Well this article is for you.

Getting the Environment Setup

The first hurdle to get over is the version of Ruby and where it is coming from. My personal preference is to use something like RVM or rbenv. But the only real thing I recommend is making sure you are using a version of Ruby that is the same as what you have in production. I have been bitten by things that work in 1.8.7, the version of Ruby that Puppet uses in production in my environment, but didn't work in 1.9.3, the version I was testing with on my workstation.

The Gems

You will need/want the following gems:

  • puppet (Once again, use the version you are running in production)
  • rspec-puppet
  • puppet-lint
  • puppetlabs_spec_helper

Once you have installed the gems, you should be ready to move on to the next step.

Generating A New Module

For this we are going to use the handy puppet module command to generate our starting module. We will be working with a module called cwebber-example.

$ puppet module generate cwebber-example
Notice: Generating module at /Users/cwebber/cwebber-example

Getting some testing boilerplate in place

Before we start building the boilerplate, it is useful to create the templates. Additionally, I think the boilerplate spec_helper.rb that rspec-puppet creates is a better starting point, so I run rm -rf spec to allow the rspec-puppet files to be used. To get the boilerplate in place run:

$ rspec-puppet-init
 + spec/
 + spec/classes/
 + spec/defines/
 + spec/functions/
 + spec/hosts/
 + spec/fixtures/
 + spec/fixtures/manifests/
 + spec/fixtures/modules/
 + spec/fixtures/modules/example/
 + spec/fixtures/manifests/site.pp
 + spec/fixtures/modules/example/manifests
 + spec/fixtures/modules/example/templates
 + spec/spec_helper.rb
 + Rakefile

Fixing the rake tasks

Finally, I make a slight change to the Rakefile so that a few additional tasks are supported and to enable a few advanced features we will touch on a little later. Update your Rakefile to look like the following:

require 'rake'

require 'rspec/core/rake_task'
require 'puppetlabs_spec_helper/rake_tasks'

Writing the first test

Given the popularity of TDD, I am going to show the example of writing the test first. While this makes sense for this example, please don't feel bad if you find yourself writing the code first and writing tests to cover that code later.

To keep things simple, we are going to use puppet to create a file called /foo with the contents bar. While this is a little arbitrary, we can get a feel for what it might look like to test a config file. We are going to put this file resource in the init.pp for the module we created.

Creating the test involves adding the following contents to the file spec/classes/example_spec.rb.

require 'spec_helper'

describe "example" do

it do
  should contain_file('/foo').with({
    'ensure'  => 'present',
    'content' => %r{^bar}
  })
end
end

Running this test should fail as the resource has not been added to the class yet. To run the test, we run rake spec to kick it off.

$ rake spec
/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color


  1) example should contain File[/foo] with ensure => "present" and content matching /^bar/
     Failure/Error: })
       expected that the catalogue would contain File[/foo]
     # ./spec/classes/example_spec.rb:9

Finished in 0.08232 seconds
1 example, 1 failure

Failed examples:

rspec ./spec/classes/example_spec.rb:5 # example should contain File[/foo] with ensure => "present" and content matching /^bar/
/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color failed

Now that there is a failing test, time to create the puppet code to go with it.

The corresponding puppet code

The puppet code that goes with this test is pretty straightforward. In manifests/init.pp we add the following inside the example class.

file {'/foo':
  ensure  => present,
  content => 'bar'
}

Now that we have code in place, let's make sure it passes our test by running rake spec again.

$ rake spec
/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color

Finished in 0.07837 seconds
1 example, 0 failures

Useful things to know about


One of the things I never cared for was the lack of information when you had passing tests. If you add --format documentation to ~/.rspec you get output that looks like this:

/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color

  should contain File[/foo] with ensure => "present" and content matching /^bar/

Finished in 0.09379 seconds
1 example, 0 failures

rake lint

One of the additional tasks that gets added when we change the Rakefile is the lint task. This task runs puppet-lint across all of the puppet files, verifying that they pass a certain coding standard. For more information, see the puppet-lint documentation.


As soon as you start to get into more complex modules, you see a dependency on other modules. The .fixtures.yml file is there to help make sure that the appropriate puppet modules are checked out for testing use. See the puppetlabs_spec_helper documentation for more details.

December 21, 2013

Day 21 - Making the web secure, one unit test at a time

Written By: Gareth Rushgrove (@garethr)
Edited By: Michael Stahnke (@stahnma)

Writing automated tests for your code is one of those things that, once you have gotten into it, you never want to see code without tests ever again. Why write pages and pages of documentation about how something should work when you can write tests to show exactly how something does work? Looking at the number and quality of testing tools and frameworks (like cucumber, rspec, Test Kitchen, Server Spec, Beaker, Casper and Jasmine to name a few) that have popped up in the last year or so I'm obviously not the only person who has a thing for testing utilities.

One of the other things I am interested in is web application security, so this post is all about using the tools and techniques from unit testing to avoid common web application security issues. I'm using Ruby in the examples but you could quickly convert these to other languages if you desire.

Any port in a storm

Let's start out with something simple. Accidentally exposing applications on TCP ports can lead to data loss or introduce a vector for attack. Maybe your main website is super secure, but you left the port for your database open to the internet. It's the server configuration equivalent of forgetting to lock the back door.

Nmap is a tool lots of people will be familiar with for scanning for open ports. As well as a command line interface, Nmap also has good library support in lots of languages, so let's try to write a simple test suite around it.

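require "tempfile"
require "nmap/program"
require "nmap/xml"

describe "the website" do
  file = Tempfile.new("nmap.xml")

  before(:all) do
    # Run the scan and write the results as XML to the temporary file.
    Nmap::Program.scan do |nmap|
      nmap.xml = file.path
      nmap.targets = "" # the host to scan
    end

    # Collect the numbers of all ports reported as open.
    @open_ports = []
    Nmap::XML.new(file.path) do |xml|
      xml.each_host do |host|
        host.each_port do |port|
          @open_ports << port.number if port.state == :open
        end
      end
    end
  end

  # the tests shown below go here
end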

With the above code in place we can then write tests like:

it "should have two ports open" do
  @open_ports.should have(2).items
end

it "should have port 80 open" do
  @open_ports.should include(80)
end

it "should have port 22 closed" do
  @open_ports.should_not include(22)
end

We can run these manually, but also potentially as part of a continuous integration build or constantly as part of a monitoring suite.

Run the Gauntlt

We had to do quite a bit of work wrapping Nmap before we could write the tests above. Wouldn't it be nice if someone had already wrapped lots of useful security minded tools for us? Gauntlt is pretty much just that: a security testing framework based on cucumber which currently supports curl, nmap, sslyze, sqlmap, garmr and a bunch more tools in master. Let's do something more advanced than our port scanning test above by testing a URL for a SQL injection vulnerability.

Feature: Run sqlmap against a target
  Scenario: Identify SQL injection vulnerabilities
    Given "sqlmap" is installed
    And the following profile:
      | name       | value                                      |
      | target_url | http://localhost/sql-injection?number_id=1 |
    When I launch a "sqlmap" attack with:
      python <sqlmap_path> -u <target_url> --dbms sqlite --batch -v 0 --tables
    Then the output should contain:
      sqlmap identified the following injection points
    And the output should contain:
      [2 tables]
      | numbers         |
      | sqlite_sequence |

The Gauntlt team publish lots of examples like this one alongside the source code, so getting started is easy. Gauntlt is very powerful, but as you'll see from the example above you need to know quite a bit about the underlying tools it is using. In the case above you need to know the various arguments to sqlmap and also how to interpret the output.

Enter Prodder

Prodder is a tool I put together to automate a few specific types of security testing. In many ways it's very similar to Gauntlt; it uses the cucumber testing framework and uses some of the same tools (like nmap and sslyze) under the hood. However, rather than being a general purpose security framework like Gauntlt, Prodder is higher level and very opinionated. Here's an example:

Feature: SSL
  In order to ensure secure connections
  I want to check the SSL configuration of my servers
    Given "" is installed
    Scenario: Check SSLv2 is disabled
      When we test using the "sslv2" protocol
      Then the exit status should be 0
      And the output should contain "SSLv2 disabled"

    Scenario: Check certificate is trusted
      When we check the certificate
      Then the output should contain "Certificate is Trusted"
      And the output should match /OK — (Common|Subject Alternative) Name Matches/
      And the output should not contain "Signature Algorithm: md5"
      And the output should not contain "Signature Algorithm: md2"
      And the output should contain "Key Size: 2048"

    Scenario: Check certificate renegotiations
      When we test certificate renegotiation
      Then the output should contain "Client-initiated Renegotiations: Rejected"
      And the output should contain "Secure Renegotiation: Supported"

    Scenario: Check SSLv3 is not using weak ciphers
      When we test using the "sslv3" protocol
      Then the output should not contain "Anon"
      And the output should not contain "96bits"
      And the output should not contain "40bits"
      And the output should not contain " 0bits"

This is a little higher level than the Gauntlt example: it doesn't expose the workings of sslyze, which does the actual testing. All you need is an understanding of SSL certificates. Even if you're not an expert on SSL you can accept the aforementioned opinions of Prodder about what good looks like. Prodder currently contains steps and examples for port scanning, SSL certificates and security minded HTTP headers. If you already have a cucumber based test suite (including one based on Gauntlt) you can reuse the step definitions in that too.

I'm hoping to build upon Prodder, adding more types of tests and getting agreement on the included opinions from the wider systems administration community. By having a default set of shared assertions about the expected security of our systems we can more easily move onto new projects, safe in the knowledge that a test will fail if someone messes up our once secure configuration.

I'm convinced, what should I do next?

As well as trying out some of the above tools and techniques for yourself, I'd recommend encouraging more security conversations in your development and operations teams. Here are a few places to start:

December 20, 2013

Day 20 - Distributed configuration data with etcd

Written by: Kelsey Hightower (@kelseyhightower)
Edited by: Ben Cotton (@funnelfiasco)


I’ve been managing applications for a long time, but I’ve never stopped to ponder why most application configurations are managed via files. Just about every application deployed today requires a configuration file stored in the correct location, with the proper permissions, and valid content on every host that runs the application.

If not, things break.

Sure, configuration management tools provide everything you need to automate the process of constructing and syncing these files, but the whole process is starting to feel a bit outdated. Why are we still writing applications from scratch that rely on external tools (and even worse, people) to manage configuration files?

Think about that for a moment.

The state of application configuration seems a bit stagnant, especially when compared to the innovation happening in the world of application deployment. Thanks to virtualization we have the ability to deploy applications in minutes, and with advances in containerization, we get the same results in seconds.

However, all those application instances need to be configured. Is there a better way of doing this, or are we stuck with configuration files as the primary solution?

Introducing etcd

What is etcd? Straight from the docs:
A highly-available key value store for shared configuration and service discovery. etcd is inspired by zookeeper and doozer, with a focus on:
  • Simple: curl'able user facing API (HTTP+JSON)
  • Secure: optional SSL client cert authentication
  • Fast: benchmarked 1000s of writes/s per instance
  • Reliable: Robustly distributed using Raft
On the surface it appears that etcd could be swapped out with any key/value store, but if you did that you would be missing out on some key features such as:
  • Notification on key changes
  • TTLs on keys
But why would anyone choose to move configuration data from files to something like etcd? Well, for the same reasons DNS moved away from zone files to a distributed database: speed and portability.


Speed

When using etcd all consumers have immediate access to configuration data. etcd makes it easy for applications to watch for changes, which reduces the time between a configuration change and propagation of that change throughout the infrastructure. In contrast, syncing files around takes time and in many cases you need to know the location of the consumer before files can be pushed. This becomes a pain point when you bring autoscaling into the picture.


Portability

Using a remote database of any kind can make data more portable. This holds true for configuration data and etcd -- access to configuration data stored in etcd is the same regardless of OS, device, or application platform in use.

Hands on with etcd

Let’s run through a few quick examples to get a feel for how etcd works, then we’ll move on to a real world use case.

Adding values

curl -X PUT -L -d value=""

Retrieving values

curl -L


Deleting values

curl -L -XDELETE

That’s all there is to it. No need for a database library or specialized client, we can utilize all of etcd’s features using curl.
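
The same operations can be sketched from Go with nothing but the standard library. This sketch assumes an etcd server answering a v2-style keys API at http://127.0.0.1:4001 and a JSON reply that carries the value under node.value; the address, the message key, and the helper names are assumptions for illustration and are not taken from the commands above.

```go
// etcdkeys.go: set and read a key over HTTP using only the standard library.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"strings"
)

// reply mirrors the part of a v2-style JSON response used here (assumed shape).
type reply struct {
	Action string `json:"action"`
	Node   struct {
		Key   string `json:"key"`
		Value string `json:"value"`
	} `json:"node"`
}

// keyURL builds the URL for key under the assumed /v2/keys/ prefix.
func keyURL(base, key string) string {
	return strings.TrimRight(base, "/") + "/v2/keys/" + strings.TrimLeft(key, "/")
}

// parse decodes a JSON response body into a reply.
func parse(body []byte) (reply, error) {
	var r reply
	err := json.Unmarshal(body, &r)
	return r, err
}

// do sends one request for key; a non-empty value is sent as a form field.
func do(method, base, key, value string) (reply, error) {
	form := url.Values{}
	if value != "" {
		form.Set("value", value)
	}
	req, err := http.NewRequest(method, keyURL(base, key), strings.NewReader(form.Encode()))
	if err != nil {
		return reply{}, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return reply{}, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return reply{}, err
	}
	if resp.StatusCode >= 300 {
		return reply{}, fmt.Errorf("%s %s: %s", method, key, resp.Status)
	}
	return parse(body)
}

func main() {
	base := "http://127.0.0.1:4001" // assumed address of a local etcd

	if _, err := do("PUT", base, "message", "hello"); err != nil {
		log.Fatal(err)
	}
	r, err := do("GET", base, "message", "")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(r.Node.Key, "=", r.Node.Value)
}
```

Because the API is plain HTTP and JSON, no dedicated client library is needed; the whole client fits in a single short file.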

A real world use case

To really appreciate the full power of etcd we need to look at a real world example. I’ve put together an example weather-app that caches weather data in a redis database, which just so happens to utilize etcd for configuration.

First we need to populate etcd with the configuration data required by the weather app:

We can do this using curl:
curl -XPUT -L -d value="Portland"
curl -XPUT -L -d value="5"
curl -XPUT -L \
  -d value=""
curl -XPUT -L \
  -d value=""

Next we need to set the etcd host used by the weather app:

For this example I’m using an environment variable to bootstrap things. The preferred method would be to use a DNS service record instead, so we can avoid relying on local settings.

Now with our configuration data in place, and the etcd host set, we are ready to start the weather-app:
2013/12/18 21:17:36 weather app starting ...
2013/12/18 21:17:37 Setting current temp for Portland: 30.92
2013/12/18 21:17:42 Setting current temp for Portland: 30.92

Things seem to be working. From the output above I can tell I’m hitting the right URL to grab the current weather for the city of Portland every 5 seconds. If I check the redis database, I see that the temperature is being cached:
redis> get Portland

Nothing too exciting there. But watch what happens if I change the value of the /weather_app/city key in etcd:
curl -XPUT -L -d value="Atlanta"

We end up with:
2013/12/18 21:17:36 weather app starting ...
2013/12/18 21:17:37 Setting current temp for Portland: 30.92
2013/12/18 21:17:42 Setting current temp for Portland: 30.92
2013/12/18 21:17:48 Setting current temp for Portland: 30.92
2013/12/18 21:17:53 Setting current temp for Atlanta: 29.84
2013/12/18 21:17:59 Setting current temp for Atlanta: 29.84

Notice how we are now tracking the current temperature for Atlanta instead of Portland. The results are cached in Redis just as expected:
redis> get Atlanta

etcd makes it really easy to update and watch for configuration changes, then apply the results at run-time. While this might seem like overkill for a single app instance, it’s incredibly useful when running large clusters or when autoscaling comes into the picture.
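
The weather-app source isn’t shown here, but the pattern of applying changes at run-time can be sketched roughly as follows. This is an illustration under stated assumptions: the config type and watch function are made-up names, and the updates are simulated over a channel rather than read from etcd.

```go
// reload.go: apply configuration changes to a running program.
package main

import (
	"fmt"
	"sync"
)

// config holds a setting that can be replaced while the program runs.
type config struct {
	mu   sync.RWMutex
	city string
}

// City returns the current value.
func (c *config) City() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.city
}

// SetCity replaces the current value.
func (c *config) SetCity(city string) {
	c.mu.Lock()
	c.city = city
	c.mu.Unlock()
}

// watch applies every update received on updates until the channel is closed.
// In a real program the updates would come from watching a key in etcd.
func watch(c *config, updates <-chan string) {
	for city := range updates {
		c.SetCity(city)
	}
}

func main() {
	cfg := &config{city: "Portland"}
	updates := make(chan string)
	done := make(chan struct{})
	go func() {
		watch(cfg, updates)
		close(done)
	}()

	fmt.Println("tracking", cfg.City())
	updates <- "Atlanta" // simulate a change to the city key
	close(updates)
	<-done
	fmt.Println("tracking", cfg.City())
}
```

The worker never restarts: it simply reads the current city each time it needs it, so it reports Portland first and Atlanta once the update has been applied.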

Everything we’ve done so far was pretty basic. We used curl to set some configuration, then had our application use those settings. But we can push this idea even further. There is no reason to limit our applications to read-only operations. We can also write configuration data to etcd directly. This unlocks a whole new world of possibilities. Web applications can expose their IP addresses and ports for use by load-balancers. Databases could expose connection details to entire clusters. This would mean making changes to existing applications, and perhaps more importantly would mean changing how we design new applications. But maybe it’s time for AppOps -- let’s get out of the way and let the applications configure themselves.


Hopefully this post has highlighted how etcd can go beyond traditional configuration methods by exposing configuration data directly to applications. Does this mean we can get rid of configuration files? Nope. Today the file system provides a standard interface that works just about anywhere. However, it should be clear that files are not the only option for modern application configuration, and viable alternatives do exist.

December 19, 2013

Day 19 - Automating IAM Credentials with Ruby and Chef

Written by: Joshua Timberman (@jtimberman)
Edited by: Shaun Mouton (@sdmouton)

Chef, née Opscode, has long used Amazon Web Services. In fact, the original iteration of "Hosted Enterprise Chef," "The Opscode Platform," was deployed entirely in EC2. In the time since, AWS has introduced many excellent features, and libraries to work with them, including Identity and Access Management (IAM) and the AWS SDK. Especially relevant to our interests is the Ruby SDK, which is available as the aws-sdk RubyGem. Additionally, the operations team at Nordstrom has released a gem for managing encrypted data bags called chef-vault. In this post, I will describe how we use the AWS IAM feature, how we automate it with the aws-sdk gem, and how we store secrets securely using chef-vault.


First, here are a few definitions and references for readers.
  • Hosted Enterprise Chef - Enterprise Chef as a hosted service.
  • AWS IAM - management system for authentication/authorization to Amazon Web Services resources such as EC2, S3, and others.
  • AWS SDK for Ruby - RubyGem providing Ruby classes for AWS services.
  • Encrypted Data Bags - Feature of Chef Server and Enterprise Chef that allows users to encrypt data content with a shared secret.
  • Chef Vault - RubyGem to encrypt data bags using public keys of nodes on a chef server.

How We Use AWS and IAM

We have used AWS for a long time, before the IAM feature existed. Originally with The Opscode Platform, we used EC2 to run all the instances. While we have moved our production systems to a dedicated hosting environment, we do have non-production services in EC2. We also have some external monitoring systems in EC2. Hosted Enterprise Chef uses S3 to store cookbook content. Those with an account can see this with knife cookbook show COOKBOOK VERSION, and note the URL for the files. We also use S3 for storing the packages from our omnibus build tool. The omnitruck metadata API service exposes this.

All these AWS resources - EC2 instances, S3 buckets - are distributed across a few different AWS accounts. Before IAM, there was no way to have data segregation because the account credentials were shared across the entire account. For (hopefully obvious) security reasons, we need to have the customer content separate from our non-production EC2 instances. Similarly, we need to have the metadata about the omnibus packages separate from the packages themselves. In order to manage all these different accounts and their credentials, which need to be automatically distributed to the systems that need them, we use IAM users, encrypted data bags, and Chef.

Unfortunately, using various accounts adds complexity, but with the tooling I'm about to describe, it is a lot easier to manage now than it was in the past. We use a fairly simple JSON data file format, and a Ruby script that uses the AWS SDK RubyGem. I'll describe the parts of the JSON file, and then the script.

IAM Permissions

IAM allows customers to create groups, which are containers of users, and to grant those groups permissions to different AWS resources. Customers can manage these through the AWS console, or through the API. The API uses JSON documents to manage the policy statement of permissions the user has to AWS resources. Here's an example:
{
  "Statement": [
    {
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::an-s3-bucket",
        "arn:aws:s3:::an-s3-bucket/*"
      ]
    }
  ]
}
Granted to an IAM user, this will allow that user to perform all S3 actions to the bucket an-s3-bucket and all the files it contains. Without the /*, only operations against the bucket itself would be allowed. To set read-only permissions, use only the List and Get actions:
"Action": [
  "s3:List*",
  "s3:Get*"
],
Since this is JSON data, we can easily parse and manipulate this through the API. I'll cover that shortly.
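To make that concrete, here is a minimal sketch of building such a policy document as a plain Ruby hash and serializing it with the standard json library. The helper name s3_bucket_policy and its read_only option are hypothetical, and the s3:List* and s3:Get* wildcards are my assumption about how "only the List and Get actions" would be spelled.

```ruby
require 'json'

# Build an IAM policy hash for a single S3 bucket.
# read_only: true limits the actions to List and Get (assumed wildcards).
def s3_bucket_policy(bucket, read_only: false)
  actions = read_only ? ['s3:List*', 's3:Get*'] : 's3:*'
  {
    'Statement' => [
      {
        'Action'   => actions,
        'Effect'   => 'Allow',
        # The bare ARN covers the bucket itself, the /* ARN covers its files.
        'Resource' => [
          "arn:aws:s3:::#{bucket}",
          "arn:aws:s3:::#{bucket}/*"
        ]
      }
    ]
  }
end

puts JSON.pretty_generate(s3_bucket_policy('an-s3-bucket', read_only: true))
```

Because the policy is ordinary data until it is serialized, it can be generated, merged, or checked like any other hash before it is sent to the API.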

See the IAM policy documentation for more information.

Chef Vault

We use data bags to store secret credentials we want to configure through Chef recipes. In order to protect these secrets further, we encrypt the data bags, using chef-vault. As I have previously written about chef-vault in general, this section will describe what we're interested in from our automation perspective.

Chef vault itself is concerned with three things:
  1. The content to encrypt.
  2. The nodes that should have access (via a search query).
  3. The administrators (human users) who should have access.
"Access" means that those entities are allowed to decrypt the encrypted content. In the case of our IAM users, this is the AWS access key ID and the AWS secret access key, which will be the content to encrypt. The nodes will come from a search query to the Chef Server, which will be added as a field in the JSON document that will be used in a later section. Finally, the administrators will simply be the list of users from the Chef Server.

Data File Format

The script reads a JSON file, described here:
{
  "accounts": [
    "an-aws-account-name"
  ],
  "user": "secret-files",
  "group": "secret-files",
  "policy": {
    "Statement": [
      {
        "Action": "s3:*",
        "Effect": "Allow",
        "Resource": [
          "arn:aws:s3:::secret-files",
          "arn:aws:s3:::secret-files/*"
        ]
      }
    ]
  },
  "search_query": "roles:secret-files-server"
}
This is an example of the JSON we use. The fields:
  • accounts: an array of AWS account names that have authentication credentials configured in ~/.aws/config - see my post about managing multiple AWS accounts
  • user: the IAM user to create.
  • group: the IAM group for the created user. We use a 1:1 user:group mapping.
  • policy: the IAM policy of permissions, with the action, the effect, and the AWS resources. See the IAM documentation for more information about this.
  • search_query: the Chef search query to perform to get the nodes that should have access to the resources. For example, this one will allow all nodes that have the Chef role secret-files-server in their expanded run list.
These JSON files can go anywhere; the script takes the file path as an argument.
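Since a typo in one of these files would only surface halfway through the AWS API calls, it can help to check the file up front. The following is a small sketch of such a check, not part of the script described in this post; missing_fields and load_iam_data are hypothetical helper names.

```ruby
require 'json'

# The five top-level fields described above.
REQUIRED_FIELDS = %w[accounts user group policy search_query].freeze

# Return the names of required fields that are absent or empty.
def missing_fields(data)
  REQUIRED_FIELDS.select do |field|
    value = data[field]
    value.nil? || (value.respond_to?(:empty?) && value.empty?)
  end
end

# Parse a data file and fail early if any required field is missing.
def load_iam_data(path)
  data = JSON.parse(File.read(path))
  missing = missing_fields(data)
  raise ArgumentError, "#{path} is missing: #{missing.join(', ')}" unless missing.empty?
  data
end

example = { 'accounts' => [], 'user' => 'secret-files', 'group' => 'secret-files' }
puts "Missing fields: #{missing_fields(example).join(', ')}"
```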

Create IAM Script

Note: This script is cleaned up to save space and get to the meat of it. I'm planning to make it into a knife plugin but haven't gotten a round tuit yet.
require 'inifile'
require 'aws-sdk'
require 'json'

filename = ARGV[0]
dirname  = File.dirname(filename)
aws_data = JSON.parse(IO.read(filename))

aws_data['accounts'].each do |account|
  aws_creds = {}
  aws_access_keys = {}
  # load the aws config for the specified account
  IniFile.load("#{ENV['HOME']}/.aws/config")[account].map{|k,v| aws_creds[k.gsub(/aws_/,'')]=v}
  iam = AWS::IAM.new(aws_creds)
  # Create the group
  group = iam.groups.create(aws_data['group'])
  # Load policy from the JSON file
  policy = AWS::IAM::Policy.from_json(aws_data['policy'].to_json)
  group.policies[aws_data['group']] = policy
  # Create the user
  user = iam.users.create(aws_data['user'])
  # Add the user to the group
  user.groups.add(group)
  # Create the access keys
  access_keys = user.access_keys.create
  aws_access_keys['aws_access_key_id'] = access_keys.credentials.fetch(:access_key_id)
  aws_access_keys['aws_secret_access_key'] = access_keys.credentials.fetch(:secret_access_key)
  # Create the JSON content to encrypt w/ Chef Vault
  vault_file = File.open("#{File.dirname(__FILE__)}/../data_bags/vault/#{account}_#{aws_data['user']}_unencrypted.json", 'w')
  vault_file.puts JSON.pretty_generate(
    {
      'id' => "#{account}_#{aws_data['user']}",
      'data' => aws_access_keys,
      'search_query' => aws_data['search_query']
    }
  )
  vault_file.close
  # This would be loaded directly with Chef Vault if this were a knife plugin...
  puts <<-EOH
knife encrypt create vault #{account}_#{aws_data['user']} \\
  --search '#{aws_data['search_query']}' \\
  --mode client \\
  --json data_bags/vault/#{account}_#{aws_data['user']}_unencrypted.json \\
  --admins "`knife user list | paste -sd ',' -`"
  EOH
end
This is invoked with:
% ./create-iam.rb ./iam-json-data/filename.json
The script iterates over each AWS account named in the accounts field of the JSON file and loads that account's credentials from the ~/.aws/config file. It then uses the aws-sdk Ruby library to authenticate a connection to the AWS IAM API endpoint. The resulting object, iam, provides the methods used to create the group, the policy, the user, and so on. The policy comes from the JSON document as described above. The script also creates access keys for the user and writes them, along with some other metadata for Chef Vault, to a new JSON file that will be loaded and encrypted with the knife encrypt plugin.
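The credential-loading step is the least obvious part: the keys in ~/.aws/config are named aws_access_key_id and aws_secret_access_key, while the SDK options drop the aws_ prefix. Here is a standalone sketch of that renaming, using a plain hash in place of the parsed INI section so it runs without the inifile gem. The helper name sdk_credentials is hypothetical, and unlike the gsub in the script it only strips the prefix at the start of a key.

```ruby
# Rename "aws_"-prefixed config keys to the names the SDK options use.
# Keys without the prefix (for example "region") pass through unchanged.
def sdk_credentials(section)
  section.each_with_object({}) do |(key, value), creds|
    creds[key.sub(/\Aaws_/, '')] = value
  end
end

# Stand-in for one account's section of ~/.aws/config (placeholder values).
section = {
  'aws_access_key_id'     => 'EXAMPLE-KEY-ID',
  'aws_secret_access_key' => 'EXAMPLE-SECRET',
  'region'                => 'us-east-1'
}

p sdk_credentials(section)
```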

As described, it will display a command to copy/paste. This is technical debt, as it was easier than directly working with the Chef Vault API at the time :).

Using Knife Encrypt

After running the script, we have an unencrypted JSON file in the Chef repository's data_bags/vault directory, named for the account and the user created, e.g., data_bags/vault/an-aws-account-name_secret-files_unencrypted.json.
{
  "id": "an-aws-account-name_secret-files",
  "data": {
    "aws_access_key_id": "the access key generated through the AWS API",
    "aws_secret_access_key": "the secret key generated through the AWS API"
  },
  "search_query": "roles:secret-files-server"
}
The knife encrypt command is from the plugin that Chef Vault provides. The create-iam.rb script prints the command to run:
% knife encrypt create vault an-aws-account-name_secret-files \
  --search 'roles:secret-files-server' \
  --mode client \
  --json data_bags/vault/an-aws-account-name_secret-files_unencrypted.json \
  --admins "`knife user list | paste -sd ',' -`"


After running the create-iam.rb script with the example data file, and the unencrypted JSON output, we'll have the following:
  1. An IAM group in the AWS account named secret-files.
  2. An IAM user named secret-files added to the secret-files group.
  3. Permission for the secret-files user to perform any S3 operations on the secret-files bucket (and files it contains).
  4. A Chef Data Bag Item named an-aws-account-name_secret-files in the vault Bag, which will have encrypted contents.
  5. All nodes matching the search roles:secret-files-server will be present as clients in the item an-aws-account-name_secret-files_keys (in the vault bag).
  6. All users who exist on the Chef Server will be admins in the an-aws-account-name_secret-files_keys item.
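The names in this list all derive from the account and user names, which is easy to lose track of across several accounts. A tiny sketch that spells out the convention (vault_names is a hypothetical helper, not something the script or Chef Vault provides):

```ruby
# Derive the data bag item, keys item, and unencrypted file names
# from an AWS account name and an IAM user name.
def vault_names(account, user)
  item = "#{account}_#{user}"
  {
    item:      item,
    keys_item: "#{item}_keys",
    json_file: "data_bags/vault/#{item}_unencrypted.json"
  }
end

vault_names('an-aws-account-name', 'secret-files').each do |kind, name|
  puts "#{kind}: #{name}"
end
```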
To view AWS access key data, use the knife decrypt command.
% knife decrypt vault an-aws-account-name_secret-files data --mode client
    data: {"aws_access_key_id"=>"the key", "aws_secret_access_key"=>"the secret key"}
The way knife decrypt works is that you give it the field of encrypted data to decrypt, which is why the unencrypted JSON had a field named data: we can use it to access all of the encrypted credentials at once. Similarly, we could use search_query instead of data to get the search query used, in case we wanted to update the access list of nodes.

In a recipe, we use the chef-vault cookbook's chef_vault_item helper method to access the content:
require 'chef-vault'
aws = chef_vault_item('vault', 'an-aws-account-name_secret-files')['data']


I wrote this script to automate the creation of a few dozen IAM users across several AWS accounts. Unsurprisingly, it took longer to test the recipe code and access to AWS resources across the various Chef recipes than it took to write the script and run it.

Hopefully this is useful for those who are using AWS and Chef, and were wondering how to manage IAM users. Since this is "done" I may or may not get around to releasing a knife plugin.

Sponsored by Puppet Labs