Container Automation: Building a Brickyard

(This article was originally published on The Agile Admin, as part of their "Docker and the Future of Configuration Management" series.)

My name is Nathaniel Eliot, and I’ve worked extensively in software deployment over the last several years. I have worked on two automation frameworks around Chef: Ironfan, an open-source cluster management system from Infochimps, and Elzar, a (sadly closed-source) blue-green framework based on Spiceweasel. I currently work at Bazaarvoice, where I’m building out a installation.

Barnyard Blues

There is a catch-phrase in DevOps: “cattle, not pets”. It’s intended to describe the step forward that configuration management (CM, e.g. Chef, Puppet, Ansible, Salt, etc.) tools provide. Instead of building and maintaining systems by hand, DevOps-savvy engineers aim to build them via automated, repeatable systems. This has revolutionized system deployment, and resulted in vast improvements in the stability and manageability of large, complicated systems.

But while cattle are an important part of civilizing your software, they have drawbacks. As anybody who’s worked a farm will tell you, cattle management is hard work. Long-lived systems (which most barnyard-style deployments still are) decay with age, as their surrounding environment changes, and as faults are triggered in software components; upgrades to fix these issues can be complicated and fragile. New hosts for these systems can also suffer from unintended evolution, as external resources referenced by the CM build process change. System builds are often lengthy affairs, and often heavily intertwined, such that singular failures can block updates on unrelated resources.

These issues mean that failures are often addressed by “reaching into the cow”: SSH logins to affected hosts. As the phrasing implies, this should be considered a little gross. Your team’s collective understanding of a system is based on it being built in predictable ways from visible source code: an SSH login undermines that understanding.

Building a Brickyard

The phrase I like for container automation (CA, e.g. Flynn, Mesos+Docker, etc.) is “brickyard, not barnyard”. Bricks are more uniform, quicker to make, and easier to transport than cows: CA provides greater immutability of product, faster cycle time, and easier migration than CM.

Because everything is baked into the base image, the danger of environmental changes altering or breaking your existing architecture is far lower. Instead, those changes break things during the build step, which is decoupled from the deployment itself. If you expand this immutability by providing architecturally identical container hosts, your code is also less vulnerable to “works in dev” issues, where special development configuration is lacking on production machines.
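A minimal sketch of that build/deploy split, in Python. Everything here is hypothetical and for illustration only (the Image class, build_image, deploy, and the package versions are assumptions, not the API of Docker, Flynn, or any other tool): external resources are resolved once at build time, and deployment only copies the frozen result.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """A hypothetical immutable build artifact (the "brick")."""
    image_id: str
    packages: tuple  # sorted (name, version) pairs, frozen at build time


def build_image(resolved_packages: dict) -> Image:
    # Build time: external resources are resolved once and frozen.
    # If an upstream change breaks something, it breaks here, not on a host.
    frozen = tuple(sorted(resolved_packages.items()))
    payload = json.dumps(frozen).encode("utf-8")
    image_id = hashlib.sha256(payload).hexdigest()[:12]
    return Image(image_id=image_id, packages=frozen)


def deploy(image: Image, host: str) -> dict:
    # Deploy time: nothing is re-resolved, so every host gets the same contents.
    return {"host": host, "image_id": image.image_id, "packages": dict(image.packages)}


if __name__ == "__main__":
    brick = build_image({"nginx": "1.4.7", "ruby": "2.1.1"})  # made-up versions
    for host in ("host-a", "host-b"):
        print(deploy(brick, host))
```

Because the image is frozen, two hosts that receive it cannot drift apart at deploy time; any drift has to show up as a new build with a new id.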

Rapid cycle time is the next great advantage that CA provides, and arguably the largest from a business perspective. By simplifying and automating build and deployment processes, CA encourages developers to commit and test regularly. This improves both development velocity and MTTR (mean time to repair), by providing safe and simple ways to test, deploy, and roll back changes. Ultimately, a brick is less work to produce than a fully functioning cow.
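One way to picture why immutable results make rollback safe and simple: if every deploy is just a pointer to an already-built image, rolling back is moving the pointer. The sketch below is a hypothetical illustration (ReleaseHistory and the image ids are invented names, not part of any CA framework).

```python
from typing import List, Optional


class ReleaseHistory:
    """Hypothetical append-only record of deployed image ids for one app."""

    def __init__(self) -> None:
        self._releases: List[str] = []
        self._current: int = -1  # index of the live release, -1 if none yet

    def deploy(self, image_id: str) -> str:
        # Deploying never modifies an old release; it only appends a new one.
        self._releases.append(image_id)
        self._current = len(self._releases) - 1
        return image_id

    def rollback(self) -> str:
        # Rolling back re-points at the previous, unchanged image.
        if self._current <= 0:
            raise RuntimeError("no earlier release to roll back to")
        self._current -= 1
        return self._releases[self._current]

    @property
    def live(self) -> Optional[str]:
        return self._releases[self._current] if self._current >= 0 else None


if __name__ == "__main__":
    history = ReleaseHistory()
    history.deploy("img-001")
    history.deploy("img-002")
    history.rollback()
    print(history.live)
```

Nothing is rebuilt during the rollback, which is what keeps the repair step short.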

Because CA produces immutable results, those results can easily be transported. The underlying CA tools must be installed in the new environment, but the resulting platform looks the same to the images started on it. This gives you a flexibility in migration and deployment that may be harder to achieve in the CM world.

These benefits are theoretically achievable with configuration management; Ironfan is a good example of many of these principles at work in the CM world. However, they aren’t first class goals of the underlying tools, and so systems that achieve them do so by amalgamating a larger collection of more generic tools. Each of those tools makes choices based on the more generic set of situations it’s in, and the net result is a lot of integration pain and fragility.

Bricks or Burgers

So when should you use CM, and when should you use CA? You can’t eat bricks, and you can’t make skyscrapers from beef; obviously there are trade-offs.

Configuration management works best at smoothing the gaps between the manually deployed world that most of our software was designed in, and the fully automated world we’re inching toward. It can automate pretty much any installation you can do from a command line, handling the wide array of configuration options and install requirements that various legacy software packages expect.

Container automation currently works best for microservices: 12-factor applications that you own the code for. In existing architectures, those often live either in overly spacious (and fallible) single servers, or in messy shared systems that become managerial black holes. This makes them an easy first target, providing greater stability, management, and isolation than their existing setups.

However, that’s as things stand currently. Civilization may depend on both, and the cattle came first, but ultimately it’s easier to build with bricks. As frameworks like Flynn expand their features (adding volume management, deploy pipelines, etc.), and as their users build experience with more ambitious uses, I believe CM is slowly going to be trumped (or absorbed) by the better CA frameworks out there.


Dear John Donahoe, CEO of eBay

I received in my inbox this morning a passionate plea from you:

Dear temujin9,

As an online shopper, you may be negatively affected by sales tax legislation currently being considered in the U.S. Congress. I want to let you know about the bill and give you a . . .

::snip for SPAM and whinging::

I know crowd-sourcing is all the rage these days, but can't you afford traditional lobbyists to buy your privilege, like all the other sleazy millionaires?


The Guy Who Will Buy Second-Hand Crap Elsewhere, Next Time

Enterprise Software Fallacy #1: "Keeping code secret preserves competitive advantage"

This was once taken as proven fact, and it's still at the core of many software companies' money-making plans, even those which produce large and respected open source products. The theory goes that people will pay for those extra bits, but not if we make it completely free and accessible. When it was ubiquitous, that was true, because there were no good alternatives; open-source also-rans now fence it in, because 60% of the features at 100% discount (ops S&H not included) is an attractive price for the less important software.

In reality, the company doing (the majority of) the code development will always have a strong competitive advantage with that code. The only thing that can lose you that advantage is losing the people writing the code. That is because the competitive advantage of features is not their existence, but the company's ability to capitalize on them in one form or another, and few people have the time to learn what your code can do better than you do.

Even if those people are outside your company, they're far more likely to ally themselves with you to gain yet more insight, than to attack you in the marketplace. To compete with you they would need to educate those they work with to at least your developers' level of experience, and build a convincing market case that they're better at your code than you. It's an uphill battle with an entrenched incumbent, and there are only a few folks crazy enough to do those. What pretty much all those crazy folks have in common is a) they were working on an open source product, and b) that product's owner pissed them off, hard enough to goad them into that hopeless battle. This is a "hostile fork", and it's what scares C-level folks into the default fallacy.

But every hostile fork has a quieter, but just as deadly, closed-source alternative: "abandonware". If you piss off the folks who write your code bad enough to pick hopeless battles, one of the likely responses is going to be "well, fuck this job". And that has a bad way of cascading: as coders leave, their tasks get dumped on the remaining seniors and the poorly educated newbies, and the pressure usually increases. Since good coders are always in demand, the good coders can afford to be more easily offended, and will often be the first to flip the table, which only accelerates this whole process (as they're usually both well respected, and doing the difficult tasks others can't). And once the good parts of the team are out the door, the whole code-base starts to rot (a process which deserves its own article).

When this happens in the open source world, it can sometimes produce a truly dangerous competitor: your former development team, forking their own code under a new banner. The MySQL "Baby Bells" (Drizzle, Percona, MariaDB), created in the wake of the Oracle-Sun-MySQL purchases, are fine examples. Non-compete agreements can be a band-aid, but most coders are wise enough not to sign long ones (and again, it's the smart ones you care about). But this is true of closed source software, too: your angry team of ex-developers could decide to write a you-killer application from scratch, and if they're good they can do it in their spare time, as they wait out the non-compete clock doing consulting or short-term contracts. They might have a rougher time of it, but at the same time they're also not bound to your crappy legacy code.

An established code base is an advantage, to be sure. But the real advantage is in the understanding of what can be done with it, not in the raw lines of code (or compiled binaries). That knowledge lives in the heads of your employees (developers or otherwise), and the only way to keep it in the company is to retain employees. Communication and education can help on-board new people, but is no substitute for keeping good experience around. Keeping a strong team happy preserves competitive advantage, in ways that keeping code secret never will.

A Brief Primer on Start-ups

(Originally posted on the Less Wrong thread "Who Wants To Start An Important Startup?".)

I work for a start-up, and I've worked for a number of them over the years. While it's been some of the best and most fulfilling work I've done, there are several things you need to consider.

1) Real start-ups (as opposed to ordinary new businesses) are a strange kind of betting game. They are long-shots that pay off extremely well if they hit, such that investors can afford to fund a hundred of them to get five that survive and one that hits big. This is a very different economic landscape from the one that your average job exists in, and many of your hard won beliefs about how a business works will be *wrong*.

2) The pay-off is pretty much all at the end, when you (maybe) hit big, or (maybe) get bought in a tech-and-talent acquisition, or (probably) get another higher paying job on the back of all the experience you've acquired. Terms like "ramen profitable" and "remaining runway" should give you a feel for the high-risk, high-stress, and questionable reward landscape that you're entering. This isn't easy, and it's hardest at the beginning, before you get customers and traction. It's also difficult on folks with family: the combination of low money, high stress, and long hours can be hard on life outside of work.

3) Remember what I said about "long-shot bets"? Your venture is probably going to fail, or everyone would already be doing something like it. Have a parachute handy, and a backup chute.

4) You will be doing work you weren't prepared for. No matter how large your comfort zone is, a good start-up will try to push you outside of it. You will agree to (or be tricked into) things you find you cannot do well, and so will everyone else around you. Getting good at handling failure (yours and others) is the only way you'll survive in the job long enough to see that pay-out.

5) There is kool-aid, and you have to drink at least a little. Take small sips, and above all, learn how to make the mix more palatable. Yes, this will mean dealing with (at times) people who are more or less completely irrational. You're doing something good for the world, and sometimes that means getting your hands dirty.

Further reading: Paul Graham, founder of Y Combinator, has lots of well-written ideas about (among other things) start-ups. There are undoubtedly others, but none leap to mind quite so readily.

(That all said: I'm a start-up addict. I will probably be working them, in some form or another, until I die. So if your start-up needs heavy back-end IT resource, I might be able to help you get tooled up . . . and if you want to work in an existing start-up with -- warning: kool-aid -- some of the hottest Big Data tools out there, let me know, because we're hiring.)

NDAA: Don't Buy The Hype

"The fact that I support this bill as a whole does not mean I agree with everything in it. In particular, I have signed this bill despite having serious reservations with certain provisions that regulate the detention, interrogation, and prosecution of suspected terrorists."

"[S]everal Republicans announced that they would seek to introduce military detention legislation; it was these detention provisions that ended up in the NDAA."

It's a fairly transparent political gambit. The Republicans attach the military detention provisions to a bill that is political suicide to veto (as it would have cut off funding to the military entirely), and then attempt to hang Obama in public opinion for signing the thing. (Obama, for his part, gave them the opening by justifying detaining terrorists at Guantanamo in an earlier executive order.)

Were I in his position, I'd be setting that part of the NDAA up for constitutional challenges, so that it can be overturned and provide precedent for future court rulings against similar things . . . :

"U.S. District Judge Katherine Forrest in Manhattan ruled that the law, passed as part of the National Defense Authorization Act for 2012, was unconstitutional."
  • Current Music
    Warren Zevon - Boom Boom Mancini

Dear Delicious

I started with you back when you were Then Yahoo bought you. Then they sold you. Then you started making bad changes: removing fully-functional interfaces and replacing them with cheap substitutes that often don't work as advertised.

Today, I tried to clean out some of my bookmarks. I have roughly 3500, and many are rotten, outdated, or otherwise useless to me now. But you won't let me delete multiple at a time (or rather, you'll let me try, and then fail silently). Then you started logging me out incessantly.

Now you've rate limited me. On completely manual, one at a time, selectively applied deletions, I have been rate limited. In less than a day.

I'd say "it's not you, it's me", but it's pretty obvious that it's you. I'm leaving you, and I'm taking the links. I hope I'll remember you as you once were, not the bloated tumblr clone you're trying to become.


Fuck Your Redesign
  • Current Music
    Paul Oakenfold - Conspiracy

Feathers Afire

A virtual stranger I know,
from a land outside of time,
lost his love in some fiasco.

In trying to comfort him,
in that shared pain
that your fallen saint so loved,
my time machine slides back
to side streams always open,
but never fully reached.

I speak to (and for) no spirits,
hold no love for the glowing aether:
give me real ground and sky instead.

And yet I find myself
sharing pain with you there,
and am shakingly glad to find
that your grace has,
if not a physical permanence,
an infectious quality.
  • Current Music
    The Herbaliser - Geddim'!!

Sending Off A Little Light

My daughter Iliana was born into crisis. Her continued good-cheer in the center of that kept many of us relatively sane, in the midst of her mother's decline. She's been a tiny anchor, keeping me from forgetting my place in the world when a part of me wanted nothing more than to cut adrift.

But she's also exhausting, as only a very aware one-year-old could be. She runs Erika ragged during the day, and then has enough left to wear me out in the hour I'm usually home before her own bed-time. Between then and her wake-up I try to push everything that isn't work.

And I can see the warning signs of burnout in myself, plain as day. My spaces are slowly filling up with kipple, the physical trace of tasks unfinished and hung up to wait. My computer piles up half-finished reading and other task lists, some better attended than others but all slowly filling, rather than emptying. And my concentration on work is sheer force of will, in a job that I have been working my entire adult life to get to. Tardiness, my old nemesis, stalks my schedule-book like a Jack-the-Rip-Off, killing plans in the most banal of manners. Sleep comes less and less regularly, and anger comes more easily.

And my son gets along as best he can with the limited time and energy his dad has left for him, after work and wee one.

The month that Iliana went visiting grandparents was an immense relief, but tinged with guilt and anticipation. So I thought for a while. And I went to Flipside, and talked with Reesa a little (I did say relatively sane). And the conclusion I reached is this: I owe both of my children full-time parenting, and I'm giving them each barely half-time as it is. Erika is filling in the gaps admirably, but (and this is praising by faintness of insult) she's not Reesa; we don't have that rapport, nor her years of deep thinking and research on child-rearing. And I'm running out of steam, and that isn't something that getting more local help could address because I'm already getting plenty. And if I stop the whole damn works stops, and neither of them get the parenting they deserve.

So I've sent Iliana to live with Reesa's mother Deb, up near Dallas. This puts her very near a lot of her relatives, including her uncle Derek and her grandparents Ken and Mary. Deb and I have a strong rapport, and I think between us we can do a good job of remembering all the good things Reesa wanted for Iliana. Dylan and I will remain here in the Austin area while he's finishing high-school, and hopefully, between us, we can remember all the good things Reesa wanted for him. There will be regular visits, and more regular video calls. When Dylan's out in the world on his own, Deb and I will be talking about living closer together and sharing in Iliana's care (she insisted, and I grok completely).

I'm beaten up. I know this was the right decision, but it hurts in ways I didn't think I had any capacity left for. I spent a month agonizing over it every stray second. I screamed and got religion at a dying pyre over it. I made my mind up, fought myself until I was bloodied and bloody sure. I'm crying now, I've been crying quietly for a week or more, and I. Don't. Cry.

Except to rage at the passing of a light. Even if this little one's just going a short way up the road.

Infochimps: How We Do It

I did an Ignite talk at DevOpsDays Austin on the culture that makes Infochimps work, and they asked me to expand into a blog post.

Infochimps uses many cutting edge tools (Chef, Amazon Web Services, Hadoop, HBase, ElasticSearch, Flume, MongoDB, Phantom.js, etc. ad nauseam), and we’ve written a number of custom tools to help corral these sometimes wild horses into a working team. Ironfan, our Chef specialization for big-data in the cloud, coordinates the installation and configuration of the many necessary components. Wukong is our Ruby library for Hadoop, combining the flexibility of JRuby with the raw power of MapReduce. Wonderdog is our Hadoop interface to ElasticSearch, allowing us to deliver large amounts of data quickly into a stable and searchable NoSQL data store. Swineherd, the workflow engine for Hadoop jobs, helps tie all of this together into a coherent framework for running multi-stage data ingestions.

To crib a DevOps aphorism, however, it’s not the technology that makes Infochimps work: it’s the culture. Specifically, it’s about a culture that keeps the challenges from all that novel technology manageable.

Our hiring process is a big part of building and maintaining that culture. We have multiple interview passes, to efficiently separate the few who will fit from the large mass of potential hires. The first pass is with our office manager Holly, a sweet lady who weeds out obvious mismatches in personality, interest, or resume. Next is a phone interview with Adam, our technical team lead, to help weed out those with obviously insufficient skill-sets. After that comes a team interview at the office, to do a finer test on the cultural fit, and start sniffing out where the candidate’s skills and interests are. The last hurdle before an offer is a short initial contract job (a week or two long, paid on completion); nothing demonstrates work ethic and development style clearer than actual development work. Although there are two technical passes, in all cases the focus is less on existing experience, and more on attitude and potential: even the most experienced candidate will lack experience with most of our tools, so adaptability and initiative are important traits in a successful hire.

Our management style relies heavily on what we have won in the hiring process: a work environment full of capable, intelligent, and self-motivated people. Management structure is very flat; I regularly consult with the C-level folks, and everyone else, as a particular task requires. Well-defined (but flexible) roles help keep communication open, as it’s usually obvious who should be included in a discussion. The technical leadership is focused on setting and tying together goals, leaving most choices about the implementation to those doing the work, but always available to clarify what choices align best with the bigger picture. Shared language for common pains and frustrations (e.g. spending currency as an analogy for causing developer frustration, or our various terms for types of technical debt) helps encourage empathy, and a shared focus on troubleshooting over blame-assignment. Above all, management strives to avoid mandatory overhead for development (i.e. regular status reports and meetings), instead relying on each employee’s good judgment and occasional casual check-ins to decide how they communicate status and needs.

Beyond core management style, Infochimps goes to great lengths to support their employees. Developers aren’t always the best at remembering to eat in the middle of a deep code delve, so lunches (and gentle reminders from Holly) are supplied, in addition to a fully stocked kitchen. There’s an employee joy fund, for which employees propose and vote on uses: past choices have ranged from “a new coffee maker that doesn’t suck” to bimonthly yoga classes. There are company outings, both formal and impromptu, and some fun and games around the office too (including the occasional Magic: the Gathering free-for-all).

Employee career development is another big key to employee joy. An employee’s focus is largely self-directed, with interest trumping experience in all but the most time- or stability-sensitive projects. To paraphrase Flip, one of our founders, our aim is to make employees awesomely valuable to the open job market, and totally disinterested in it.

Our development culture is heavily agile, embracing elements from Scrum, Kanban, and DevOps without slavish adherence to any of them. Though core technology choices often come from the C-level, good ideas can and do come from anywhere, from the newest hire to the office manager. In a similar way, although operations are my core responsibility, they are not mine alone: we are closer to the ideal of DevOps (or perhaps NoOps+1 or AllOps), in that ultimately everyone shares the goal (and some of the load) of keeping everything operational. Repeatability is key to many of our core products, but we balance with an understanding that automation is best done to address boredom or terror, not just inefficiency; a task must either be too predictable to be interesting, or too complex to be feasible, to be a good reason to add further infrastructure. We are also consciously risk-taking, preferring failure from audacity to failure from inaction, and failing forward instead of rolling back wherever possible.

Our infrastructure choices are made with similar goals in mind: developer experience and ergonomics are important criteria for tool choice. Resources are open by default (in cultural assumption, where security concerns prevent it in actuality), so that developers may easily get to what they need. Components should ideally be small, decoupled, and late-binding wherever possible; reducing the interdependence of the system improves both how manageable it is, and how flexible your architecture can be in the face of changing business needs. Making infrastructure repeatable (by making it from code, via Ironfan) means that building anew is an attractive option, which can free you from some of the worst of legacy code upgrade cycles. Archiving unused code and data from production systems, as opposed to supporting everything without question, makes the resultant systems easier to understand and trust.

So now that we’ve got this great workplace, what’s next? We foresee (and are even starting to experience) some growing pains as we shift into our enterprise focused work. How do we handle the impedance mismatch between our model and our clients’ models? What do we do as the company grows beyond the size of the monkeysphere? How should we tackle user segmentation and security as we build our Platform out?

Ultimately, the answers boil down to the same thing we have been doing: find the best teammates we can, then tear down any barriers between them and being awesome.
  • Current Mood
    accomplished

OpenStack: Timely As Ever

This post was originally written for Infochimps, as a response to a GigaOm editorial titled "Is It Too Late For OpenStack?"

Prognostication seems to be all the rage again this week. Allow me to polish the crystal ball, and take a look into our cloudy future.

The competition isn't between OpenStack and Eucalyptus or CloudStack. It's between Amazon's closed AWS API ecosystem and OpenStack's market-driven API process. On the one side is the grandfather of IaaS and several notable open-source descendants, all fulfilling the API contract that Amazon owns wholesale. On the other side is a coalition of large hardware manufacturers, hosting providers, and space agencies, working to define an API (and a core reference implementation) that provides a broader market for innovation among cloud providers. The ecosystem shift of CloudStack, from OpenStack to AWS, is certainly notable as a signal of the AWS API market's viability. Like Kia's choice not to enter the luxury car market, however, it says almost nothing about the market it opted out of.

Amazon isn't going to abandon AWS, but its development will be slower than the collective development on OpenStack. Because they're built as monolithic blocks with different internal interfaces, its open-source imitators will find it harder to share code, and Amazon will probably never open source more than a small percentage of their core code. The AWS API contract will remain largely immutable to outsiders, although Amazon is open to outside input, as the Eucalyptus agreement shows. The hosting ecosystem there will be strongly commodified, which is great for SMB projects, and less exciting for large enterprises and institutions.

OpenStack isn't going away, either. CloudStack's dramatically-timed exit from the standards process is a sign of that process heating up, not cooling down. There are still many big players with their irons in the fire, because they see the longer-term benefits of a market that allows differentiation. OpenStack's focus on API standards has given those players a common set of interfaces, each of which can be fulfilled by separate components; this lets companies focus their development on those parts they do best (be it computers, file stores, or networks), knowing their work will be compatible with others. Large enterprises and institutions are well placed to use the performance advantages this provides, and large hardware manufacturers and service providers will prefer the bigger profit margins they can achieve by selling specifically those advantages. The secondary tool ecosystem here will be a bit smaller, at least where AWS-compatible API shims aren't available, but what it lacks in broad size it will make up in more enterprise-level features.

And in the end, tools like Fog (and Ironfan, built on Fog and Chef) will make the differences between the two markets one of taste, and reduce the lock-in to specific providers yet further. Users will be able to freely move between the AWS and OpenStack markets as their needs change, maybe even on a minute-to-minute basis.

So who wins? In a super-market full of good options, ultimately, everyone does.
  • Current Mood
    contemplative