Log in

Fri, May. 18th, 2012, 06:18 pm
Infochimps: How We Do It

I did an Ignite talk at DevOpsDays Austin on the culture that makes Infochimps work, and they asked me to expand into a blog post.

Infochimps uses many cutting edge tools (Chef, Amazon Web Services, Hadoop, Hbase, ElasticSearch, Flume, MongoDB, Phantom.js, etc. ad nauseum), and we’ve written a number of custom tools to help corral these sometimes wild horses into a working team. Ironfan, our Chef specialization for big-data in the cloud, coordinates the installation and configuration of the many necessary components. Wukong is our Ruby library for Hadoop, combining the flexibility of JRuby with the raw power of MapReduce. Wonderdog is our Hadoop interface to ElasticSearch, allowing us to deliver large amounts of data quickly into a stable and searchable NoSQL data stores. Swineherd, the workflow engine for Hadoop jobs, helps tie all of this together into a coherent framework for running multi-stage data ingestions.

To crib a DevOps aphorism, however, it’s not the technology that makes Infochimps work: it’s the culture. Specifically, it’s about culture that keeps the challenges from all that novel technology manageable.

Our hiring process is a big part of building and maintaining that culture. We have multiple interview passes, to efficiently separate the few who will fit from the large mass of potential hires. The first pass is with our office manager Holly, a sweet lady who weeds out obvious mismatches in personality, interest, or resume. Next is a phone interview with Adam, our technical team lead, to help weed out those with obviously insufficient skill-sets. After that comes a team interview at the office, to do a finer test on the cultural fit, and start sniffing out where the candidate’s skills and interests are. The last hurdle before an offer is a short initial contract job (a week or two long, paid on completion); nothing demonstrates work ethic and development style clearer than actual development work. Although there are two technical passes, in all cases the focus is less on existing experience, and more on attitude and potential: even the most experienced candidate will lack experience with most of our tools, so adaptability and initiative are important traits in a successful hire.

Our management style relies heavily on what we have won in the hiring process: a work environment full of capable, intelligent, and self-motivated people. Management structure is very flat; I regularly consult with the C-level folks, and everyone else, as a particular task requires. Well-defined (but flexible) roles help keep communication open, as it’s usually obvious who should be included in a discussion. The technical leadership is focused on setting and tying together goals, leaving most choices about the implementation to those doing the work, but always available to clarify what choices align best with the bigger picture. Shared language for common pains and frustrations (e.g. spending currency as an analogy for causing developer frustration, or our various terms for types of technical debt) help encourage empathy, and a shared focus on troubleshooting over blame-assignment. Above all, management strives to avoid mandatory overhead for development (i.e. regular status reports and meetings), instead relying on each employee’s good judgment and occasional casual check-ins to decide how they communicate status and needs.

Beyond core management style, Infochimps goes to great lengths to support their employees. Developers aren’t always the best at remembering to eat in the middle of deep code delve, so lunches (and gentle reminders from Holly) are supplied, in addition to a fully stocked kitchen. There’s an employee joy fund, which employees propose and vote on uses: past choices have ranged from “a new coffee maker that doesn’t suck” to bimonthly yoga classes. There are company outings, both formal and impromptu, and some fun and games around the office too (including the occasional Magic: the Gathering free-for-all).

Employee career development is another big key to employee joy. An employee’s focus is largely self-directed, with interest trumping experience in all but the most time- or stability-sensitive projects. To paraphrase Flip, one of our founders, our aim is to make employees awesomely valuable to the open job market, and totally disinterested in it.

Our development culture is heavily agile, embracing elements from Scrum, Kanban, and DevOps without slavish adherence to any of them. Though core technology choices often come from the C-level, good ideas can and do come from anywhere, from the newest hire to the office manager. In a similar way, although operations are my core responsibility, they are not mine alone: we are closer to the ideal of DevOps (or perhaps NoOps+1 or AllOps), in that ultimately everyone shares the goal (and some of the load) of keeping everything operational. Repeatability is key to many of our core products, but we balance with an understanding that automation is best done to address boredom or terror, not just inefficiency; a task must be either be too predictable to be interesting, or too complex to be feasible, to be a good reason to add further infrastructure. We are also consciously risk-taking, preferring failure from audacity to failure from inaction, and failing forward instead of rolling back wherever possible.

Our infrastructure choices are made with similar goals in mind: developer experience and ergonomics are important criteria for tool choice. Resources are open by default (in cultural assumption, where security concerns prevent it in actuality), so that developers may get to what they need to easily. Components should ideally be small, decoupled, and late-binding wherever possible; reducing the interdependence the system improves both how manageable it is, and how flexible your architecture can be in the face of changing business needs. Making infrastructure repeatable (by making it from code, via Ironfan) means that building anew is an attractive option, which can free you from some of the worst of legacy code upgrade cycles. Archiving unused code and data from production systems, as opposed to supporting everything without question, makes the resultant systems easier to understand and trust.

So now that we’ve got this great workplace, what’s next? We foresee (and are even starting to experience) some growing pains as we shift into our enterprise focused work. How do we handle the impedance mismatch between our model and our clients’ models? What do we do as the company grows beyond the size of the monkeysphere? How should we tackle user segmentation and security as we build our Platform out?

Ultimately, the answers boil down to the same thing we have been doing: find the best teammates we can, then tear down any barriers between them and being awesome.

Sun, May. 20th, 2012 11:03 am (UTC)