Developers need configuration management too!
SCM tools: devs, ops or devops?
This is a somewhat disjointed brain dump on the topic. It may not always follow a logical readable path.
Software Configuration Management tools are originally aimed at operations people. They are system admin tools, created to make it possible to deploy a specific set of software in a consistent manner on several machines. The first such system I saw was in the mid-90’s and would replace the Windows 3 program manager. You would see a Microsoft Word icon, but when you clicked on it you did not start Word, instead you started a program that would check if you had Word installed, and that it was the correct version, etc. If not, it would install Word while you went for coffee, and when you came back Word would be running.
Devops has also started using SCM systems the last few years, as it allows you to quickly and easily deploy the last version of your web application onto your servers.
But developers have generally not used configuration management to set up their development environment. In the Python web community there is Buildout, most popular amongst Plone people and also some Django people. Buildout is very similar to a Software Configuration Management system, but it’s not designed to be one, so it lacks the modules for generic operating system tasks. It’s used to set up development environments and deploy websites.
So maybe developers don’t need SCM systems? Well, as the title already tells you, I disagree. And I will explain why by giving some examples.
Example 1: TripleO’s devtest.sh
In my new job at Red Hat, working with TripleO there has been a constant frustration in getting a development environment up and running. This is partly because OpenStack is a complex system made up of many separate parts and just installing this is complex in itself. But it is also partly because the “canonical” way of getting a TripleO development environment running is to run a script called “devtest.sh”. And that doesn’t sound bad, but so far I have only been able to run it successfully once. And since it easily can take an hour or two to fail, trying to run it is a frustrating exercise in patience. And I am a very impatient person, so I fail.
The basic problem with the devtest.sh script is that it is a script. It knows how to install things. To make sure it doesn’t try to install something that isn’t already installed, it first uninstalls it. So if the script fails in the end of the setup, re-running it means deleting a lot of what was done, and then doing it again. Often it failed because a website was down, or because my DNS server didn’t answer correctly or fast enough. And each error would kill the whole process and require me to start over.
You can also run it step by step, but when doing that it is often unclear if the step finished correctly or not. So I only managed to finish it properly after I got the recommendation to first run it in a mode to build all images, and then run the script in a mode that only did the configuration and did not try to build the images. Even so, I had to set up a caching proxy and a local DNS server to avoid network problems, so the image-building could finish. It’s also worth mentioning that I don’t have network problems, really. Only devtest.sh would claim it couldn’t reach servers or look up names. I don’t know why it’s so brittle.
I should note that last week TripleO became installable with Instack, so the situation has reportedly improved, but I haven’t tried yet, because I’m afraid of touching what is now finally a working situation. But this serves as a problem of how bad it can be. It took me three months, and probably a week or two of actual work to get devtest.sh running. It would most likely have been faster to run each step manually, but the only description of how to do that was in the massive collection of scripts that goes by the name devtest.sh. And following what happens in those scripts, containing a lot of global environment variables, is a lot of work. I know, because I tried.
Example 2: Zope/Plone and Buildout
In the typical project I would be involved in when I worked at Nuxeo, we usually had several Zope servers, a ZEO (ZODB) server, some other database, a Lucene server for searching, and a load balancer. And you needed most of these things to be able to develop on the project. Getting started on a project was typically a matter of following instructions in one or several README files, often inaccurate or outdated. Setting up a project so you could start working on it took in average around a day.
Then Buildout arrived. Buildout worked so that you defined up each of the services you wanted, and then you just ran bin/buildout and it would set up the environment. With Buildout setting up an environment to get started all it took was a few command line commands and a coffee break. Or, in the case of really complex projects, a lunch break. OpenStack is not an order of magnitude more complex than a typical Plone setup, yet it is an order of magnitude (or two) easier to get a development environment up and running with Plone than with OpenStack. I think that’s a good indication that Plone is on the right path here.
Buildout is somewhat limited in scope, it’s a tool designed largely for Python and it also helped isolate the project from the rest of the world, so you could have several projects on your computer at the same time with different versions of Python modules. It therefore has special handling for Virtualenv, as it also does environment isolation for Python, and Setuptools, which it requires and does many magic things with. But when developing Python software, and especially Python software for web, it’s an amazing tool.
It allows you to install specific versions of software and modules (and easily change which version). But storing the Buildout configuration in a version management system you can also tag the configuration when you deploy it at a customer. That way you can also easily and quickly replicate the customer deployment which helps you replicate issues.
It also has grown a very useful toolset. As en example, mr.developer allows you to make a configuration in such a way that if you suddenly need to develop on a specific module, you can quickly switch from the released version to using a trunk checkout. This means that you can choose which parts of the project that should be in “development mode”. Having a big project like Plone itself in development mode makes your environment to unstable. Somebody will often check in a change in one module that breaks another module, and should you happen to update your checkouts then your whole environment is broken, even though that change did not directly affect you. You want most of your environment to run stable versions, and only run the development trunk of the modules you are directly working on. Mr.developer allows you to do that.
Example 3: TripleO’s “diskimage-builder”
For creating images to use in virtual machines, TripleO has diskimage-builder. It’s also similar to an SCM system as it actually creates virtual machines, and than installs software and configures these machines. It does this based on “elements”, so you can define what elements you want to have installed on your machine-image.
A bad thing with it is that it’s script based. Each element is a script, meaning that overriding and extending them is tricky. It also means that you may have problems in mixing and matching elements, as they might step in it’s does, by for example each providing their own configuration file for a software. I don’t think this is a big problem for diskimage-builder, because it’s use case is installing OpenStack and Red Hat’s OpenStack products. The risk of different elements stepping on each other is therefore small. And it’s always used to create images from scratch, so the start-environment is know, it’s always a cleanly installed new operating system. This makes the whole configuration issue simpler.
But it also has several good ideas. The first of these are the elements itself. They define up how to install and configure a specific piece of the puzzle. This can be the operating system, or a software, like for example glance. And what is also nice is that the elements may have definitions for different kinds of installs. The glance element for example contains both installs from packages and from source.
So what do we developers really need?
We need a generic Software Configuration Management tool that can install all the bits that are needed for a development environment.
Instead of script that are doing things in a specific order, no matter what, a good system would instead just have a description of the state we want to achieve, and make it so. This is both for reliability and speed.
Buildout, as mentioned in my previous blog post on SCM’s, will in fact not re-run a successfully configured part unless the configuration for it changed (or told to explicitly).
This means that each attempt of running the script does not become a 4-hour monster run that will fail after 3 hours because of a temporary network error. If there is a temporary error, we just run the tool again, and it picks up more or less where it left off. Changing a configuration would only re-run the parts affected by that change.
A module repository
Since a module written in C has completely different ways of building than one in Ruby or Python, and also different languages have different ways of declaring dependencies (if any at all) there needs to be support for package repositories that declare these things.
A repository would need to declare how the module is called in different types of Unices, so it can be installed both by apt, yum, nix or the other package managers for Linux. You also need to declare where the source code for different versions can be downloaded, and how to compile it, for those systems that do not have a package. Most likely we would want Buildout style “recipes” to handle common cases, like a recipe that can download code and run “configure, make, make install” on it, and another recipe for installing Python modules, and another for Ruby modules etc.
You want to be able to declare the exact versions to use, and you want this to be done in a separate file. This is so that you can pin versions for deployment so that you can later easily replicate your customers deployment when looking at fixing bug reports.
Multiple version installs
As a developer, we want to be able to have several versions installed at the same time. This is easy with some software, and hard with other software. A developer centric SCM probably needs to integrate Nix and Docker to be able to isolate some things from the operating system. See also Domen Kozar’s blog on the topic.
Stable or development mode per module
A standard development build should probably install the latest released version of everything by default. In an OpenStack context that would mean that you install whatever comes with your operating system for the most cases. On Ubuntu it would install OpenStack with apt-get and on Fedora with yum.
But then you should be able to say that you want to get a specific piece, for example the Nova service, in development mode, and re-running the configuration after that change would remove or stop the OS-installed Nova, checkout Nova master, and build it, but probably not start it, as you would likely want to run it from a terminal so you could debug it.
When changing a module to deployment mode, the version pinning should of course be ignored. And if the modules checkout declares a different version than what is pinned, you should get an error.
Online and offline modes
You don’t always want your machines to reach out to the internet to install. That means that there needs to be a way to package both the configuration and all dependent files in a package that can be transferred to the target computer for installing.
What we probably do not want
We don’t want complicated ways to run installation over SSH. I don’t know why almost every SCM system seems to implement it’s own version of this. It’s probably better that the SCM system installs things locally, and that you use a separate tool to deploy different configurations to different machines. Fabric seems to be a good tool for this.
Does this already exist?
I don’t know. You tell me.