
A script is not configuration.

September 17, 2014

I’ve been looking into Ansible lately, and have had some trouble explaining what I think is wrong with it, so this blog post is an attempt to do that by comparing it with Buildout. This may seem a bit strange, since they don’t really do the same thing, but I think it will make sense in the end. So hang in there.

Ansible

Ansible calls itself “software automation”, and this is correct, but it’s often presented as an SCM system, and in my opinion it is not one. The reason is that Ansible is based around writing scripts. Because these scripts are written in YAML, they superficially look like configuration, but that is misleading. Ansible calls these YAML files “playbooks”, which again indicates what you do with them: you play them. They have a start and a finish, and they perform a set of actions in the order that the actions are defined in the playbook. The actions are often of the form “ensure that software X is installed” or “make sure that the file Y contains the line Z”, but that doesn’t change the fact that they are actions performed in the order written in the files. Therefore these files are scripts, not configuration. Configuration does not have a specific order, configuration you first parse, and then you access the configuration data arbitrarily. This is not what Ansible playbooks do.

And that’s fine. It’s not a criticism of Ansible per se. Ansible mainly calls itself a system for automation, and it is one, just like any sort of script file. Bash scripts are also designed for automation. Ansible just has a very complicated (as opposed to complex) scripting language. Thanks to the choice of YAML, the language is very restricted and often tricky to use, as you end up fighting the syntax with a very liberal and exact application of quote characters.

However, they do state on their website that “Ansible features an state-driven resource model that describes the desired state of computer systems and services, not the paths to get them to this state.” That is only partly true. If you only use modules that check for state, it is true for a reasonable definition of “true”. But a lot of the modules shipped with Ansible don’t do that. More importantly, the state is not defined in configuration; it is defined in a script. This leads to limitations, which we will talk about later.

You can add new commands to Ansible by writing modules. They can be written in any language, which sounds like a neat idea, but it means that the API for a module is passing JSON data in on stdin and returning it on stdout. This makes debugging painful and writing new modules a chore. In addition, to write Python modules you have to add a specific line at the end of the file with a “star import”, which breaks PEP 8 and also confuses some syntax-aware Python editors.
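To make the module interface described above concrete, here is a minimal, hypothetical module that follows the JSON-in, JSON-out contract as this post describes it: arguments arrive as a JSON object on stdin, and the result is printed as a JSON object on stdout. The `path` and `line` arguments and the `ensure_line` helper are assumptions for illustration; this is not Ansible's own `module_utils` API.

```python
import json
import sys


def ensure_line(args):
    """Make sure args["line"] appears in the file at args["path"].

    Returns a result dict in the style Ansible modules report back:
    "changed" says whether anything had to be done.
    """
    path, line = args["path"], args["line"]
    try:
        with open(path) as f:
            content = f.read()
    except FileNotFoundError:
        content = ""
    if line in content.splitlines():
        return {"changed": False, "path": path}
    with open(path, "a") as f:
        # Avoid gluing the new line onto a final line without a newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return {"changed": True, "path": path}


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hypothetical contract: JSON arguments in, JSON result out.
    args = json.load(stdin)
    try:
        result = ensure_line(args)
    except (KeyError, OSError) as exc:
        result = {"failed": True, "msg": str(exc)}
    json.dump(result, stdout)


if __name__ == "__main__":
    main()
```

Even in a module this small, the only way to debug it is to feed it JSON by hand and read the JSON it prints back.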

Ansible also recommends a specific directory layout with several folders that each contain a file called main.yml. That means your editor quickly ends up with a whole lot of files named main.yml open, which gets confusing. My good friend Jörgen Modin called that kind of layout “a conspiracy on the hard disk”, referring to Zope Toolkit style programming, which does the same with its configure.zcml files. A file name should reflect what is in it, not what role it plays (unless that role is unique within one project).

For SCM you also need several layers of orthogonal modularity. You need to be able to define the installation of, for example, MySQL, and then define what software should go onto each machine. Ansible can do this, although, confusingly, most people tend to use what Ansible calls “roles” to define a single component, and then use the groups in the inventory file as the actual roles. But that’s just naming; you’ll get used to it.

Buildout

Buildout calls itself a “software build system”, and that’s not incorrect, but it makes it sound like it’s competing with make and SCons, which it is not. In my opinion, Buildout is closer to being a Software Configuration Management system than a build system. I would call it an Environment Configuration System, as it’s mainly designed to set up development environments, although it can also be used to deploy software. Its main shortcoming as a proper SCM is that it lacks modules for common SCM tasks, such as installing system packages with yum and apt, and, more problematically, it lacks support for running some steps, for example yum and apt, as a superuser.

Buildout does not have any support for remote deployment, so you need to use a separate program like Fabric to run Buildout remotely.

Just as Ansible has modules you can use to create new commands, Buildout has recipes. In Ansible, modules can be written in any language; in Buildout, recipes have to be written in Python. This perhaps lessens the appeal for some people, but I do think the benefits are worth it. A Buildout recipe is just a Python module like any other, and it can be made available on the Python Cheese Shop (PyPI), in which case Buildout will download and install it when you run it.
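As a rough sketch of what "just a Python module" means, here is a hypothetical recipe that writes a file. It follows the zc.buildout recipe protocol as generally documented (a class constructed with the buildout object, the part name and the part's options, with `install()` returning the paths it created and `update()` called on later runs), but the recipe itself and its `output` and `content` options are made up for illustration. A real recipe package would register the class under the `zc.buildout` entry-point group in its setup.py; that wiring is left out here.

```python
import os


class TemplateFile:
    """Hypothetical recipe: write options["content"] to options["output"]."""

    def __init__(self, buildout, name, options):
        self.name = name
        self.options = options
        # Resolve a relative output path against the buildout directory.
        directory = buildout["buildout"]["directory"]
        options["output"] = os.path.join(directory, options["output"])

    def install(self):
        output = self.options["output"]
        os.makedirs(os.path.dirname(output), exist_ok=True)
        with open(output, "w") as f:
            f.write(self.options["content"])
        # Buildout records the returned paths so it can remove them later.
        return [output]

    def update(self):
        # Nothing to do when the part's options have not changed.
        pass
```

Because the recipe only receives mappings, it can be exercised with plain dicts standing in for the buildout object and the options.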

Configuration, not scripting

The most important thing about Buildout for the purposes of this post is that Buildout is configured entirely with configuration files, more specifically of the ConfigParser (INI) variety. INI files have the benefit of being designed for configuration, and their extremely minimalist syntax means it never gets in the way. Buildout of course has to extend the syntax by allowing variable substitution, but that is essentially all it does. Everything written in the configuration is also a variable and can be referenced, so each piece of information only needs to be defined once.
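Python's standard `configparser` happens to support the same `${section:option}` syntax through `ExtendedInterpolation`, so it can illustrate how every value doubles as a variable. This is only an analogy: Buildout implements its own substitution, and in real Buildout `buildout:directory` is filled in automatically rather than written by hand as it is here.

```python
from configparser import ConfigParser, ExtendedInterpolation

CONFIG = """
[buildout]
directory = /srv/site

[nginx.conf]
port = 8000
root = ${buildout:directory}/examples
output = ${buildout:directory}/etc/nginx.conf
"""


def load(text):
    # ExtendedInterpolation resolves ${section:option} when a value is read.
    parser = ConfigParser(interpolation=ExtendedInterpolation())
    parser.read_string(text)
    return parser


config = load(CONFIG)
print(config["nginx.conf"]["output"])  # /srv/site/etc/nginx.conf
```

Since substitution happens on access, changing `directory` in one place changes every value that refers to it.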

It also means that a part of a Buildout configuration only needs to be run once, if it succeeds. After that it has set things up correctly, and subsequent runs can skip the parts that have succeeded, unless their configuration changes. It also means that it is, at least in theory, possible to write uninstallers, as you can record the state before the run.
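The skip-if-unchanged behaviour can be sketched as a comparison between each part's current options and the options recorded after its last successful install. Buildout keeps that record in a file called `.installed.cfg`; the sketch below uses plain dicts instead, and `plan_run` is a hypothetical name, not Buildout's code.

```python
def plan_run(parts, installed):
    """Decide what to do with each part.

    parts: {part name: options dict} from the current configuration.
    installed: {part name: options dict} recorded after the last run.
    Returns (to_install, to_skip, to_uninstall).
    """
    # Parts whose options differ from the record (or are new) must run.
    to_install = [name for name, opts in parts.items()
                  if installed.get(name) != opts]
    # Parts whose options match the record can be skipped.
    to_skip = [name for name, opts in parts.items()
               if installed.get(name) == opts]
    # Parts recorded last time but no longer configured can be removed,
    # because we know exactly what they were installed with.
    to_uninstall = [name for name in installed if name not in parts]
    return to_install, to_skip, to_uninstall
```

The record of previous options is also what makes uninstalling feasible: there is no need to guess what a part did.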

The choice of INI-style syntax also means there is no inherent execution order to the configuration. Instead, the configuration is split up into what Buildout calls “parts”, each executed by the recipe named in that part. Here are two examples of parts. The first one will download, compile and install nginx locally (in the buildout directory). The second will generate an nginx configuration file from a template.

[nginx]
recipe = zc.recipe.cmmi
url = http://html-xslt.googlecode.com/files/nginx-0.7.67-html-xslt-4.tar.gz
# The SSI bug was fixed in nginx-0.7.65-html-xslt-2.tar.gz
extra_options =
    --conf-path=${buildout:directory}/etc/nginx.conf
    --sbin-path=${buildout:directory}/bin
    --error-log-path=${buildout:directory}/var/log/nginx-error.log
    --http-log-path=${buildout:directory}/var/log/nginx-access.log
    --pid-path=${buildout:directory}/var/nginx.pid
    --lock-path=${buildout:directory}/var/nginx.lock

[nginx.conf]
recipe = collective.recipe.template
port = 8000
root = ${buildout:directory}/examples
input = ${buildout:directory}/templates/nginx.conf.in
output = ${buildout:directory}/etc/nginx.conf

What makes this configuration and not a script is that none of it is executed unless the part is also listed in a separate piece of configuration, the parts option of the [buildout] section:

[buildout]
parts = nginx
        nginx.conf

Buildout configuration files can also extend other files. So if we save the above in a file called base.cfg, we can then create another configuration file:

[buildout]
extends = base.cfg

[nginx.conf]
port = 8080

The only difference between base.cfg and this file is that nginx will run on another port. This makes it easy for me to check out a development environment and then make my own configuration file that overrides just the bits I need to change, because it’s all configuration. With Ansible I would have to make the port into a variable and pass a new value in when I run Ansible. That means that when writing Ansible playbooks, to make them proper configuration management and not scripts, everything must be a variable. Buildout avoids that issue by skipping the intermediary step of playbooks and having only recipes and configuration.
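The effect of `extends` in this example can be approximated with `configparser`, because reading a second source into the same parser overrides only the options it mentions and keeps everything else. This is an approximation, not Buildout's mechanism: the two strings simply stand in for base.cfg and the overriding file, and the `extends` line itself is left out.

```python
from configparser import ConfigParser

BASE_CFG = """
[buildout]
parts =
    nginx
    nginx.conf

[nginx.conf]
recipe = collective.recipe.template
port = 8000
"""

LOCAL_CFG = """
[nginx.conf]
port = 8080
"""


def load(*sources):
    # Interpolation is disabled so ${...} references would stay literal.
    parser = ConfigParser(interpolation=None)
    for text in sources:
        # Later sources override options from earlier ones, section by section.
        parser.read_string(text)
    return parser


config = load(BASE_CFG, LOCAL_CFG)
```

Only the port changes; the recipe and the parts list come through untouched, with no variable declared in advance.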

With Buildout, a configuration that extends another file can also add, skip, or insert parts, and change their order, as you like.

[buildout]
extends = base.cfg
parts = nginx.conf
        loadbalancer
        nginx

[loadbalancer]
...

Each part remains the same, but the order is different and there is a new one in the middle. In general, because all configuration is parsed and gathered before the parts are run, it doesn’t matter much in which order they run, but in some cases it of course does. To take an obvious example: if you are not just going to configure software but also start it, you have to install and configure it first.
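Because the run order comes from the `parts` option rather than from where sections appear in the file, the ordering can be sketched as: parse everything first, then walk the `parts` list. The `run_order` helper is hypothetical, and so are the `placeholder.recipe` and `never.runs` recipe names (the real `[loadbalancer]` contents are elided above).

```python
from configparser import ConfigParser

CONFIG = """
[nginx]
recipe = zc.recipe.cmmi

[loadbalancer]
recipe = placeholder.recipe

[nginx.conf]
recipe = collective.recipe.template

[unused]
recipe = never.runs

[buildout]
parts = nginx.conf
        loadbalancer
        nginx
"""


def run_order(text):
    parser = ConfigParser(interpolation=None)
    # All sections are parsed up front, regardless of where they appear.
    parser.read_string(text)
    names = parser["buildout"]["parts"].split()
    # Only listed parts run, in the listed order; [unused] is ignored.
    return [(name, parser[name]["recipe"]) for name in names]
```

The `[buildout]` section sits last in the file, yet it alone decides what runs and when.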

There is also a special syntax for adding values to and removing values from an option such as the parts definition:

[buildout]
parts += newpart1 newpart2
develop -= foo.bar
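The meaning of `+=` and `-=` can be sketched as list operations applied to the inherited value. The helper below is hypothetical and simplified, treating values as whitespace-separated lists; it is not Buildout's implementation.

```python
def apply_modifier(inherited, op, value):
    """Apply a Buildout-style '+=' or '-=' to an inherited option value.

    Values are treated as whitespace-separated lists, as with 'parts'.
    """
    current = inherited.split()
    items = value.split()
    if op == "+=":
        # Append the new items after the inherited ones.
        return " ".join(current + items)
    if op == "-=":
        # Drop every inherited item that is listed for removal.
        return " ".join(item for item in current if item not in items)
    raise ValueError(f"unsupported operator: {op!r}")
```

For example, `apply_modifier("nginx nginx.conf", "+=", "newpart1 newpart2")` gives `"nginx nginx.conf newpart1 newpart2"`.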

A Buildout example

Buildout is often used to set up development, staging and production environments alike. I have an example where base.cfg only installs the Plone CMS. I then have a production.cfg which also sets up load balancing and caching, a supervisord to run the services, a cronjob to start the services on reboot, and cronjobs to do backups and database maintenance. My staging.cfg extends the production configuration only to change the ports, so that I can run the staging server on the same machine as the production server. The development.cfg also just extends base.cfg, so you don’t get any of the production services, but it instead adds loads of development tools. Lastly, there is a version.cfg which contains version numbers for everything to be installed, so that if you set up a local environment to test a problem, you know you are using the same software as production.
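The file layering described above can be sketched as a recursive merge in which each file's `extends` target is resolved first and the file's own settings are laid on top. The dict contents (part names such as `balancer` and `devtools`, option values such as ports) are hypothetical stand-ins; only the shape of the chain follows the text, and version.cfg is left out.

```python
def resolve(name, files):
    """Return the merged {section: {option: value}} for config file `name`.

    files maps file names to parsed contents; an optional top-level
    "extends" key names the file this one builds on.
    """
    data = files[name]
    merged = {}
    parent = data.get("extends")
    if parent:
        # Resolve the parent chain first so this file's values win.
        for section, options in resolve(parent, files).items():
            merged[section] = dict(options)
    for section, options in data.items():
        if section == "extends":
            continue
        merged.setdefault(section, {}).update(options)
    return merged


# Hypothetical contents standing in for the files described above.
FILES = {
    "base.cfg": {"buildout": {"parts": "instance"},
                 "instance": {"http-address": "8080"}},
    "production.cfg": {"extends": "base.cfg",
                       "buildout": {"parts": "instance balancer cache supervisor backup"},
                       "balancer": {"port": "80"}},
    "staging.cfg": {"extends": "production.cfg",
                    "instance": {"http-address": "9080"},
                    "balancer": {"port": "8000"}},
    "development.cfg": {"extends": "base.cfg",
                        "buildout": {"parts": "instance devtools"}},
}
```

Staging inherits every production service and changes only ports, while development never sees the production parts at all.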

If you wanted to deploy this onto several servers, with the database on one server, caching and load balancing on another, and the CMS instances on separate servers, you would make one configuration file per server type and use that.

Buildout extensions

Buildout has a final level of indirection: extensions. One example is buildout.dumppickedversions (its functionality is now part of Buildout itself), which lists all Python packages that you have not given a specific version number for. Another is mr.developer, which gives you commands to make sure that the Python packages you are working on are all in sync with the version control system. It can even let you switch between the latest release and a checked-out development version of a Python package, which is really neat.
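What buildout.dumppickedversions reports can be sketched as a comparison between the versions that were actually picked and the pins in a versions section. The functions and their inputs are hypothetical; a real extension is a callable registered under the `zc.buildout.extension` entry point that receives the buildout object, and this sketch leaves that wiring out. Name matching here is case-insensitive only, a simplification of PyPI's name normalization.

```python
def unpinned(picked, pinned):
    """Return {package: version} for packages picked without a pin.

    picked: versions the installer ended up choosing.
    pinned: the [versions] section, as {package: version}.
    """
    pins = {name.lower() for name in pinned}
    return {name: version for name, version in sorted(picked.items())
            if name.lower() not in pins}


def format_versions(versions):
    # Same "name = version" form as a [versions] section, so the output
    # could be pasted into a file like version.cfg.
    lines = ["[versions]"]
    lines += [f"{name} = {version}" for name, version in versions.items()]
    return "\n".join(lines)
```

Feeding the report back into the versions file is what lets every environment install the same software.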

Perhaps it would be possible to make an extension that lets you run some Buildout parts as root and others as a normal user. I’m willing to give implementing it a try, but I’m a bit busy at the moment, so it will have to wait. And if such an extension can’t be written, adding the feature to Buildout itself should be relatively easy. With that feature, I would be prepared to call Buildout a Software Configuration Management system. It may have been originally developed to manage only development environments, but it has proven itself capable of much more, and it certainly gets the most important SCM design decisions right. Hopefully the above text clarifies why.


13 Comments
  1. Ionel Maries

    Is `salt` any better in this regard? It would be nice if you compared all the server automation systems.

    Did you switch to Buildout+Fabric in the end?

    • I haven’t used Salt, partly because when I first looked at them, Salt seemed very similar, but clearly worse in most of the ways they differed. That may not actually be true. But it uses YAML, and it uses it as an imperative scripting language, so yes, much of the criticism applies.

      I haven’t switched to Buildout (and don’t actually do server deployment in what I’m doing right now, so Fabric is not needed) and I haven’t tried other options like Puppet and Chef, because it takes several weeks of free time to try each.😉

  2. I agree with you that Ansible modules are far from ideal, but regarding where configuration is put, it seems to me that you are missing some key elements like host_vars, group_vars and the inventory file. I’ve been using Buildout since the time it was adopted by Plone. Now I’m slowly finding my way in doing deployments with Ansible, and I’m mostly happy with it. I put configuration parameters in YAML structures (which can be more complex than the simple key=value pairs of INI files) inside host_vars and group_vars, and code “recipes” inside roles, one for each “service” or “state” that the deployment environment must serve or assume, also using role dependencies to split up the code. Then how the hosts are grouped in the inventory decides how configuration parameters are inherited by individual hosts, and the plays (in the playbook) associate multiple roles with a host. There is also the powerful option of splitting groups into subgroups at runtime by evaluating “facts”, variables that come from the target deployment environment and allow you, for example, to differentiate the installation of software on RPM- and deb-based distros.

    Regarding source management, Ansible has modules for every major VCS, and pulling the right version of your code is just as simple, if not simpler. You can, for example, record the result of running git into a variable, and if the repo is “changed” (i.e. git has pulled some new commits) you can decide in code whether the installation/update has to perform other tasks. That variable becomes “global” and associated with the host, so you can use it in all the other tasks/roles that need that information, play-wide.

    just my 2 cents

    • Right, but to then have a reasonable structure you have to put pretty much everything into these files as variables. And then the Playbooks end up being just a thin wrapper to pass the variables into the Modules. This is probably the most reasonable way to use Ansible, but it kinda makes you wonder what use the Playbooks are, really. Buildout goes directly from the configuration to the Recipes/modules and this is to me clearly the more sane way.

      I think you missed the point about mr.developer. The point is not that you can check out code. The point is that you can install your environment with the latest released modules, and then, if you end up having to modify a module, switch to a git checkout of it with a single command. And then back. All without modifying any configuration. (Only for Python, though.)

      But I’ll talk about that more in a later blog post about developers and SCM. Here it was just used as an example of an extension.

  3. “Configuration does not have a specific order, configuration you first parse, and then you access the configuration data arbitrarily.”

    This is what I call the quantum mechanics definition of configuration. You describe your system, feed it to your CM program, and then, voilà, your system matches the state.

    The trouble with this approach is that it’s not the way configuration takes place in the real world. There, there are dependencies and other things that require non-arbitrary ordering. Like many people, I often had the problem that I’d have to run Puppet more than once to accomplish what I was trying to do. This wasn’t Puppet’s fault – it was entirely my fault. But by the time I figured out all the ordering dependencies and added the necessary statements to Puppet, what I had left more closely resembled what you’re calling a script.

    The fact that Ansible, and other CM programs, run sequentially in a script-like manner is inevitable. In the case of Ansible, you’re aware of this when you create your playbooks so you don’t have to spend any time adjusting quantum mechanical descriptions.

    Jon Forrest

    • I was told on Twitter that I should make a blog post on the difference between descriptive and imperative, and maybe I’ll get time to do that as well. But here goes:

      Turing machines are sequential. And although modern computers are not that simple, this is the abstraction that we develop against. For that reason, anything you type in will sooner or later become a sequence of commands that are executed by the processor. But for that reason saying that there is no difference between a non-sequential configuration, and a sequential script, is just redefining the words, and redefining them so that they become essentially meaningless. There is, IMO, an important distinction here between a script and configuration, and I tried to illustrate that difference in the blog post in a practical way.

      Buildout also has an order. The “parts=” statement is ordered, and the parts are executed in that order. And an SCM must install the software before it can run it, etc. But this does not make it a script. I explain some of the differences above. With configuration you can extend it and change one part without having to copy/paste the whole section, or replace parts of it with an externally definable variable. This is because configuration is essentially ONLY externally definable variables.

      And before you argue that the order of the parts makes buildout a script, remember that the order is ALSO configuration. It’s not a part of the syntax as it is in Ansible. It’s also an externally definable variable and can be overridden.

      In the end it all comes down to a separation of code and data. It’s no different from having a document of some sort and software that can display that document. The display is done through sequential commands by the processor, but that does not make the document format a programming language. It doesn’t need to be sequential either; XML, for example, is not.

      Another relevant comparison is between imperative and functional languages. In the end, the functional language is going to be run in sequence, but that does not mean that there is no such thing as functional languages.

  4. (Something to keep in mind: in English there is a big difference between “it’s” and “its”. You have it wrong in several places. [I don’t have a Swedish keyboard])

    Anyway, “But for that reason saying that there is no difference between a non-sequential configuration, and a sequential script, is just redefining the words, and redefining them so that they become essentially meaningless.” Exactly. My point is that this difference does exist, and it’s the reason why CM systems (I’m only talking about CM systems) eventually have to map between the idealized system description and the steps necessary to make it so. I agree that if the system description is fundamentally wrong, then this mapping will be wrong too.

    Whether the language used by a CM system is a programming language is another question – one that I’m making no attempt to answer. My only claim is that at some point it has to be executed sequentially, taking into consideration the dependencies inherent in what’s being done.

    Jon Forrest

    • Spelling, schmelling.🙂

      No, as pointed out, I agree that even non-sequential configuration ends up as sequential changes. But that doesn’t mean it’s the same thing. The difference is still important, just as there is a difference between using Python and using Forth, even though they both end up as machine code in the end.

  5. I personally don’t see any problems with actually having scripts do this kind of job; we’ll probably see a lot more similar stuff when containers get even hotter than they are today. I personally use Ansible, but not against live servers. Ansible is one of the tools I use to create server images, which are then deployed to the cloud and run using Docker.

    • I don’t see a problem either. But I also don’t see a problem writing webapps in C++. I just don’t think it’s the best tool for the job.

  6. santagada

    Reasons why I think Buildout has never gotten a foothold in the Python deployment ecosystem:
    – INI files are completely flat and buildout does a lot of confusing work to make it work (e.g. “parts” is just a key in a specific section of the ini file which happens to work as a root for the config).
    – The order of the parts not being the order in the file is also confusing in itself.
    – All of Buildout’s rules are about building things from source in the local directory (at least the last time I used it, in 2007).
    – Sometimes it just blows up and you have to rebuild things (OK, maybe this has gotten better). It has a small user base of mostly people building Plone, which is why I think it maybe hasn’t gotten much better.
    – Does it still use eggs for extensions?
    – There is a huge community around Ansible and it appears that when you have a problem someone already fixed it.
    – It needs 20 different extensions that are not part of Buildout, which everyone who already uses it with Plone knows about (I know there is one for running bash scripts that you define, uglily, in the INI file).

    In the end, even though you might be right about configuration, using Ansible seems like a better idea. Like using WordPress for a blog instead of Plone😛. I wish there was a better answer for CM, and I’m waiting to see your answers to the deficiencies that I remember from Buildout… maybe you can convince me.

    • Buildout was never designed for generic configuration management, it’s designed for making development environments and deploying them in an isolated way, so you are right in much of your criticism.

      I’m very surprised that you find it confusing that the order in the configuration files is not the order of execution, as that essentially says that you think of Buildout’s configuration files as scripts. If you think of them that way, Buildout is surely confusing, but I don’t know what would lead you to make such an assumption.

Trackbacks & Pingbacks

  1. Developers need configuration management too! | Lennart Regebro: Python, Plone, Web
