
A script is not configuration.

I’ve been looking into Ansible lately, and have had trouble explaining what I think is wrong with it, so this blog post is an attempt to do that by comparing it with Buildout. This may seem a bit strange, since they don’t really do the same thing, but I think it will make sense in the end. So hang in there.

Ansible

Ansible calls itself “software automation”, and that is correct, but it is often presented as an SCM system, which in my opinion it is not. The reason is that Ansible is based around writing scripts. Because these scripts are written in YAML they superficially look like configuration, but that is misleading. Ansible calls these YAML files “playbooks”, which again indicates what you do with them: you play them. They have a start and a finish, and perform a set of actions in the order the actions are defined in the playbook. The actions are often of the form “ensure that software X is installed” or “make sure that the file Y contains the line Z”, but that doesn’t change the fact that they are actions performed in the order written in the file. Therefore these files are scripts, not configuration. Configuration has no inherent order: you first parse it, and then you access the configuration data arbitrarily. That is not how Ansible playbooks work.
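To make the ordering point concrete, here is a minimal playbook sketch (the host group, package and file names are made up for illustration). The tasks form a list, and Ansible runs them top to bottom, like statements in a script:

```yaml
# Hypothetical playbook: a list of tasks, executed in the order written.
- hosts: webservers
  tasks:
    - name: Ensure nginx is installed
      apt:
        name: nginx
        state: present

    - name: Ensure the config contains our server name
      lineinfile:
        path: /etc/nginx/nginx.conf
        line: "server_name example.com;"
```

Each task describes a desired state, but the file as a whole is still an ordered sequence of steps.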

And that’s fine. It’s not criticism of Ansible per se. Ansible mainly calls itself a system for automation, and it is, just like any sort of script file. Bash scripts are also designed for automation. Ansible just has a very complicated (as opposed to complex) scripting language. Thanks to the choice of YAML, the language is very restricted and often tricky to use, as you end up having to fight the syntax with a very liberal and exact application of quote characters.

However, they do state on their website that “Ansible features a state-driven resource model that describes the desired state of computer systems and services, not the paths to get them to this state.” And that is only partly true. If you only use modules that check for state, it is true for a reasonable definition of “true”. But a lot of the modules shipped with Ansible don’t do that. And more importantly, the state is not defined in configuration; the state is defined in a script. This leads to limitations, which we will talk about later.

You can add new commands to Ansible by writing modules. They can be written in any language, which sounds like a neat idea, but it means that the API for a module is passing JSON data in on stdin and returning it on stdout. This makes debugging painful, and it makes writing new modules a pain. In addition, to write Python modules you have to add a specific line at the end of the file with a “star import”, which breaks PEP 8 and also confuses some syntax-aware Python editors.

Ansible also recommends a specific directory layout, with several folders that all contain a file called main.yml. That means your editor quickly ends up with a whole lot of main.yml files open, which gets confusing. My good friend Jörgen Modin called that kind of layout “a conspiracy on the hard disk”, in reference to Zope Toolkit style programming, which does the same with its configure.zcml files. A file name should reflect what is in it, not what role it plays (unless that role is unique within one project).

For SCM you also need several layers of orthogonal modularity. You need to be able to define the installation of, for example, MySQL, and then you need to define what software should go onto each machine. Ansible can do this, although confusingly most people tend to use what Ansible calls “roles” to define one component, and then use the groups in the inventory file as roles. But that’s just naming; you’ll get used to it.

Buildout

Buildout calls itself a “software build system”, and that’s not incorrect, but it makes it sound like it’s competing with make and scons, which it is not. In my opinion, Buildout is closer to being a Software Configuration Management system than a build system. I would call it an Environment Configuration System, as it’s mainly designed to set up development environments, although it can also be used to deploy software. Its main shortcoming as a proper SCM is that it lacks modules to do common SCM tasks, such as installing system packages with yum and apt, and, more problematically, it lacks support for running some bits, like yum and apt, as a superuser.

Buildout does not have any support for remote deployment, so you need to use a separate program like Fabric to run Buildout remotely.

Just like Ansible has modules you can use to create new commands, Buildout has recipes. In Ansible they can be written in any language; in Buildout they have to be written in Python. This perhaps lessens the appeal to some people, but I do think the benefits are worth it. A Buildout recipe is just a Python module, like any other, and recipes can be made available on the Python Cheese Shop, in which case Buildout will download and install them when you run it.

The most important thing about Buildout for the purposes of this blog is that Buildout is configured entirely with configuration files, more specifically of the ConfigParser variety. INI files in general have the benefit of being designed for configuration, and their extremely minimalist syntax means they never get in the way. Buildout of course has to extend the syntax by allowing variable substitution, but that is also all it does. Everything written in the configuration is also a variable and can be used, so you only need to define each piece of information once.

The choice of INI-style syntax also means there is no inherent execution order in the configuration. The configuration is instead split up into what Buildout calls “parts”, each part executed by the recipe named in that part. Here are two examples of parts. The first will download, compile and install nginx locally (in the buildout directory). The second will generate an nginx configuration file from a template.

[nginx]
recipe = zc.recipe.cmmi
url = http://html-xslt.googlecode.com/files/nginx-0.7.67-html-xslt-4.tar.gz
# The SSI bug was fixed in nginx-0.7.65-html-xslt-2.tar.gz
extra_options =
    --conf-path=${buildout:directory}/etc/nginx.conf
    --sbin-path=${buildout:directory}/bin
    --error-log-path=${buildout:directory}/var/log/nginx-error.log
    --http-log-path=${buildout:directory}/var/log/nginx-access.log
    --pid-path=${buildout:directory}/var/nginx.pid
    --lock-path=${buildout:directory}/var/nginx.lock

[nginx.conf]
recipe = collective.recipe.template
port = 8000
root = ${buildout:directory}/examples
input = ${buildout:directory}/templates/nginx.conf.in
output = ${buildout:directory}/etc/nginx.conf

What makes this configuration and not a script is that none of this is executed unless the part is listed in a separate configuration:

[buildout]
parts = nginx
        nginx.conf

Buildout configuration files can also extend other files. So if we save the above in a file called base.cfg, we can then create another configuration file:

[buildout]
extends = base.cfg

[nginx.conf]
port = 8080

The only difference between base.cfg and this file is that nginx will run on another port. This makes it easy for me to check out a development environment and then make my own configuration file that just overrides the bits I need to change, because it’s all configuration. With Ansible I would have to make the port into a variable and pass a new value in when I run Ansible. And that means that when writing Ansible playbooks, to make them proper configuration management and not scripts, everything must be a variable. Buildout avoids that issue by not having the intermediary step of playbooks; it has just recipes and configuration.

With Buildout, your configuration can also extend another configuration file and add, skip or insert parts as you like.

[buildout]
extends = base.cfg
parts = nginx.conf
        loadbalancer
        nginx

[loadbalancer]
...

Each part remains the same, but the order is different and there is a new one in the middle. In general, because all configuration is parsed and gathered before the parts are run, it doesn’t matter much in which order you run them, but in some cases it of course does. If you are going to not just configure software but also start it, you obviously have to install and configure it first, to take one example.

There is also a special syntax for adding and removing values from a configuration like the parts-definition:

[buildout]
parts += newpart1 newpart2
develop -= foo.bar

A Buildout example

Buildout is often used to set up development, staging and production environments. I have an example where I have a base.cfg that only installs the Plone CMS. I then have a production.cfg which also sets up load balancing and caching, a supervisord to run the services, a cronjob to start the services on reboot, and cronjobs to do backups and database maintenance. My staging.cfg extends the production configuration only to change the ports, so that I can run the staging server on the same machine as the production server. The development.cfg also just extends base.cfg, so you don’t get any of the production services; instead it adds loads of development tools. Lastly there is a versions.cfg which contains version numbers for everything to be installed, so you know that if you set up a local environment to test a problem, you are using the same software as production.

If you were aiming to deploy this onto several servers, with the database on one server, caching and load balancing on another, and the CMS instances on separate servers, then you would make one configuration file per server type and use that.

Buildout extensions

Buildout has a final level of indirection: extensions. One example is buildout.dumppickedversions (although it’s now a part of Buildout itself), which lists all Python packages that you have not given a specific version number. Another is mr.developer, which gives you commands to make sure that the Python packages you are working on are all synced to the version control system. It can even allow you to switch between the latest release and a checked-out development version of a Python package, which is really neat.

Perhaps it would be possible to make an extension that allows you to run some Buildout parts as root and others as a normal user. I’m willing to give implementing it a try, but I’m a bit busy at the moment, so it will have to wait. And if it can’t be written as an extension, adding the feature to Buildout itself should be relatively easy. With that feature, I would be prepared to call Buildout a Software Configuration Management system. It may originally have been developed to manage only development environments, but it has proven itself capable of much more, and it certainly got the most important design decisions in SCM correct. Hopefully the text above clarifies why.

Thoughts on JavaScript frameworks.

Like it or not, JavaScript is the language we have to use for front-end development on the web. And that was fine when all we did was add some dynamism to the site, perhaps a menu or a date picker. But now the whole web application often uses JavaScript, whole bits of the page are replaced via Ajax, everything is clickable and updates automatically, etc. At that point, having everything in one big JavaScript file is not feasible. And splitting it up randomly into separate files hardly helps, as you tend to end up with loads of globals anyway.

To write a full front-end JavaScript app you need some sort of structure. The first step there was JavaScript libraries, where jQuery has over the last few years pretty much taken over everything. Lately the new trend is frameworks. This makes intuitive sense, as the web is moving from adding a bit of JavaScript here and there to writing applications purely in JavaScript. And just as with libraries, there is an endless number of them, and it will probably take a few years until we can shake out some winners. With frameworks there is likely to be more than one winner, as the requirements are quite different for different people and different use cases.

My requirements for a JavaScript framework may be somewhat different from others’, but this list is based on my experience so far, which admittedly has been neither extensive nor good. But these are things I’ve found lacking.

Library-ishness

The first mark of a good JS framework is that it is usable as a library, or at least that many parts of the framework can be used in a simple way without adopting all of it. This is required for many reasons. Firstly, it helps you get started with the framework, as you can easily use parts of it without drowning.

Today many web applications are written in back-end frameworks like Ruby on Rails or Django. If you then apply a JavaScript framework whose main purpose is writing a whole web application, you end up with two competing frameworks fighting each other. A framework designed only for writing full applications in JavaScript is therefore going to be a bad fit for a web application written in Django. This requirement is quite unique to JavaScript application frameworks, as you are unlikely to start using a bit of Ruby on Rails in a J2EE app, for example.

This means that if the framework is not a good fit for slowly taking over existing applications, you end up not using it in existing applications, and then you need to learn some other library or framework for that situation. That means you need to know at least two frameworks for doing the same thing, which is a waste of brain power. One framework should be enough.

Modularity

To handle big applications you need to be able to make things modular, and you need a way to handle module dependencies. This is something JavaScript has traditionally been very bad at, and there are several efforts to do something about it. A framework needs to support this, and you need to be able to declare your dependencies. You should also preferably have some sort of namespace handling, so that not all modules required by your application are available everywhere, as that clutters the namespace. Think of Python’s import statement here for an example of what you want.

Something you also want is the ability to declare your dependencies after the application has loaded. This is so that you can create a pluggable application. This is again mainly a requirement for applications that run on a traditional server-side framework. These frameworks often have their own systems for making things extensible and pluggable. So you might want to write a plugin for your server-side framework that uses a date input with a nice date selector, and then you need that plugin to be able to tell the JavaScript application that it needs a specific module. Ways of doing that which are not good enough include having to edit a separate file that lists all your JavaScript dependencies, because that file may not be under your control, as it’s part of the web application and not part of the plugin you are developing.

Legacy form handling

Related to this is that you want to be able to integrate the JavaScript framework’s form handling with the back-end application’s form handling. Some JavaScript frameworks assume that you submit every form with JavaScript, collecting the values from the form and then making a POST in that JavaScript. But if your widgets assume this, they may not work in a normal form submit, because they may not have a real <input> tag to submit the value. That also makes it harder to use just bits of the framework. Either that, or the framework needs to take over, by magic, any <form> tag that uses a framework widget. But magic is usually a bad idea.

HTML is declarative, your JavaScript App should also be.

Lately a new attitude has shown up in JavaScript, and that is a more declarative style. Instead of having an initializing script that tags specific <input> tags as using a date-time widget, you set some sort of declaration on the tag, and the widget appears automatically. Although this is just a small step from having a special class for this and using jQuery to set all inputs with that class to use the widget, it’s a change in attitude. And it’s an important change, because if a JavaScript framework requires you to write a lot of JavaScript that manipulates the HTML, you have broken much of the purpose and principle of HTML. HTML is declarative, and you should be able to think about it that way, not as something that will need to be modified later.

So with declarative JS, instead of creating HTML and then writing code that manipulates this HTML you write some sort of self-contained modules that are applied automatically. Of course, in the end it’s still JavaScript that manipulates the DOM, but it’s now done in localized and contained ways. And if you think this is obvious, then you are probably writing your declarative JavaScript in that way already, so pat yourself on the back!

The less JavaScript the better

Quite often, with both libraries and frameworks, you end up writing a lot of JavaScript. This sounds reasonable, but it should not be. Not only because writing and developing JavaScript sucks, but also because the JavaScript and HTML live in very tight integration, and the HTML is often designed by somebody who does not know JavaScript. So less JavaScript is necessary not only to save us poor Python programmers from the pain, but also to make it possible for designers and usability people to create designs and mockups.

This also ties into the declarative development above. In an ideal world, an HTML developer should be able to add a bunch of JavaScript files to their HTML, and then just write HTML to get most of the web application UI working. They should be able to declare a list of tabs and have it behave like a list of tabs, and the date-time widget should be declared only with HTML, with the framework doing the magic.

Example: Patterns

“Patterns” is a library of JavaScript widgets and functionality that works entirely in a declarative style. It aims, and to a large part succeeds, to make it possible for HTML hackers to design a webapp. This is because the concept in fact comes from a designer, Cornelis Kolbach. With Patterns you can, for example, make an HTML injection like this:

<a href="demos/frobber.html#content" class="pat-inject">Demo the frobber</a>
<section id="content">
  HTML to be replaced.
</section>

The “pat-inject” class means the <a> link will load its href, find the tag with the ID “content” in the loaded page, and inject it into the current page at the element with that ID. All usage of Patterns is similarly free of JavaScript, while the patterns themselves are of course written in JavaScript.

Using Patterns is very simple, and you can freely pick and choose what you use, because it is in fact not a framework at all; it’s a library. It’s also badly documented, and setting up an environment where you can develop patterns was, at least a few months ago, exceedingly complicated. But it serves well as an example of what you can achieve with declarative JavaScript libraries.

Example: AngularJS

AngularJS is a framework, and it’s very frameworky. It also claims to apply a declarative attitude, and indeed, the “Hello World” example of AngularJS requires you only to create the main application module. But then it quickly breaks down.

If you want a pre-filled value in the input when you load the page, as is common when you are editing something, the examples tell you to do this with very imperative JavaScript. This seems very strange, especially considering that most inputs have a value attribute, but that is just ignored. AngularJS also breaks pretty much all of the requirements I have for a good framework, and I admit it: this whole post was prompted by me trying to use AngularJS. Trying to use a tags widget written with AngularJS in a form led to a lot of pain, partly because AngularJS does assume you will submit the form with AngularJS, as explained above.

AngularJS is trying to be modular, and you can indeed write new declarative modules by adding directives. But using them independently can be quite tricky, and the first problem is that you define AngularJS dependencies in the same line of code that creates the application. That’s equivalent to having to declare all your imports on the first line of your Python application! There is no way to add a dependency once the main application has been created.

That means you actually have to make your own system for declaring dependencies, where your plugins can add their dependencies to a list to be used when initializing the main AngularJS module. That adds a layer of complexity and brittleness to your web application. In Horizon, the admin UI for OpenStack, there are now at least two different ways of declaring your plugin’s AngularJS requirements, perhaps more. This adds technical debt and complexity for no good reason.

AngularJS is definitely not a framework where you can use bits and pieces; you have to drink the Kool-Aid. And this seems to be the general consensus: AngularJS is good for writing apps from scratch, but not for adding functionality to existing applications.

Conclusion

The conclusion is that writing frameworks is hard, and for JavaScript possibly even harder. I have over the weekend gotten several recommendations for other JavaScript frameworks and for me none of them seem to fit the bill. I look forward to being proven wrong.

Correction and further thoughts

I had been pointed to Polymer, but I saw it as a widget library, not a framework, and after reading more documentation, I still do. It looks similar to Patterns, but with better documentation and widget animations. I hope to get a chance to use it. It may very well be that the perfect framework is a combination of Polymer as a widget library and one or several pieces that provide only the frameworky bits for writing pure JS front ends.

You need some sort of module encapsulation, and you need to be able to declare dependencies and add dependencies in a modular way: either by adding them at any point during app initialization, or through multi-step initialization, where you first create the app, then add dependencies, then initialize it.

I suspect that you also might need some sort of model/view/template mechanism. Obviel could fit the bill there.

Interlude: My HTPC is dead, long live my HTPC!

7 years ago I started this blog. It started with a couple of posts on my experiences with making a Home Theatre Personal Computer, running Linux, because I’m crazy.

Once I gave up on trying to use it to watch TV, and bought a TV, things were better. (Disclaimer: XBMC has better TV support now, so maybe it doesn’t suck anymore.) Once I gave up on having it play DVDs and bought a Blu-ray player, it was even better. I now use XBMC to play music and videos, I use a web browser to watch football live from streaming websites, and I use it for Skype with my family, who are spread out over several countries.

And after this it worked pretty well. The remote control was a pain in several ways, but that’s the fault of my chassis.

But for the last two years it has been struggling. Compiz slows video down so much that HD video wouldn’t work, which meant I could not upgrade past Ubuntu 12.04 (because it has Unity 2D, which doesn’t use Compiz). It also seems Flash just gets slower and slower, and lately HD video has been jerky when watching Flash video. And a year and a half ago I bought a new, bigger TV, and there was no 1:1 pixel-ratio resolution that both the TV and the PC would support, which again is annoying when watching Flash video in the browser.

And two weeks ago the power went, and after that Ubuntu couldn’t find the on-board video any more. I tried some old video boards I had lying around, but none of them worked well, and the computer started crashing randomly.

So, after pretty much exactly 7 years, this computer, which was designed not to be high-powered but silent, finally gave up.

So I bought an ASUS P8H77-M LE board, because I have another P8H77 board in my main computer, and it’s pretty awesomely good. Unfortunately it does NOT have the right slots in the right positions to work with the riser cards my chassis uses, an issue I had forgotten about during these 7 years. That’s annoying and means only graphics cards can be used, and I don’t plan on doing that. It also means I have to replace my PCI WiFi card with a USB one. But I can live with that, and I knew this motherboard would work well both with Ubuntu and with my TV.

And I got the cheapest processor I could find that would fit, an Intel Core i3-3250 at 3.50 GHz, and new memory. 2 GB would have been enough, but apparently you can’t buy that little memory any more, so I got 4 GB.

And after throwing everything out (including the DVD player, which uses ATA, which the new motherboard doesn’t even have), we are back in business! Now with Ubuntu 14.04, a square-pixel resolution, and full-HD Flash! Oh, and the fans are actually speed-controlled by the motherboard now, for extra-silent running! It all cost 280 dollars, and it wouldn’t surprise me if this lasts for another 7 years!

Would a Python 2.8 help you port to Python 3?

Quite often somebody says that there should be a Python 2.8 to help in making a smoother transition to Python 3. I have for some time had the very evil tendency to then ask them what exactly this Python 2.8 should contain. It is very rare that the person making the claim can answer this question in any way except that it should contain “things” that make the transition to Python 3 “smoother”. (Or they want to back port almost everything from Python 3, but I covered that in my previous post on Python 2.8).

So in this blog post I will examine the possibilities to make the transition to Python 3 smoother through a new version of Python.

First of all, in all the code below, let us assume there is a constant defined, called PY3:

import sys

PY3 = sys.version_info[0] == 3

Let’s then look at the various types of differences between Python 2 and Python 3, and if it is possible to make a smoother transition between Python 2 and Python 3 by introducing a Python 2.8.

Built-in rename and moves

Several built-in functions have been renamed or moved. Since as far back as there is documentation available, Python has had two range functions: range and xrange. There’s no reason to have two, so in Python 3 there is only one. It’s called range but behaves like xrange. If you actually want a list, you do list(range(x)) instead. This means that any code that assumes that range() returns a list will fail. It also means that if you use range() under Python 2, you get the old function, which can take time and use a lot of memory if the range is large.
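To make the change concrete, here is a small Python 3 sketch of the new behaviour:

```python
# In Python 3, range() is lazy like the old xrange: it computes values
# on demand, but still supports indexing, len() and membership tests.
r = range(10**9)            # instant; no billion-element list is built

print(r[5])                 # 5
print(len(r))               # 1000000000
print(500 in r)             # True

# Code that assumed range() returns a list must now ask for one explicitly:
print(list(range(3)))       # [0, 1, 2]
```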

Could there have been a smoother transition there? Well, yes: there could have been a from __future__ import range in some version of Python that made range behave like xrange, and in the next version of Python the old range could have been removed and xrange renamed.

But do we need a smoother transition there? No. If you have to use xrange or some other renamed built-in you can just do this instead:

if not PY3:
    range = xrange
    from functools import reduce

The same goes for almost all the name changes in built-ins. If you use a lot of them you would end up with several lines of __future__ imports, or several lines of  range = xrange statements, so it would not be pretty in either case. I actually prefer the latter. The best/least ugly solution is instead to use the six module where you have a prettier syntax:

from six.moves import range, reduce

So for the renaming of built in functions, using six provides a smoother, or at least prettier transition than a Python 2.8 would.

Conclusion: Python 2.8 would not help.

Dictionary iterators

Since Python 2.2 the methods on dictionaries have had iterator variants: iterkeys(), itervalues() and iteritems(). Here it is also hard to make a smoother transition. Perhaps Python 2.2 could, instead of the iterxxx methods, have had a from __future__ import dictionary_methods, but it’s not obvious how this would work: it needs to be implemented at module level, while the methods are methods on the dictionary objects, meaning that one and the same dictionary object would have needed to behave differently in different modules. Not nice.

Do we need a smoother transition? No. Most usage of dictionaries is with
for x in thedict:, which uses the iterator in both Python 2 and Python 3. When you need to use iteritems() or itervalues() in Python 2, you can yet again use six, which provides functions for this:

from six import iteritems
for x in iteritems(thedict):
    print(x)

So for the renaming of the iterator methods on dictionaries, a __future__ import would have been prettier to use, but horrible to implement. But it isn’t really necessary. The transition to Python 3 is not made easier through this, just a bit prettier. iterxxx() from six works fine, and is not ugly.

Conclusion: Python 2.8 would not help.

The standard library reorganization

Much of the Python 2 standard library was written before PEP 8 and so does not follow the PEP 8 rules. In addition, many modules could be merged; urllib and urllib2 are just the most obvious example. One way to handle this in code that needs to run on both Python 2 and Python 3 is to simply try to import a module from one place, and if that fails, import it from the other place.

try:
    import configparser
except ImportError:
    import ConfigParser as configparser

Could this have been done in a smoother way? Well, for some libraries, perhaps. It would for example be possible to rename the libraries to a new name, and have an alias that prints a deprecation warning, etc.

Do we need a smoother transition? No. Again six comes to the rescue, as it has aliases for these modules, so that you can import them by one name in both Python 2 and Python 3:

from six.moves import configparser

Python 2.8 could provide a slightly smoother transition than using six, by providing deprecation warnings for the old names. You would also not have to make any changes once you drop Python 2 support completely. However, the library renaming is not a significant hurdle in supporting Python 3.

Conclusion: Python 2.8 would make the transition slightly smoother.

Note: This is one of the only three actual suggestions on what could go into a Python 2.8 that I have received.

Metaclasses

The metaclass syntax changed in Python 3. Can Python 2.8 help there? Yes, it could probably support both syntaxes. On the other hand, six also has a workaround for that, so it’s not really an issue.
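For reference, a minimal sketch of the two syntaxes (the Meta class and its tag attribute are made-up examples); six.with_metaclass papers over the difference:

```python
# Python 3 syntax: the metaclass is given as a keyword argument
# in the class header.
class Meta(type):
    def __new__(mcs, name, bases, namespace):
        # Hypothetical example behaviour: stamp every class with a tag.
        namespace.setdefault('tag', 'made-by-meta')
        return super().__new__(mcs, name, bases, namespace)

class Foo(metaclass=Meta):
    pass

# The Python 2 syntax used a magic class attribute instead:
#
#     class Foo(object):
#         __metaclass__ = Meta
#
# And six works on both versions:
#
#     class Foo(six.with_metaclass(Meta)):
#         pass

print(type(Foo) is Meta)  # True
print(Foo.tag)            # made-by-meta
```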

Conclusion: Python 2.8 could provide a prettier solution than using six.

The Unicode change

The change that impacts most people, and is hardest to handle, is the change to str. In Python 2 the string type contained a sequence of octets, while in Python 3 it contains a sequence of Unicode characters. This has numerous effects, too numerous to mention here.
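A small Python 3 sketch shows the split between the two types:

```python
# In Python 3, str holds Unicode characters and bytes holds octets;
# they are distinct types, and mixing them raises TypeError instead of
# silently converting as Python 2 did.
s = 'héllo'                     # five characters
b = s.encode('utf-8')           # six octets: 'é' is two bytes in UTF-8

print(len(s))                   # 5
print(len(b))                   # 6
print(b.decode('utf-8') == s)   # True
```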

Could this transition have been made smoother? Well, yes, because Python 3.0 removed the u'' prefix to mark something as being Unicode. Python 2.6 introduced from __future__ import unicode_literals to help ease the transition, but it turned out to be less useful than anticipated. I think the only way this transition could have been done in a “smooth” way is over a chain of several Python versions. That transition would have looked something like this:

* Python 2.6: Introduces a b'' prefix that creates a new bytes type: 8-bit sequences that are not plain strings. The whole standard library accepts both bytes and strings.

* Python 2.7: Plain strings are deprecated. You are recommended to use either b'' bytes or u'' Unicode strings.

* Python 2.8: Plain strings are removed. You now only have b'' bytes and u'' Unicode strings.

* Python 2.9: The string literal is reintroduced, but it is now an alias for the Unicode literal.

* Python 3.0: The Unicode literal is renamed “string”.

That would have been a “smoother” transition. But as you see it would have taken five Python versions, and hence some ten years or so. And each version would have introduced new things. It would be easy to be compatible with three versions at once, but not more than that. Supporting Python 2.5 and 2.8 at the same time would have been practically impossible. So in that sense, this “smooth” transition might in fact have ended up even more disruptive than the path that was chosen. And I’m sure the rants would have been even angrier than they are now.

This is one of the major reasons it was decided that making a gradual change was not the best solution, and that making one new version with all the major changes in one go was a better choice. On a side note: I still think this decision was correct, but the difficulty of writing code that straddles both Python 2 and Python 3 was overestimated. This led to the differences actually being larger than necessary, like for example removing u''.

Python 2.8 and Unicode

Can a Python 2.8 help to make a smoother transition? Well, it could perhaps make sure that the standard library accepts both bytes and strings where that makes sense, and both strings and Unicode where that makes sense. Otherwise 2.8 can’t make the transition smoother without breaking backwards compatibility. But there has been at least some work on doing this already, so I don’t know how much is left. The only module I know of off the top of my head where this is a problem is the csv module, which requires files opened in binary mode in one version of Python, but in text mode in the other.

Also, a Python 2.8 could introduce a warning when running in -3 mode whenever there is an implicit conversion between octet strings and Unicode strings. This would help to find potential problems and fix them before porting. However, Python 2.8 is not necessary for that, you can have it now if you want.
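The implicit conversions in question are the ones Python 2 performs when mixing byte strings and Unicode, which Python 3 refuses to do:

```python
# Python 2 silently coerces when mixing the types: u'a' + b'b' gives u'ab'
# (and raises UnicodeDecodeError for non-ASCII bytes, often far from the
# actual bug). Python 3 raises TypeError immediately instead:
try:
    mixed = u'a' + b'b'
except TypeError:
    mixed = None
```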

@wong_jim suggested actually breaking the implicit conversions on a per-module basis with a future import. This would help find the problems in your code without being distracted by implicit conversions in the standard library. But again, this requires that the objects behave differently depending on which module they are in, which is not an ideal situation.

Conclusion: Python 2.8 would not help.

Note: More deprecation warnings and breaking implicit conversions are two of the only three actual suggestions for what could go into a Python 2.8 that I have received.

Byte-string behaviour

In Python 2, binary data is handled by strings. In Python 3, it is handled by bytes objects. And they don’t behave exactly the same. Python 2.8 could introduce the bytes object, making the difference smaller. That would make it easier to write code that works on both Python 2.8 and Python 3. But that would actually break a lot of existing code! A better solution (which is being discussed) is to change the behaviour of the bytes object to be more like a Python 2 string.
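The behaviour difference is easy to demonstrate with indexing:

```python
data = b'abc'

# Python 2: data[0] is the one-character string 'a'.
# Python 3: data[0] is the integer 97.
first = data[0]

# Slicing, on the other hand, returns a byte string on both versions:
head = data[:1]  # b'a'
```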

Conclusion: Python 2.8 would not help unless it breaks backwards compatibility.

Doctest problems

Many doctests will break, and break quite badly, when you move to Python 3. This is because doctests typically rely on comparing the representation of results. And the representation has often changed. It’s loads of small things that have changed, from the Unicode string representation to the removal of the long type to subtle changes in the representations of some built-in classes.

The only solution to this is to rewrite the doctests so they don’t depend on what object representations look like. Python 2.8 can not help there.
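One way to make a doctest survive both versions is to compare values rather than echo representations. A small example:

```python
def greeting():
    """Return a greeting.

    Echoing the representation would break between versions, since
    Python 2 prints u'hello' and Python 3 prints 'hello'. Comparing
    the value instead works on both:

    >>> greeting() == u'hello'
    True
    """
    return u'hello'
```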

Conclusion: Python 2.8 would not help.

Problematic API’s

In some cases your module might have API’s that don’t work in Python 3. I’ve encountered two:

1. zope.interface uses a syntax that relies on class body statements manipulating locals() to insert a metaclass. But in Python 3, the metaclass statement isn’t in the locals, so this doesn’t work. This means the API had to be changed, and a fixer had to be written to change the class body statements into class decorators.

2. icalendar would use __str__ as the API to marshall iCalendar objects to iCalendar data. As iCalendar data is a UTF-8 byte string, that obviously doesn’t work in Python 3. The conclusion here is that you shouldn’t misuse Python-internal API’s like that, and the end result was that icalendar needed to first grow a proper API, and then support for Python 3 could be added.

This is one of the hard parts of supporting Python 3, and Python 2.8 can’t do anything about this.
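The icalendar case generalises: marshalling to bytes deserves an explicit method rather than __str__. A toy sketch (not the real icalendar API; the class and method names are illustrative):

```python
class Component(object):
    """A toy stand-in for an iCalendar component."""

    def __init__(self, summary):
        self.summary = summary

    def to_ical(self):
        # An explicit marshalling method can return UTF-8 bytes on
        # Python 2 and Python 3 alike, which __str__ cannot do on
        # Python 3, where str is Unicode.
        return (u'SUMMARY:%s\r\n' % self.summary).encode('utf-8')
```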

Conclusion: Python 2.8 would not help.

Other changes

The change in behaviour of the division operator has a __future__ import already. So does the print-function. Python 2.6 supports both the old and new version of the exception-syntax. The removal of the long-type is mostly a non-issue.
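The division and print changes are both available today through __future__ imports:

```python
from __future__ import division, print_function

# With the imports in place, Python 2 behaves like Python 3 here:
print(7 / 2)   # true division: 3.5
print(7 // 2)  # floor division: 3
```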

Conclusion: Python 2.8 would not help.

Final conclusion!

In my insistent asking for things that 2.8 could actually do to ease the transition to Python 3, I have received only three suggestions. In each case alternatives exist that you can use today, without a new version of Python 2. six generally provides a solution that is just as good as, and in some cases much less ugly than, what Python 2.8 could offer. And if you have Unicode problems, then the unicodenazi module is useful while adding Python 3 support.

So would Python 2.8 help? No, not significantly. Python 2.7 already has the future compatibility that you need to write code that runs under both Python 2 and Python 3. You can start adding Python 3 support today. It’s not that difficult.

Did I miss anything?

I can’t think of any other changes that can be done in a backwards compatible way. But maybe I missed something. In that case, please tell me!

collective.portlet.content 1.8.1 released

This portlet shows a content item as content in the portlet.

Changes since 1.8 are:

  • Added German translation and moved portlet title and description to i18n:domain plone to make them translatable in the add portlet dropdown, too [fRiSi]
  • Fixed #1: The defaults in the form and the defaults on the Assignment were different. [regebro]
  • Fixed #5: Added portlet id based class to the portlet tag. [regebro]
  • Fixed #8: You now only need View permission on the content, not on its parents. [regebro]

tzlocal 1.1.1 released

tzlocal is a module that will help you figure out your computer’s local timezone information under Unix (including OS X) and Win-32. It requires pytz, and returns pytz tzinfo objects.

This version adds better support for OS X, and it also adds a dictionary that maps time zone names from tzdata/Olson names to Windows names, so it is useful for mapping between Windows time zones and tzdata time zones as well.

The potential for a Python 2.8

There has recently been some discussion about Python 3 and the perceived slow rate of adoption, and as a part of that discussion there have been frequent calls for a Python 2.8. But nobody agrees with anyone else about anything in that discussion, so I thought I would dissect the issue a bit to clarify.

Essentially there are two different views of Python 2.8. It’s common to hear people asking for a Python 2.8 to help transition to Python 3. Let’s call that a “transitional Python 2.8”. The other group that wants a Python 2.8 wants it because they want all the new goodie features of Python 3.x, like yield from, without having to actually upgrade their existing code to Python 3. I’ll call that the “featuristic Python 2.8”.

The Featuristic Python 2.8

Python 3 is in active development, and does have new features added to it. Python 2 is in bugfix mode. Why is that? Well, it’s simply because the core developers aren’t interested in adding features to Python 2. In their opinion Python 3 is so much better and nicer that they no longer want to maintain or add features to Python 2. In fact, Python 3 exists because there were features the core developers wanted that could not be added to Python in a fully backwards compatible way.

Therefore, from the core developers’ point of view, if we add the features they want to Python 2, you get Python 3. You ask for a Python 2 with the goodies from Python 3? OK, here, it’s called Python 3. So the core developers are not interested in creating a Python 2.8 that includes just some of the benefits of Python 3. That’s just a lot of work, but for them it creates no benefit. A featuristic Python 2.8 is essentially just a crippled Python 3.4.

And adding features only to the latest version while earlier versions are in bugfix mode is, after all, the standard mode of development. To ease the transition, both Python 2.6 and 2.7 have been released, but this can’t continue forever, for obvious reasons.

Does that mean a Python 2.8 can not happen? No, it can. If the Python core developers decide that Python 3 was a dead end, then obviously a Python 2.8 will happen. But that is a big if, and it certainly isn’t going to happen anytime soon. The other way it can happen is if somebody forks Python 2 and makes a Python 2.8. It would have to be released under another name, but should “Psnakes 2.8” become a big success, this may also change the core developers’ minds.

But asking the core developers to make a Python 2.8 and release it is currently not a constructive path forward. They are not interested in doing it. And being upset and rude isn’t going to help either. This is open source we are talking about, and people working in their free time. You can demand that they do something they don’t want to do, and call them names when they don’t, but that is not constructive.

So, if you want “Psnakes 2.8”, what should you do? Well, stop complaining and do it. I’m sorry, but that’s how it is in open source. If nobody wants to do the job, it’s not going to get done.

The Transitional Python 2.8

The other major type of Python 2.8 being discussed is a Python 2.8 that helps smooth the transition to Python 3. Mainly this would be done by making it easier to write code that runs both on Python 2.8 and Python 3 (mainly Python 3.5, probably). Could that happen? Well, I think it will be much easier to convince the core devs to release this kind of Python 2.8, if the need is there. The question then is if the need really is there. What features can you add to a Python 2.8 so that it is still backwards compatible, makes it easier to write code that runs on both Python 2.8 and Python 3, but can not be added in a separate library?

Well, *that* is an interesting discussion, and I have asked this of many people over the last year(s). And usually nobody has any suggestions. The only suggestions I have received so far are:

1. Include the standard library reorganisation, so that libraries are available under both the new and old names.
2. More deprecation warnings.

Tools like six help with the standard library reorganisation, so that doesn’t really require a Python 2.8. More deprecation warnings would require a Python 2.8, but I’m not sure how helpful they would be. After all, the best path to supporting Python 3 (at least in my experience) includes running 2to3 on the code, so having deprecation warnings for anything that is handled by 2to3 is probably not very important.

I can think of more things, but I’m not telling, because I want people who have tried to port to tell me what features they actually want.

So in my opinion, the potential for a Transitional Python 2.8 isn’t that large either, simply because it’s not really needed. Upgrading to Python 3 *can* be a lot of work. But I’m not sure a Python 2.8 would make it significantly easier. But I’m willing to be convinced otherwise.

25% holiday discounts on Porting to Python 3!

For the holiday season I’ve decided to have a 25% sale on my book on porting to Python 3!

You can get the PDF version at Gumroad and you get a 25% discount with the code “py3yule”.

The paperback is available from Createspace and you get the 25% discount with the code “MP4JLMLC”.

Excellent presents for any Python programmer!

Paperback of Porting to Python 3 available for sale!

The second edition of Porting to Python 3 is now available as paperback as well as PDF!

Porting to Python 3 is the most complete source available to help you with making your Python code run on Python 3.

It’s available as paperback from many channels:

It’s also available as PDF! When you buy the PDF you get three PDF’s, adapted for screen/print, tablets and phones.

The book gets right to the point, without a lot of fluff and filler – Doug Hellmann

A concise, well-organised and complete reference – Richard Jones

Pyroma 1.3.1 released

Update: After another bug report I realized I didn’t have a check that there are distributions uploaded to the Cheeseshop. This despite being a pet peeve of mine that people don’t upload distributions to the Cheeseshop. So I released 1.4, with an additional check for this.

Pyroma rhymes with aroma, and is a product aimed at giving a rating of how well a Python project complies with the best practices of the Python packaging ecosystem, primarily PyPI, pip, Distribute etc, as well as a list of issues that could be improved.

The aim of this is both to help people make a project that is nice and usable, but also to improve the quality of Python third-party software, making it easier and more enjoyable to use the vast array of available modules for Python.

Pyroma 1.3.1 fixes a small bug and improves the integration with zest.releaser for a smoother release experience.
