
59% of maintained packages support Python 3

I ran some statistics on PyPI:

  • 50377 packages in total,
  • 35293 unmaintained packages,
  • 15084 maintained packages.

Of the maintained packages:

  • 5907 have no Python classifiers,
  • 3679 support only Python 2,
  • 1188 support only Python 3,
  • 4310 support Python 2 and Python 3.

This means:

  • A total of 5498 packages support Python 3,
  • 36% of all maintained packages declare that they support Python 3,
  • 24% of all maintained packages declare that they do NOT support Python 3,
  • and 39% do not declare any Python support at all.

So: of the maintained packages that declare which versions they support, 59% support Python 3.
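
If you want to check my arithmetic, it’s simple enough (a minimal sketch; percentages are truncated rather than rounded, to match the figures above):

maintained = 15084
no_classifiers = 5907
py2_only = 3679
py3 = 1188 + 4310                          # Python 3 only, plus "2 and 3"

declaring = maintained - no_classifiers    # packages with classifiers

print(100 * py3 // maintained)             # 36% of maintained support Python 3
print(100 * py2_only // maintained)        # 24% support only Python 2
print(100 * no_classifiers // maintained)  # 39% declare nothing at all
print(100 * py3 // declaring)              # 59% of those that declare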

And if you wonder: “Maintained” means at least one version released this year (with files uploaded to PyPI) *or* at least three versions released in the last three years.

Developers need configuration management too!

SCM tools: devs, ops or devops?

This is a somewhat disjointed brain dump on the topic. It may not always follow a logical readable path.

Software Configuration Management tools are originally aimed at operations people. They are system admin tools, created to make it possible to deploy a specific set of software in a consistent manner on several machines. The first such system I saw was in the mid-90’s and would replace the Windows 3 program manager. You would see a Microsoft Word icon, but when you clicked on it you did not start Word, instead you started a program that would check if you had Word installed, and that it was the correct version, etc. If not, it would install Word while you went for coffee, and when you came back Word would be running.

Devops people have also started using SCM systems in the last few years, as they allow you to quickly and easily deploy the latest version of your web application onto your servers.

But developers have generally not used configuration management to set up their development environment. In the Python web community there is Buildout, most popular amongst Plone people and also some Django people. Buildout is very similar to a Software Configuration Management system, but it’s not designed to be one, so it lacks the modules for generic operating system tasks. It’s used to set up development environments and deploy websites.

So maybe developers don’t need SCM systems? Well, as the title already tells you, I disagree. And I will explain why by giving some examples.

Example 1: TripleO’s devtest.sh

In my new job at Red Hat, working with TripleO, there has been a constant frustration in getting a development environment up and running. This is partly because OpenStack is a complex system made up of many separate parts, and just installing it is complex in itself. But it is also partly because the “canonical” way of getting a TripleO development environment running is to run a script called “devtest.sh”. That doesn’t sound bad, but so far I have only been able to run it successfully once. And since it can easily take an hour or two to fail, trying to run it is a frustrating exercise in patience. And I am a very impatient person, so I fail.

The basic problem with the devtest.sh script is that it is a script. It knows how to install things. To make sure it doesn’t try to install something that is already installed, it first uninstalls it. So if the script fails near the end of the setup, re-running it means deleting a lot of what was done, and then doing it again. Often it failed because a website was down, or because my DNS server didn’t answer correctly or fast enough. And each error would kill the whole process and require me to start over.

You can also run it step by step, but when doing that it is often unclear whether a step finished correctly or not. So I only managed to finish it properly after I got the recommendation to first run it in a mode that builds all images, and then run the script in a mode that only does the configuration and does not try to build the images. Even so, I had to set up a caching proxy and a local DNS server to avoid network problems, so the image building could finish. It’s also worth mentioning that I don’t have network problems, really. Only devtest.sh would claim it couldn’t reach servers or look up names. I don’t know why it’s so brittle.

I should note that last week TripleO became installable with Instack, so the situation has reportedly improved, but I haven’t tried yet, because I’m afraid of touching what is now finally a working setup. But this serves as an example of how bad it can be. It took me three months, and probably a week or two of actual work, to get devtest.sh running. It would most likely have been faster to run each step manually, but the only description of how to do that was in the massive collection of scripts that goes by the name devtest.sh. And following what happens in those scripts, full of global environment variables, is a lot of work. I know, because I tried.

Example 2: Zope/Plone and Buildout

In the typical project I would be involved in when I worked at Nuxeo, we usually had several Zope servers, a ZEO (ZODB) server, some other database, a Lucene server for searching, and a load balancer. And you needed most of these things to be able to develop on the project. Getting started on a project was typically a matter of following instructions in one or several README files, often inaccurate or outdated. Setting up a project so you could start working on it took on average around a day.

Then Buildout arrived. With Buildout you defined each of the services you wanted, and then you just ran bin/buildout and it would set up the environment. Setting up an environment to get started took just a few command-line commands and a coffee break. Or, in the case of really complex projects, a lunch break. OpenStack is not an order of magnitude more complex than a typical Plone setup, yet it is an order of magnitude (or two) easier to get a development environment up and running with Plone than with OpenStack. I think that’s a good indication that Plone is on the right path here.

Buildout is somewhat limited in scope: it’s a tool designed largely for Python, and it also helps isolate the project from the rest of the world, so you can have several projects on your computer at the same time with different versions of Python modules. It therefore has special handling for Virtualenv, which also does environment isolation for Python, and for Setuptools, which it requires and does many magic things with. But when developing Python software, and especially Python software for the web, it’s an amazing tool.

It allows you to install specific versions of software and modules (and easily change which version). By storing the Buildout configuration in a version management system you can also tag the configuration when you deploy it at a customer. That way you can easily and quickly replicate the customer deployment, which helps you replicate issues.

It has also grown a very useful toolset. As an example, mr.developer allows you to write your configuration in such a way that if you suddenly need to develop on a specific module, you can quickly switch from the released version to a trunk checkout. This means that you can choose which parts of the project should be in “development mode”. Having a big project like Plone itself in development mode makes your environment too unstable. Somebody will often check in a change in one module that breaks another module, and should you happen to update your checkouts, your whole environment is broken, even though that change did not directly affect you. You want most of your environment to run stable versions, and only run the development trunk of the modules you are directly working on. Mr.developer allows you to do that.

Example 3: TripleO’s “diskimage-builder”

For creating images to use in virtual machines, TripleO has diskimage-builder. It too is similar to an SCM system, as it actually creates virtual machine images, then installs software on and configures them. It does this based on “elements”, so you can define what elements you want installed on your machine image.

A bad thing about it is that it’s script based. Each element is a script, meaning that overriding and extending elements is tricky. It also means that you may have problems mixing and matching elements, as they might step on each other’s toes, for example by each providing their own configuration file for the same piece of software. I don’t think this is a big problem for diskimage-builder, because its use case is installing OpenStack and Red Hat’s OpenStack products. The risk of different elements stepping on each other is therefore small. And it’s always used to create images from scratch, so the starting environment is known: it’s always a cleanly installed new operating system. This makes the whole configuration issue simpler.

But it also has several good ideas. The first of these is the elements themselves. They define how to install and configure a specific piece of the puzzle. This can be the operating system, or a piece of software, for example Glance. What is also nice is that the elements may have definitions for different kinds of installs. The Glance element, for example, supports installing both from packages and from source.

So what do we developers really need?

We need a generic Software Configuration Management tool that can install all the bits that are needed for a development environment.

Configuration driven

Instead of scripts that do things in a specific order, no matter what, a good system would just have a description of the state we want to achieve, and make it so. This is both for reliability and speed. Buildout, as mentioned in my previous blog post on SCMs, will in fact not re-run a successfully configured part unless the configuration for it changed (or it is explicitly told to).

This means that each attempt at running the tool does not become a 4-hour monster run that fails after 3 hours because of a temporary network error. If there is a temporary error, we just run the tool again, and it picks up more or less where it left off. Changing the configuration would only re-run the parts affected by that change.
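
A minimal sketch of that idea in Python, to make it concrete (the state file and the install() function are inventions for illustration):

import json
import os

def run_parts(config, state_file='.installed.json'):
    # Remember the options each part was last successfully run with.
    state = json.load(open(state_file)) if os.path.exists(state_file) else {}
    for name, options in config.items():
        if state.get(name) == options:
            continue  # already in the desired state, nothing to do
        install(name, options)  # install() stands in for the actual recipe
        state[name] = options
        # Save after every part, so a failure mid-run does not
        # throw away the work that already finished.
        with open(state_file, 'w') as f:
            json.dump(state, f)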

A module repository

Since a module written in C is built in a completely different way than one in Ruby or Python, and different languages also have different ways of declaring dependencies (if any at all), there needs to be support for package repositories that declare these things.

A repository entry would need to declare what the module is called in different types of Unices, so it can be installed by apt, yum, Nix or any of the other package managers for Linux. You also need to declare where the source code for different versions can be downloaded, and how to compile it, for those systems that do not have a package. Most likely we would want Buildout-style “recipes” to handle common cases: one recipe that can download code and run “configure, make, make install” on it, another for installing Python modules, another for Ruby modules, and so on.
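
A repository entry could be little more than a mapping; a purely hypothetical sketch, where all names and URLs are made up:

glance = {
    'packages': {
        # What the module is called in the various package managers.
        'apt': 'glance',
        'yum': 'openstack-glance',
    },
    'source': {
        # Where to get the code when there is no package, and how to build it.
        'url': 'https://example.com/glance-{version}.tar.gz',
        'recipe': 'cmmi',  # a shared "configure, make, make install" recipe
    },
}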

Version pinning

You want to be able to declare the exact versions to use, and you want this to be done in a separate file. That way you can pin versions for deployment, and later easily replicate your customer’s deployment when trying to fix bug reports.
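
Buildout already does this with a [versions] section that you typically keep in a separate file; something like this, where the package names and numbers are only illustrative:

[versions]
Plone = 4.3.2
Pillow = 2.3.0
my.customer.addon = 1.7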

Multiple version installs

As developers, we want to be able to have several versions installed at the same time. This is easy with some software, and hard with other software. A developer-centric SCM probably needs to integrate Nix and Docker to be able to isolate some things from the operating system. See also Domen Kozar’s blog on the topic.

Stable or development mode per module

A standard development build should probably install the latest released version of everything by default. In an OpenStack context that would mean that in most cases you install whatever comes with your operating system. On Ubuntu it would install OpenStack with apt-get, and on Fedora with yum.

But then you should be able to say that you want a specific piece, for example the Nova service, in development mode, and re-running the configuration after that change would remove or stop the OS-installed Nova, check out Nova master, and build it, but probably not start it, as you would likely want to run it from a terminal so you can debug it.

When changing a module to development mode, the version pinning should of course be ignored. And if the module’s checkout declares a different version than what is pinned, you should get an error.
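
In configuration form this could be as simple as the following, a purely hypothetical sketch for a tool that does not exist yet:

[nova]
# the default would be "stable"
mode = development
repository = git://git.openstack.org/openstack/nova
branch = master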

Online and offline modes

You don’t always want your machines to reach out to the internet to install. That means that there needs to be a way to package both the configuration and all dependent files in a package that can be transferred to the target computer for installing.

What we probably do not want

We don’t want complicated ways to run installations over SSH. I don’t know why almost every SCM system seems to implement its own version of this. It’s probably better that the SCM system installs things locally, and that you use a separate tool to deploy different configurations to different machines. Fabric seems to be a good tool for this.

Does this already exist?

I don’t know. You tell me.

Fedora 20 experiences

Since I now work for Red Hat, I think I should try to use Fedora for my desktop. This is a somewhat disorganized log of these efforts. I’ve now been using it for three months.

Installation

Installation is pretty, but somewhat confusing. Setting up the partitions was confusing, but I figured it out in the end, and also found the button to “prefill” the custom partitioning with the Fedora defaults, which are pretty reasonable (but don’t allow hibernation).

Something I did miss was Ubuntu’s keyboard layout detector, which is pretty nifty.

Gnome 3

Fedora 20 by default uses Gnome 3 and Gnome Shell. This has one huge advantage compared with Ubuntu’s Unity: it doesn’t use Compiz, which has crappy drivers on older hardware. Compiz is the reason I stayed on Ubuntu 12.04 up to now, because 12.04 includes Unity 2D, a Unity implementation that doesn’t use Compiz. Both have a search box that shows up when you press the Windows key, although Ubuntu’s is more advanced: in Ubuntu you can search specifically for available but not yet installed applications, documents, music or video.

In Gnome Shell the search box is not a separate thing, but a part of the “activities overview”. The overview will also “zoom out” and show all open windows. This is practical, but I suspect that Gnome Shell might not be very fun to use on older computers partly because of this.

Something that is annoying for power users like me, and surely completely confusing for everyone else, is that changing the settings for mouse and trackpad sensitivity requires you to restart Gnome with <alt>-<F2> ‘restart’. You could log out too, if there was a logout option anywhere, which there isn’t. It only shows up if you have more than one user on the system, unless you apply a workaround.

The workspace handling is nifty, with a dynamic number of workspaces depending on how many you actually use. It’s a bit hard to find the workspace handling though: you click “Activities” and move the mouse to the right edge of the screen. There are probably some keyboard shortcuts as well, but I don’t use workspaces much.

Some of the issues I had with Gnome 3/Gnome Shell deserve to be discussed in more detail.

Notification area

Gnome Shell has a notification area at the bottom. This area is annoying: it pops up when you don’t want it, and does not pop up when you want it (although I think I’m starting to get the hang of making it show when I want, at least). Notifications are also generally shown for such a short time that you don’t have time to read them, and then for some reason they are not saved in the hidden notification area.

The notifications that are shown in this area are actually application icons, like Skype, XChat, the software updater, etc. However, there is no indication when these notifications appear in the notification area, so you simply don’t see them, unless you open the notification bar by mistake! That makes the notification area quite useless, as it effectively doesn’t show you the notifications!

In addition to that, some of the notifications, for example the music player’s, will show up in bigger versions and gain keyboard focus if you happen to have the mouse pointed at the area where the notification pops up. Which, being near the bottom of the screen, is a somewhat common place to have the mouse.
As a result you can be happily coding away when your editor suddenly loses keyboard focus, and instead your keyboard actions start doing things like stopping and starting music or switching songs. Wut!?

Pretty much everything about this notification area is highly bizarre.

Sidebar

Gnome Shell has a dock-style icon menu called the “Dash”. I like docks, but I want them to be visible all the time. Modern wide screens have way too much left-right space anyway. Unfortunately there is no setting to always show the dock, because it’s actually a part of the “Activities overview”, and you have to move the mouse to the top-left corner and either click, or wait a bit, to activate this. This slows down usage a lot. The solution is to install an extension called “Dash to Dock”. The dock is then independent of the Activities overview, and has a setting to show it all the time.

System monitor

I find it extremely useful to be able to see at a glance what my computer is busy doing. For this I need a system monitor in the top bar. Luckily there is one for Gnome Shell, called System Monitor Applet. The published extension doesn’t currently work with Fedora 20, but the github master does. That Gnome Shell applets are so unstable that you have to install them from git is probably just an indication of Gnome Shell being somewhat immature compared to Unity. I remember Gnome Shell being so buggy it was unusable when I tried it last, which must have been 2012. But this is a problem that will disappear with time.

Updating

The application for updating software is not great. It doesn’t display what is to be updated in a format that gives you a good overview, and it also seems impossible to find a changelog for the packages.

What is worse, it insists on rebooting and updating during the reboot. This is amazingly annoying, as most updates can be done without rebooting. That is of course assuming that you get told there are updates at all, which you usually don’t, so you have to actually remember to check manually.

I’ve ended up using yum to do the actual updates. However, yum seems to never tell me to reboot, so that makes no sense either.

Ubuntu’s update manager will update, and then ask to reboot the computer if needed (which generally is only if the kernel has been updated). If Firefox has been updated it will tell you to restart the browser. This is soo much better than how Fedora does it.

Tweaking the UI

Two apps are absolutely essential to make Gnome 3 usable: gnome-tweak-tool and dconf-editor. Tweak Tool can be used to change a lot of things, and the first thing you want to do is to go into the Fonts section and set a scaling factor. This is because Gnome 3 defaults to assuming your screen has 96 pixels per inch. That was possibly a useful default in the 90’s, but today your typical laptop screen will have closer to 160 pixels per inch, making all text infinitesimal. The worst part is that desktop monitors have a much lower DPI. My monitor and my laptop have the same resolution, but differ greatly in size. That means you have to choose between too small text on the laptop or too large text on the monitor.
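
Under the hood, Tweak Tool just changes a dconf setting, so if you prefer you can set the scaling factor from the command line instead (1.25 is of course just an example value):

gsettings set org.gnome.desktop.interface text-scaling-factor 1.25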

Unfortunately browsers will ignore this setting, so you need to fix the settings separately for browsers. Why this is, I do not know. It feels to me like these issues were fixed years ago; I don’t know why they show up again. For Firefox the solution is an add-on called “NoSquint”. I haven’t looked for a solution for Chromium yet.

Windows now for some strange reason have no minimize button. The lack of maximize buttons is a lesser problem, as you can easily maximize a window by double-clicking the title bar. But for minimizing, you need a minimize button. Fair enough, once you have installed “Dash to Dock” the need to minimize anything is, well, minimized, but I still like getting windows out of the way. Luckily you can add both minimize and maximize buttons with the Tweak Tool.

I also needed to go into dconf-editor to turn off the screenshot function, which creates a screenshot every time you press the PrtSc button. On my laptop that key is located right next to the AltGr key, so I press it all the time by mistake, and that got annoying very quickly.

Python

I always compile my own Pythons for development. This is because I don’t want them all in the path; I want to control which Python I run, and I want to let the OS do whatever it wants with the OS-provided Python. Ubuntu has a neat package called “build-essential”, which installs the things you absolutely need for development. Fedora does not have this, so you need to know what those packages are. Instead, run:

sudo yum install make automake gcc gcc-c++ kernel-devel

Then Python also needs a lot of libraries and headers:

sudo yum install zlib-devel readline-devel ncurses-devel openssl-devel \
   gdbm-devel libsqlite3x-devel bzip2-devel tk-devel
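
After that, building a local Python is the usual dance. A sketch, where the version number and the install prefix are examples only:

wget https://www.python.org/ftp/python/2.7.8/Python-2.7.8.tgz
tar xzf Python-2.7.8.tgz
cd Python-2.7.8
./configure --prefix=$HOME/pythons/2.7
make
make altinstall  # altinstall avoids creating a "python" that shadows the OS one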

Weirdly enough, even though I installed lzma-devel, Python did not seem to find it and complained that it couldn’t compile _lzma. I found no solution to this. Python also did not want to compile the Berkeley DB modules, and this seems to be because Fedora 20 does not have Berkeley DB 5.3, only Berkeley DB 4. Neither of these is a problem for me, so I didn’t look very long for solutions, but it is still curious to me that these things don’t work.

I also get loads and loads of warnings when compiling that I haven’t noticed on Ubuntu. I will have to double-check this; perhaps it’s because it’s a 64-bit compile.

Other software

Banshee, XChat and KeePassX are all in the standard Fedora repositories. Chromium and OpenNX require adding custom repositories, something which is commendably easy. I don’t remember how I installed Flash, and I could not find a version of Google Earth that worked. Flash fullscreen didn’t work, which seems to be a Gnome 3 issue, but there is a workaround.

BitTorrent Sync doesn’t distribute its GUI package for Fedora, which means you have to install the standard version, which only has a web UI.

I have used Banshee as a music player for years now, but on Fedora 20 it crashes when trying to import the music library. I then used Rhythmbox for a while, but it started crashing after each song. Now I use Quod Libet, which so far is very good.

Other issues

I have 4 virtual CPUs. They are always doing stuff, mostly waiting for IO (I have an SSD, it’s fast, what are these processes doing?). The worst offender is usually polkitd, which gets better if you restart it. Using Google Hangouts will eat up all the processing power I have. I have no idea why.

When I log in to Fedora it asks me for my Google password. My Gmail password is umptydumpteen random characters, and I don’t want to type it in; I want to copy it in from my password manager, which I can’t access, because the password prompt blocks out the whole desktop. Annoying. When I then finally decided that I would actually copy out my long Google password, it turned out it STILL didn’t work, possibly because of two-factor authentication. I have no idea how to make Fedora stop asking me for the password.

And the labels that show up over icons and applets often get stuck and never disappear. Yeah, I know it’s not a big thing, but it’s annoying!

Conclusion

SO ANNOYING!

It required too much tweaking and a lot of search engine use to get it to work in a reasonable way. You have to tweak quite a bit to get Gnome Shell acceptable, but it’s possible. Even after that, the notification handling is beyond stupid, and really, really annoying. And you have to remember to check for software updates regularly. It’s also clear that Fedora has less support for third-party apps, at least desktop apps.

I miss Ubuntu.

You gotta keep ‘em separated

“On the usage of Unicode strings and binary strings in dynamic languages”

Music: Offspring
Lyrics: Some idiot

You gotta keep ‘em separated

Like the latest fashion
Like a spreading disease
The kids are runnin’ on their way to the classroom
Getting Python 3 with the greatest of ease

The folks stake their own campus locale
And if they catch you slippin’ then it’s all over pal
If one guy’s strings and the other’s don’t mix
The decoding will fail or the encoding will fail

Hey! You sending text to me?
Encode to Bytes!
You gotta keep ‘em separated
Hey! You getting text from me?
Decode to Strings!
You gotta keep ‘em separated
Hey! Pay no mind
If you use Unicode it will be easy all the time
He-ey, use Python 3!

By the time you concatenate
It’s already too late
If you didn’t decode your bytes
You gonna get a Encode/Decode fail

It’s not gonna change in Python 4
No one’s getting smarter no one’s learning the score
Your never-ending spree of implicit conversions
Is gonna tie your own rope, tie your own rope, tie your own

Hey! You sending text to me?
Encode to Bytes!
You gotta keep ‘em separated
Hey! You getting text from me?
Decode to Strings!
You gotta keep ‘em separated
Hey! Pay no mind
If you use Unicode it will be easy all the time
He-ey, use Python 3!

It’s not gonna change in Python 4
No one’s getting smarter no one’s learning the score
Your never-ending spree of implicit conversions
Is gonna tie your own rope, tie your own rope, tie your own

Hey! You sending text to me?
Encode to Bytes!
You gotta keep ‘em separated
Hey! You getting text from me?
Decode to Strings!
You gotta keep ‘em separated
Hey! Pay no mind
If you use Unicode it will be easy all the time
He-ey, use Python 3!

A script is not configuration.

I’ve been looking into Ansible lately, and have had some problems in explaining what I think is wrong with Ansible, so this blog post is an attempt to do that, by comparing it with Buildout. This may seem a bit strange, since they don’t really do the same thing, but I think it will make sense in the end. So hang in there.

Ansible

Ansible calls itself “software automation”, and that is correct; but it is often presented as an SCM system, and in my opinion it is not. The reason for this is that Ansible is based around writing scripts. Because these scripts are written in YAML, they superficially look like configuration, but that is misleading. Ansible calls these YAML files “playbooks”, which again indicates what you do with them: you play them. They have a start and a finish, and they perform a set of actions, in the order that the actions are defined in the playbook. The actions are often of the form “ensure that software X is installed” or “make sure that the file Y contains the line Z”. But that doesn’t change the fact that they are actions performed in the order written in the files. Therefore these files are scripts, not configuration. Configuration does not have a specific order: you first parse configuration, and then you access the configuration data arbitrarily. That is not how Ansible playbooks work.
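
To make that concrete, here is a small sketch of a playbook (the host group and the tasks are made-up examples). The tasks run from top to bottom, in the order written, exactly like statements in a script:

- hosts: webservers
  tasks:
    - name: ensure nginx is installed
      yum: name=nginx state=present
    - name: ensure nginx is running
      service: name=nginx state=started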

And that’s fine. It’s not criticism of Ansible per se. Ansible mainly calls itself a system for automation, and it is, just like any sort of script file is. Bash scripts are also designed for automation. Ansible just has a very complicated (as opposed to complex) script language. Thanks to the choice of YAML the language is very restricted and often tricky to use, as you end up having to fight the syntax by a very liberal and exact application of quote characters.

However, they do state on their website that “Ansible features a state-driven resource model that describes the desired state of computer systems and services, not the paths to get them to this state.” And that is really only partly true. If you only use modules that check for state, it is true for a reasonable definition of “true”. But a lot of the modules shipped with Ansible don’t do that. And more importantly, the state is not defined in configuration; the state is defined in a script. This leads to limitations, which we will talk about later.

You can add new commands to Ansible by writing modules. They can be written in any language, which sounds like a neat idea, but it means that the API for a module is passing in JSON data on stdin and returning it on stdout. This makes debugging painful, and it makes writing new modules a pain. In addition, to write Python modules you have to add a specific line at the end of the file with a “star import”, which breaks PEP 8 and also confuses some syntax-aware Python editors.

Ansible also recommends a specific directory layout, with several folders that all have a file called main.yml. That means your editor quickly ends up having a whole lot of main.yml files open, and that gets confusing. My good friend Jörgen Modin called that kind of layout “a conspiracy on the hard disk”, in reference to Zope Toolkit style programming, which does the same with its configure.zcml files. A file name should reflect what is in the file, not what role it plays (unless that role is unique within one project).

For SCM you also need several layers of orthogonal modularity. You need to be able to define the installation of, for example, MySQL, and then you need to define what software should go onto each machine. Ansible can do this, although confusingly most people tend to use what Ansible calls “roles” to define one component, and then use the groups in the inventory file as roles. But that’s just naming; you’ll get used to it.

Buildout

Buildout calls itself a “software build system” and that’s not incorrect, but it makes it sound like it’s competing with make and scons, and it does not. In my opinion, Buildout is closer to being a Software Configuration Management system than a build system. I would call it an Environment Configuration System, as it’s mainly designed to set up development environments, although it can also be used to deploy software. Its main shortfall as a proper SCM is that it lacks modules to do common SCM tasks, such as installing system packages with yum and apt, and, more problematically, it lacks support for running some bits, like yum and apt, as a superuser.

Buildout does not have any support for remote deployment, so you need to use a separate program like Fabric to run Buildout remotely.

Just like Ansible has modules you can use to create new commands, Buildout has recipes. In Ansible they can be written in any language; in Buildout they have to be written in Python. This perhaps lessens the appeal to some people, but I do think the benefits are worth it. A Buildout recipe is just a Python module, like any other, and recipes can be made available on the Python Cheese Shop, in which case Buildout will download and install them when you run it.

Configuration, not scripting

The most important thing about Buildout for the purposes of this blog post is that Buildout is configured entirely with configuration files, more specifically of the ConfigParser variety. INI files in general have the benefit of being designed for configuration, and their extremely minimalist syntax means they never get in the way. Buildout of course has to extend the syntax by allowing variable substitution, but that is also all it does. Everything written in the configuration is also a variable and can be used, so you only need to define each piece of information once.

It also means that a part of a Buildout configuration only needs to be run once, if it succeeds. It has then set up that part correctly, and subsequent runs can skip the parts that have succeeded, unless the configuration changes. It also means that it is, at least in theory, possible to write uninstallers, as you can record the state before the run.

The choice of INI-style syntax also means there is no inherent execution order to the configuration. The configuration instead is split up into what Buildout calls “parts”, each part executed by a recipe given in the part. Here are two examples of parts. The first one will download, compile and install nginx locally (in the buildout directory). The second will generate an nginx configuration file from a template.

[nginx]
recipe = zc.recipe.cmmi
url = http://html-xslt.googlecode.com/files/nginx-0.7.67-html-xslt-4.tar.gz
# The SSI bug was fixed in nginx-0.7.65-html-xslt-2.tar.gz
extra_options =
    --conf-path=${buildout:directory}/etc/nginx.conf
    --sbin-path=${buildout:directory}/bin
    --error-log-path=${buildout:directory}/var/log/nginx-error.log
    --http-log-path=${buildout:directory}/var/log/nginx-access.log
    --pid-path=${buildout:directory}/var/nginx.pid
    --lock-path=${buildout:directory}/var/nginx.lock

[nginx.conf]
recipe = collective.recipe.template
port = 8000
root = ${buildout:directory}/examples
input = ${buildout:directory}/templates/nginx.conf.in
output = ${buildout:directory}/etc/nginx.conf

What makes this configuration rather than a script is that none of it is executed unless the part is listed in a separate configuration:

[buildout]
parts = nginx
        nginx.conf

Buildout configuration files can also extend other files. So if we save the above in a file called base.cfg, we can then create another configuration file:

[buildout]
extends = base.cfg

[nginx.conf]
port = 8080

The only difference between base.cfg and this file is that nginx will run on another port. This makes it easy for me to check out a development environment and then make my own configuration file that just overrides the bits I need to change. Because it’s all configuration. With Ansible I would have to make the port into a variable and pass a new value in when I run Ansible. And that means that when writing Ansible playbooks, to make them proper configuration management and not scripts, everything must be a variable. Buildout avoids that issue by not having the intermediary step of playbooks; it just has recipes and configuration.

With Buildout your configuration can also extend another configuration file, and add, skip or insert parts as you like.

[buildout]
extends = base.cfg
parts = nginx.config
        loadbalancer
        nginx

[loadbalancer]
...

Each part remains the same, but the order is different and there is a new one in the middle. In general, because all configuration is parsed and gathered before the parts are run, it doesn’t matter much in which order you run them, but in some cases the order does matter. If you are going to not just configure software but also start it, you obviously have to install and configure it first.

There is also a special syntax for adding and removing values from a configuration like the parts-definition:

[buildout]
parts += newpart1 newpart2
develop -= foo.bar

A Buildout example

Buildout is often used to set up development, staging and production environments. I have an example where I have a base.cfg that only installs the Plone CMS. I then have a production.cfg which also sets up load balancing and caching, a supervisord to run the services, a cronjob to start the services on reboot, and cronjobs to do backups and database maintenance. My staging.cfg extends the production configuration only to change the ports, so that I can run the staging server on the same machine as the production server. The development.cfg just extends base.cfg, so you don’t get any of the production services; instead it adds loads of development tools. Lastly there is a version.cfg which contains version numbers for everything to be installed, so you know that if you set up a local environment to test a problem, you are using the same software as in production.

If you were aiming to deploy this onto several servers, with the database on one server, caching and load balancing on another, and the CMS instances on separate servers, then you would make a configuration file per server type, and use that.
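
To make the staging trick concrete, a staging.cfg along these lines is all it takes (the part and option names are invented):

[buildout]
extends = production.cfg

[instance]
http-address = 9080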

Buildout extensions

Buildout has a final level of indirection: extensions. One example of an available extension is buildout.dumppickedversions (although it’s now a part of Buildout itself), which lists all Python packages that you have not given a specific version number. Another is mr.developer, which gives you commands to make sure that the Python packages you are working on are all synced to the versioning system. It can even allow you to switch between the latest release and a checked-out development version of a Python package, which is really neat.
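
For those who haven’t seen it, enabling mr.developer looks something like this, where the package name and repository URL are examples:

[buildout]
extensions = mr.developer
auto-checkout = my.package

[sources]
my.package = git https://example.com/my.package.git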

Perhaps it would be possible to make an extension which allows you to run some Buildout parts as root and others as a normal user. I’m willing to give implementing it a try, but I’m a bit busy at the moment, so it will have to wait. And if an extension can’t do it, adding that feature to Buildout itself should be relatively easy. With that feature, I would be prepared to call Buildout a Software Configuration Management system. It may have originally been developed to manage only development environments, but it has proven itself capable of much more, and it certainly got the most important SCM design decisions right. Hopefully the above text clarifies why.

Thoughts on JavaScript frameworks.

Like it or not, JavaScript is the language that we have to use for front-end development on the web. And that was fine when all we did was add some dynamism to the site, perhaps a menu or a date picker. But now the whole web application often uses JavaScript, whole bits of the page are replaced via Ajax, everything is clickable and updates automatically, etc. At that point, just having everything in one big JavaScript file is not feasible. And splitting it up randomly into separate files hardly helps, as you tend to end up with loads of globals anyway.

To write a full front-end JavaScript app you need some sort of structure. The first step there was JavaScript libraries, where jQuery has in the last few years pretty much taken over everything. Lately the new trend is frameworks. This makes intuitive sense, as the web is moving from adding a bit of JavaScript here and there to writing applications purely in JavaScript. And just as with libraries, there is an infinite supply of them, and it will probably take a few years until we can shake out some winners. With frameworks it’s likely to be more than one winner, as the requirements are quite different for different people and different use cases.

My requirements for a JavaScript framework may be somewhat different from others’, but this list is based on my experience so far, which admittedly has been neither extensive nor good. But these are things I’ve found lacking.

Library-ishness

The first mark of a good JS framework is that it is usable as a library, or at least that a lot of its parts can be used in a simple way without adopting the whole framework. This is required for many reasons. Firstly, it helps you get started with the framework, as you can easily use parts of it without drowning.

Today many web applications are written in back-end frameworks, like Ruby on Rails or Django, and if you then apply a JavaScript framework whose main usage is to write a whole web application, you end up having two competing frameworks that fight each other. A framework that is designed only for writing full applications in JavaScript is therefore going to be a bad fit for a web application written in Django. This requirement is quite unique to JavaScript application frameworks, as you are unlikely to start using a bit of Ruby on Rails in a J2EE app, for example.

This means that if the framework is not a good fit for slowly taking over existing applications, you end up not using it in existing applications, and then you need to learn some other library or framework for that situation. That means you need to know at least two frameworks for doing the same thing, which is a waste of brain power. One framework should be enough.

Modularity

To be able to handle big applications you need to be able to make things modular, and you need a way to handle module dependencies. This is something JavaScript traditionally has been very bad at, and there are several efforts to do something about it. A framework needs to support this, and you need to be able to declare your dependencies. You should also preferably have some sort of namespace handling, so that not all modules that are required for your application are available everywhere, as that clutters the namespace. Think about Python’s import statement here for an example of what you want.

Something you also want is the ability to declare your dependencies after the application has loaded. This is so that you can create a pluggable application. This is again mainly a requirement for applications that run on a traditional server-side framework. These frameworks often have their own system for making things extensible and pluggable. So you might want to write a plugin for your server-side framework that uses a date input with a nice date selector, and then you need that plugin to be able to tell the JavaScript application that it needs a specific module. Ways of doing this that are not good enough include having to edit a separate file that lists all your JavaScript dependencies, because that file may not be under your control, as it’s a part of the web application and not a part of the plugin you are developing.

Legacy form handling

Related to this is that you want to be able to integrate the JavaScript framework’s form handling with the back-end application’s form handling. Some JavaScript frameworks assume that you submit every form with JavaScript, collecting the values from the form and then making a POST in that JavaScript. But if your widgets assume this, they may not work in a normal form submit, because they may not have a real <input> tag to submit the value. That also makes it harder to use just bits of the framework. Either that, or the framework needs to take over, by magic, any <form> tag that uses a framework widget. But magic is usually a bad idea.

HTML is declarative, your JavaScript App should also be.

Lately a new attitude has shown up in JavaScript, and that’s using a more declarative style. Instead of having an initializing script that tags specific <input> tags as using a date-time widget, you set some sort of declaration on the tag, and the widget appears automatically. Although this is just a small step from having a special class and using jQuery to make all inputs with that class use the widget, it’s a change in attitude. And it’s an important change, because if a JavaScript framework makes you write a lot of JavaScript manipulation of the HTML, you have broken much of the purpose and principle of HTML. HTML is declarative, and you should be able to think about it in that way, and not have to think about it as something that will need to be modified in the future.

So with declarative JS, instead of creating HTML and then writing code that manipulates that HTML, you write some sort of self-contained modules that are applied automatically. Of course, in the end it’s still JavaScript that manipulates the DOM, but it’s now done in localized and contained ways. And if you think this is obvious, then you are probably writing your declarative JavaScript that way already, so pat yourself on the back!

The less JavaScript the better

Quite often, both for libraries and frameworks, you end up writing a lot of JavaScript. This sounds reasonable, but it should not be. Not only because writing and developing JavaScript sucks, but also because the JavaScript and the HTML live in very tight integration, and the HTML is often designed by somebody who does not know JavaScript. So less JavaScript is necessary not only to save us poor Python programmers from the pain, but also to make it possible for designers and usability people to create designs and mockups.

This also ties into the declarative development above. In the ideal world an HTML developer should be able to add a bunch of JavaScript files to his or her HTML, and then just write HTML to get most of the web application UI working. (S)he should be able to declare a list of tabs and have it behave like a list of tabs, and the date-time widget should be declared only with HTML, with the framework doing the magic.

Example: Patterns

“Patterns” is a library of JavaScript widgets and functionality that works entirely in a declarative style. It aims, and to a large part succeeds, at making it possible for HTML hackers to design a webapp. This is because the concept in fact comes from a designer, Cornelis Kolbach. With Patterns you for example make an HTML injection like this:

<a href="demos/frobber.html#content" class="pat-inject">Demo the frobber</a>
<section id="content">
  HTML to be replaced.
</section>

The “pat-inject” class means the <a> link will load the href, find the tag with the ID “content” in the result, and inject it into the current page at the content id. All usage of Patterns is similarly free of JavaScript, while the patterns themselves are of course written in JavaScript.

Using Patterns is very simple, and you can freely pick and choose what you use, because it is in fact not a framework at all, it’s a library. It’s also badly documented, and setting up an environment where you can develop patterns was, at least a few months ago, exceedingly complicated. But it serves well as an example of what you can achieve with declarative JavaScript libraries.

Example: AngularJS

AngularJS is a framework, and it’s very frameworky. It also claims to apply a declarative attitude, and indeed, the “Hello World” example of AngularJS requires you only to create the main application module. But then it quickly breaks down.

If you want to have a pre-filled value in the input when you load the page, commonly the case when you are editing something, the examples tell you to do this with very imperative JavaScript. This seems very strange, especially considering that most inputs have a value attribute, but that’s just ignored. AngularJS also breaks pretty much all of the requirements I have for a good framework, and I admit it: this whole post was prompted by me trying to use AngularJS. Trying to use a tags widget written with AngularJS in a form led to a lot of pain, partly because AngularJS does assume you will submit the form with AngularJS, as explained above.

AngularJS is trying to be modular, and you can indeed write new declarative modules by adding directives. But using them independently can be quite tricky, and the first problem is that you define AngularJS dependencies in the same line of code that creates the application. That’s equivalent to having to declare all your imports in the first line of your Python application! There is no way to add a dependency once the main application has been created.

That means that you actually have to make your own system for declaring dependencies, where your plugins can add their dependencies to a list to be used when initializing the main AngularJS module. That adds a layer of complexity and brittleness to your web application. In Horizon, the admin UI for OpenStack, there are now at least two different ways of declaring your plugin’s AngularJS requirements, perhaps more. This adds technical debt and complexity for no good reason.

AngularJS is definitely not a framework where you can use bits and pieces; you have to drink the Kool-Aid. And this seems to be the general consensus: AngularJS is good for writing apps from scratch, but not for adding functionality to existing applications.

Conclusion

The conclusion is that writing frameworks is hard, and for JavaScript possibly even harder. Over the weekend I have gotten several recommendations for other JavaScript frameworks, and so far none of them seem to fit the bill. I look forward to being proven wrong.

Correction and further thoughts

I had been pointed to Polymer, but I saw it as a widget library, not a framework, and after reading more documentation, I still do. It looks similar to Patterns, but with better documentation and widget animations. I hope to get a chance to use it. It may very well be that the perfect framework is a combination of Polymer as a widget library and one or several pieces that provide only the frameworky bits for writing pure JS front ends.

You need some sort of module encapsulation, and you need to be able to declare dependencies and add dependencies in a modular way. Either you can add them at any point in the app initialization, or you need multi-step initialization: first create the app, then add dependencies, then initialize it.

I suspect that you also might need some sort of model/view/template mechanism. Obviel could fit the bill there.

Interlude: My HTPC is dead, long live my HTPC!

7 years ago I started this blog. It started with a couple of posts on my experiences with making a Home Theatre Personal Computer, running Linux, because I’m crazy.

Once I gave up on trying to use it to watch TV, and bought a TV, things were better. (Disclaimer: XBMC has better TV support now, so maybe it doesn’t suck anymore.) Once I gave up on having it play DVDs and bought a Blu-ray player, it was even better. I now use XBMC to play music and videos, and I use a web browser to watch football live from streaming websites. And I use it for Skype with my family, who are spread out over several countries.

And after this it worked pretty well. The remote control was a pain in several ways, but that’s the fault of my chassis.

But the last two years it has been struggling. Compiz slows the video down so much that HD video wouldn’t work, which meant I could not upgrade above Ubuntu 12.04 (because 12.04 has Unity 2D, which doesn’t use Compiz). Also, Flash seems to just get slower and slower, and lately HD video has been jerky when watching Flash video. And 1.5 years ago I bought a new, bigger TV, and there was no 1:1 pixel ratio resolution that both the TV and the PC would support, which again is annoying when watching Flash video in the browser.

And two weeks ago the power went, and after that Ubuntu couldn’t find the on-board video any more. I tried some old video boards I had lying around, but none of them would work well, and the computer started crashing randomly.

So, after pretty much exactly 7 years, this computer, which was designed not to be high-powered but silent, finally gave up.

So I bought an ASUS P8H77-M LE board, because I have another P8H77 board in my main computer, and it’s pretty awesomely good. Unfortunately it does NOT have the right slots in the right positions to work with the riser cards my chassis is using, an issue I had forgotten about during these 7 years. That’s annoying and means only graphics cards can be used, and I don’t plan on doing that. It also means I have to replace my PCI WiFi card with a USB one. But I can live with that, and I knew this motherboard would work well both with Ubuntu and with my TV.

And I got the cheapest processor I could find that would fit, which was an Intel Core i3 3250 at 3.50 GHz, and new memory. 2GB would have been enough, but apparently you can’t buy that little memory any more, so I got 4GB.

And after throwing everything out (including the DVD player, which uses ATA, which the new motherboard doesn’t even have), we are back in business! Now with Ubuntu 14.04, a square pixel resolution, and full HD Flash! Oh, and the fans are actually speed controlled by the motherboard now, for extra silent running! It all cost 280 dollars, and it wouldn’t surprise me if this lasts for another 7 years!

Would a Python 2.8 help you port to Python 3?

Quite often somebody says that there should be a Python 2.8 to help in making a smoother transition to Python 3. I have for some time had the very evil tendency to then ask them what exactly this Python 2.8 should contain. It is very rare that the person making the claim can answer this question in any way except that it should contain “things” that make the transition to Python 3 “smoother”. (Or they want to backport almost everything from Python 3, but I covered that in my previous post on Python 2.8.)

So in this blog post I will examine the possibilities to make the transition to Python 3 smoother through a new version of Python.

First of all, in all the code below, let us assume there is a constant defined, called PY3:

import sys

PY3 = sys.version_info[0] == 3

Let’s then look at the various types of differences between Python 2 and Python 3, and whether a new version of Python could make the transition between them smoother.

Built-in renames and moves

Several built-in functions have been renamed or moved. Since as far back as there is documentation available, Python has had two range functions: range and xrange. There’s no reason to have two, so in Python 3 there is only one. It’s called range but behaves like xrange. If you actually want a list, you do list(range(x)) instead. This means that any code that assumes that range() will return a list will fail. It also means that if you use range() under Python 2 you get the old function, which can take time and use a lot of memory if the range is large.

Could there have been a smoother transition there? Well, yes: there could have been a from __future__ import range in some version of Python that made range mean xrange, and in the next version of Python the old range would have been gone and xrange would have been renamed.

But do we need a smoother transition there? No. If you have to use xrange or some other renamed built-in you can just do this instead:

from functools import reduce  # exists in Python 2.6+ as well as Python 3

if not PY3:
    range = xrange

The same goes for almost all the name changes in built-ins. If you use a lot of them you would end up with several lines of __future__ imports, or several lines of range = xrange statements, so it would not be pretty in either case. I actually prefer the latter. The best, or least ugly, solution is instead to use the six module, where you have a prettier syntax:

from six.moves import range, reduce

So for the renaming of built-in functions, using six provides a smoother, or at least prettier, transition than a Python 2.8 would.

Conclusion: Python 2.8 would not help.

Dictionary iterators

Since Python 2.2 the methods on dictionaries have had iterator variants: iterkeys(), itervalues() and iteritems(). Here it is also hard to make a smoother transition. Perhaps Python 2.2 could, instead of the iterxxx methods, have had a from __future__ import dictionary_methods, but it’s not obvious how this would be done, as it would need to be implemented on a module level, while the methods are methods on the dictionary objects, meaning that one and the same dictionary object would have needed to behave differently in different modules. Not nice.

Do we need a smoother transition? No. Most usage of dictionaries is with
for x in thedict: which will use the iterator in both Python 2 and Python 3. When you need to use iteritems() or itervalues() in Python 2, you can yet again use six, which provides functions for this:

from six import iteritems
for x in iteritems(thedict):
    print(x)

So for the renaming of the iterator methods on dictionaries, a __future__ import would have been prettier to use, but horrible to implement. But it isn’t really necessary. The transition to Python 3 is not made easier through it, just a bit prettier. iteritems() and friends from six work fine, and are not ugly.

Conclusion: Python 2.8 would not help.

The standard library reorganization

Much of the Python 2 standard library was written before PEP 8, and so does not follow the PEP 8 rules. In addition, many modules could be merged; urllib and urllib2 are just the most obvious example. One way to handle this in code that needs to run on both Python 2 and Python 3 is to simply try to import a module from one place, and if that fails import it from the other place.

try:
    import configparser
except ImportError:
    import ConfigParser as configparser

Could this have been done in a smoother way? Well, for some libraries, perhaps. It would for example have been possible to rename the libraries, and keep the old names as aliases that print a deprecation warning, etc.

Do we need a smoother transition? No. Again six comes to the rescue, as it has aliases for these modules, so that you can import them by one name in both Python 2 and Python 3:

from six.moves import configparser

Python 2.8 could provide a slightly smoother transition than using six, by providing deprecation warnings for the old names. You would also not have to make any changes once you drop Python 2 support completely. However, the library renaming is not a significant hurdle in supporting Python 3.

Conclusion: Python 2.8 would make the transition slightly smoother.

Note: This is one of only three actual suggestions I have received on what could go into a Python 2.8.

Metaclasses

The metaclass syntax changed in Python 3. Can Python 2.8 help there? Yes, it could probably support both syntaxes. On the other hand, six also has a workaround for that, so it’s not really an issue.
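
The workaround is six.with_metaclass, which hides the syntax difference:

import six

class Meta(type):
    pass

# Python 2 syntax: __metaclass__ = Meta in the class body.
# Python 3 syntax: class MyClass(object, metaclass=Meta).
# six works under both:
class MyClass(six.with_metaclass(Meta, object)):
    pass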

Conclusion: Python 2.8 can provide a prettier solution than using six.

The Unicode change

The change that impacts most people, and is the hardest to handle, is the change to str. In Python 2 the string type contains a sequence of octets, while in Python 3 it contains a sequence of Unicode characters. This has numerous effects, too numerous to mention here.

Could this transition have been made in a smoother way? Well, yes, because Python 3.0 removed the u'' prefix to mark something as being Unicode. Python 2.6 introduced a from __future__ import unicode_literals to help ease the transition, but it turned out to be less useful than anticipated. I think the only way this transition could have been done in a “smooth” way is over a chain of several Python versions. That transition would have looked something like this:

* Python 2.6: Introduces a b'' prefix that creates a new bytes type, an 8-bit sequence that is not a plain string. The whole standard library will accept both bytes and strings.

* Python 2.7: Strings are deprecated. You are recommended to use either b'' bytes or u''-Unicode strings.

* Python 2.8: Strings are removed. You now only have b'' bytes or u''-Unicode strings.

* Python 2.9: The string literal is reintroduced, but it is now an alias for the Unicode literal.

* Python 3.0: The Unicode literal is renamed “string”.

That would have been a “smoother” transition. But as you see, it would have taken five Python versions, and hence some ten years or so. And each version would have introduced new things. It would be easy to be compatible with three versions at once, but not more than that. Supporting Python 2.5 and 2.8 at the same time would have been practically impossible. So in that sense, this “smooth” transition might in fact have ended up even more disruptive than the path that was chosen. And I’m sure the rants would have been even angrier than they are now.

This is one of the major reasons it was decided that making a gradual change was not the best solution, and that making one new version with all the major changes in one go was a better choice. On a side note: I still think this decision was correct, but the difficulty of writing code that straddles both Python 2 and Python 3 was overestimated. This led to the differences being larger than necessary, for example the removal of u''.

Python 2.8 and Unicode

Can a Python 2.8 help make a smoother transition? Well, it could perhaps make sure that the standard library accepts both bytes and strings where that makes sense, and both strings and Unicode where that makes sense. Otherwise 2.8 can’t make the transition smoother without breaking backwards compatibility. But there has been at least some work on this already, so I don’t know how much is left. The only module I know of off the top of my head where this is a problem is the csv module, which requires files opened in binary mode in one version of Python, but text mode in the other.
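
A sketch of the kind of version check this currently forces on you (assuming a file called data.csv):

import csv
import sys

if sys.version_info[0] < 3:
    infile = open('data.csv', 'rb')             # Python 2: binary mode
else:
    infile = open('data.csv', 'r', newline='')  # Python 3: text mode
reader = csv.reader(infile)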

Also, a Python 2.8 could introduce a warning, when running in -3 mode, whenever there is an implicit conversion between octet strings and Unicode strings. This would help find potential problems and fix them before porting. However, Python 2.8 is not necessary for that; you can have it now if you want.
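
For example, Python 2 already issues a UnicodeWarning when a comparison between a byte string and a Unicode string fails to decode, and you can promote that to an error (this only catches comparisons; the unicodenazi module catches more kinds of implicit conversion):

# -*- coding: utf-8 -*-
import warnings
warnings.simplefilter('error', UnicodeWarning)

'hëllo' == u'hëllo'  # Python 2: now raises instead of warning and returning False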

@wong_jim suggested actually breaking the implicit conversions on a per-module basis with a __future__ import. This would help find the problems in your own code without being distracted by implicit conversions in the standard library. But again this requires that objects behave differently depending on which module they are in, which is not an ideal situation.

Conclusion: Python 2.8 would not help.

Note: More deprecation warnings, and breaking the implicit conversions, are two of the only three actual suggestions I have received on what could go into a Python 2.8.

Byte-string behaviour

In Python 2, binary data is handled by strings. In Python 3, it is handled by bytes objects. And they don’t behave exactly the same. Python 2.8 could introduce the bytes object, making the difference smaller. That would make it easier to write code that works on both Python 2.8 and Python 3, but it would also break a lot of existing code! A better solution (which is being discussed) is to change the behaviour of the Python 3 bytes object to be more like a Python 2 string.
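
The most common trap, as a quick illustration:

data = b'abc'
data[0]   # Python 2: 'a' - a one-character string
          # Python 3: 97  - an integer
data[:1]  # b'a' on both, so slicing is the portable spelling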

Conclusion: Python 2.8 would not help unless it breaks backwards compatibility.

Doctest problems

Many doctests will break, and break quite badly, when you move to Python 3. This is because doctests typically rely on comparing the representation of results, and representations have often changed. It’s loads of small things: the Unicode string representation, the removal of the long type, and subtle changes in the representations of some built-in classes.
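
Two small examples of doctests that pass on Python 2 but fail on Python 3:

>>> u'hello'
u'hello'
>>> 2 ** 70
1180591620717411303424L

On Python 3 the first repr is 'hello' without the prefix, and the second has no trailing L, so both comparisons fail even though the values are correct.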

The only solution is to rewrite the doctests so they don’t depend on what the objects’ representations look like. Python 2.8 can not help there.

Conclusion: Python 2.8 would not help.

Problematic API’s

In some cases your module might have APIs that don’t work in Python 3. I’ve encountered two:

1. zope.interface uses a syntax that requires class body statements that manipulate locals() to insert a metaclass statement. But in Python 3, the metaclass statement isn’t in the locals, so this doesn’t work. This means the API had to be changed, and a fixer had to be written to turn the class body statements into class decorators.

2. icalendar used __str__ as the API to marshal iCalendar objects to iCalendar data. As iCalendar data is a UTF-8 byte string, that obviously doesn’t work in Python 3. The conclusion here is that you shouldn’t misuse Python-internal APIs like that; the end result was that icalendar first needed to grow a proper API, and only then could support for Python 3 be added (see the sketch below).
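
A minimal sketch of the difference (the names here are simplified, not icalendar’s actual ones):

class Component(object):
    def __init__(self, text):
        self.text = text  # already-decoded Unicode text

    def to_ical(self):
        # An explicit marshalling API can return UTF-8 bytes on
        # both Python 2 and Python 3.
        return self.text.encode('utf-8')

    # The old, broken approach was a __str__ that returned encoded
    # bytes; on Python 3, str() must return text, so that cannot work.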

This is one of the hard parts of supporting Python 3, and Python 2.8 can’t do anything about it.

Conclusion: Python 2.8 would not help.

Other changes

The change in behaviour of the division operator already has a __future__ import, and so does the print function. Python 2.6 supports both the old and the new exception syntax. The removal of the long type is mostly a non-issue.
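
So for these, the compatible spellings already exist:

from __future__ import division, print_function

print(1 / 2)   # 0.5 on both Python 2 and Python 3
print(1 // 2)  # 0 - explicit integer division

try:
    1 / 0
except ZeroDivisionError as err:  # the "as" syntax works from Python 2.6 on
    print(err)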

Conclusion: Python 2.8 would not help.

Final conclusion!

In my insistent asking for things that 2.8 could actually do to ease the transition to Python 3, I have received only three suggestions. In all three cases, alternatives exist that you can use today, without a new version of Python 2. six generally provides a solution that is just as good as, and in some cases much less ugly than, anything Python 2.8 could offer. And if you have Unicode problems, the unicodenazi module is useful while adding Python 3 support.

So would Python 2.8 help? No, not significantly. Python 2.7 already has the future compatibility that you need to write code that runs under both Python 2 and Python 3. You can start adding Python 3 support today. It’s not that difficult.

Did I miss anything?

I can’t think of any other changes that can be done in a backwards compatible way. But maybe I missed something. In that case, please tell me!

collective.portlet.content 1.8.1 released

This portlet shows a content item as content in the portlet.

Changes since 1.8 are:

  • Added German translation and moved portlet title and description to i18n:domain plone to make them translatable in the add portlet dropdown, too. [fRiSi]
  • Fixed #1: The defaults in the form and the defaults on the Assignment were different. [regebro]
  • Fixed #5: Added portlet id based class to the portlet tag. [regebro]
  • Fixed #8: You now only need View permission on the content, not on its parents. [regebro]

tzlocal 1.1.1 released

tzlocal is a module that will help you figure out your computer’s local timezone information under Unix (including OS X) and Win-32. It requires pytz, and returns pytz tzinfo objects.

This version adds better support for OS X, and it also adds a dictionary that maps time zone names between tzdata/Olson names and Windows names, so it can be used to map Windows time zones to and from tzdata time zones as well.
