Skip to content

Would a Python 2.8 help you port to Python 3?

June 3, 2014

Quite often somebody says that there should be a Python 2.8 to help in making a smoother transition to Python 3. I have for some time had the very evil tendency to then ask them what exactly this Python 2.8 should contain. It is very rare that the person making the claim can answer this question in any way except that it should contain “things” that make the transition to Python 3 “smoother”. (Or they want to back port almost everything from Python 3, but I covered that in my previous post on Python 2.8).

So in this blog post I will examine the possibilities to make the transition to Python 3 smoother through a new version of Python.

First of all, in all the code below, let us assume there is a constant defined, called PY3:

PY3 = sys.version_info[0] == 3

Let’s then look at the various types of differences between Python 2 and Python 3, and if it is possible to make a smoother transition between Python 2 and Python 3 by introducing a Python 2.8.

Built-in rename and moves

Several built in functions have been renamed or moved. Since as far back as there is documentation available, Python has had two range functions: range and xrange. There’s no reason to have two, so in Python 3 there is only one. It’s called range but behaves like xrange. If you actually want a list, you do list(range(x)) instead. This means that any code that assumes that range() will return a list will fail. It also means that if you under Python 2 uses range() you will get the old function, which can take time and use a lot of memory if the range is large.

Could there be a smoother transition there? Well, yes, there could have been a from __future__ import range in some version of Python that means that range now instead is xrange, and in the next version of Python the old range would have been gone, and xrange would have been renamed.

But do we need a smoother transition there? No. If you have to use xrange or some other renamed built-in you can just do this instead:

if not PY3:
    range = xrange
    from functools import reduce

The same goes for almost all the name changes in built-ins. If you use a lot of them you would end up with several lines of __future__ imports, or several lines of  range = xrange statements, so it would not be pretty in either case. I actually prefer the latter. The best/least ugly solution is instead to use the six module where you have a prettier syntax:

from six import range, reduce

So for the renaming of built in functions, using six provides a smoother, or at least prettier transition than a Python 2.8 would.

Conclusion: Python 2.8 would not help.

Dictionary iterators

Since Python 2.2 the methods on dictionaries have had iterator variants: iterkeys(), itervalues() and iteritems(). Here it is also hard to make a smoother transition. Perhaps Python 2.2 could instead of the iterxxx methods have had a from __future__ import dictionary_methods, but it’s not obvious how this should be done, as it needs to be implemented on a module level, but the methods are methods on the dictionary objects, meaning that one and the same dictionary object would have needed to behave differently in different modules. Not nice.

Do we need a smoother transition? No. Most usage of dictionaries is with
for x in tehdict: which will use the iterator in both Python 2 and Python 3. When you need to use iteritems() or itervalues() in Python 2, you can yet again use six which provides functions for this:

from six import iteritems
for x in iteritems(thedict):
    print(x)

So for the renaming of the iterator methods on dictionaries, a __future__ import would have been prettier to use, but horrible to implement. But it isn’t really necessary. The transition to Python 3 is not made easier through this, just a bit prettier. iterxxx() from six works fine, and is not ugly.

Conclusion: Python 2.8 would not help.

The standard library reorganization

Much of the Python 2 standard library was written before PEP 8, and so does not follow the PEP 8 rules. In addition, many modules could be merged, urllib and urllib2 is just the most obvious example. One way to handle this in code that needs to run both Python 2 and Python 3 is to simply try to import is from one place, and if that fails import it from the other place.

try:
    import configparser
except ImportError:
    import ConfigParser as configparser

Could this have been done in a smoother way? Well, for some libraries, perhaps. It would for example be possible to rename the libraries to a new name, and have an alias that prints a deprecation warning, etc.

Do we need a smoother transition? No. Again six comes to the rescue, as it has aliases for these modules, so that you can import them by one name in both Python 2 and Python 3:

from six.moves import configparser

Python 2.8 could provide a slightly smoother transition than using six, by providing deprecation warnings for the old names. You would also not have to make any changes once you drop Python 2 support completely. However, the library renaming is not a significant hurdle in supporting Python 3.

Conclusion: Python 2.8 would make the transition slightly smoother.

Note: This is one of the only three actual suggestions on what could go into a Python 2.8 that I have received.

Metaclasses

The metaclass syntax changed in Python 3. Can Python 2.8 help there? Yes, it could probably support both syntaxes. On the other hand, six also has a workaround for that, so it’s not really an issue.

Conclusion: Python 2.8 can provide a prettier solution that using six.

The Unicode change

The change that impacts most people and is hardest to handle is the changing of str. In Python 2 the string type contained a sequence of octets, while in Python 3 it contains a sequence of Unicode characters. This has numerous effects, to numerous to mention here.

Could this transition have been made in a smoother way? Well, yes, because Python 3.0 removed the u'' prefix to mark something as being Unicode. Python 2.6 introduced a from __future__ import unicode_literals to help easy the transition, but it turned out to be less useful that anticipated. I think the only way this transition could have been done in a “smooth” way is over a chain of several Python versions. That transition would have looked something like this:

* Python 2.6: Introduces a b” prefix that creates new bytes types that are 8-bit sequences, but not plain strings. The whole standard library will accept both bytes and strings.

* Python 2.7: Strings are deprecated. You are recommended to use either b'' bytes or u''-Unicode strings.

* Python 2.8: Strings are removed. You now only have b'' bytes or u''-Unicode strings.

* Python 2.9: The string literal is reintroduced, but it is now an alias for the Unicode literal.

* Python 3.0: The Unicode literal is renamed “string”.

That would have been a “smoother” transition. But as you see it would have taken five Python versions, and hence some ten years or so. And each version would have introduced new things. It would be easy to be compatible for three versions at once, but not more than so. Supporting Python 2.5 and 2.8 at the same time would have been practically impossible. So in that sense, this “smooth” transition might have in fact ended up even more disruptive than the path that was chosen. And I’m sure the rants would have been even angrier than they are now.

This is one of the major reasons that is was decided that making a gradual change was not the best solution, and that making one new version with all the major changes in one go was a better choice. On a side note: I still think this decision was correct, but the difficulty in writing code that straddled both Python 2 and Python 3 was overestimated. This led to the differences actually being larger than necessary, like for example removing u''.

Python 2.8 and Unicode

Can a Python 2.8 help to make a smoother transition? Well, it could perhaps make sure that the standard library accepts both bytes and strings where that makes sense, and both strings and Unicode where that makes sense. Otherwise 2.8 can’t make the transition smoother without breaking backwards compatibility. But there has been at least some work on doing this already, so I don’t know how much is left. The only module I know of the top of my head where this is a problem is the csv module, that require files opened in binary mode in one version of Python, but text mode in the other version.

Also, a Python 2.8 could introduce a warning when running in -3 mode whenever there is an implicit conversion between octet strings and Unicode strings. This would help to find potential problems and fix them before porting. However, Python 2.8 is not necessary for that, you can have it now if you want.

@wong_jim suggested actually breaking the implicit conversions on a per module basis with a future import. This would help find the problems in your code without being distracted by implicit conversions in the standard library. But again this requires that the objects behaves differently depending on which module they are in, which is not an ideal situation.

Conclusion: Python 2.8 would not help.

Note: More deprecations, or breaking implicit conversions are two of the only three actual suggestions on what could go into a Python 2.8 that I have received.

Byte-string behaviour

In Python 2, binary data is handled by strings. In Python 3, they are handled by byte objects. And they don’t behave exactly the same. Python 2.8 could introduce the bytes objects, making the difference smaller. That would make it easier to write code that works on both Python 2.8 and Python 3. But that would actually break a lot of existing code! A better solution (which is being discussed) is to change the behaviour of the bytes object to me more like a Python 2 string.

Conclusion: Python 2.8 would not help unless it breaks backwards compatibility.

Doctest problems

Many doctests will break, and break quite badly, when you move to Python 3. This is because doctests typically rely on comparing the representation of results. And the representation has often changed. It’s loads of small things that have changed, from the unicode string representation to the removal of the long type to that some builtin classes representation having changed in subtle ways.

The only solution to this is to rewrite the doctests to not depend on how objects representation looks. Python 2.8 can not help there.

Conclusion: Python 2.8 would not help.

Problematic API’s

In some cases your module might have API’s that doesn’t work in Python 3. I’ve encountered two:

1. zope.interface uses syntax that requires class body statements that manipulate the locals() to insert a metaclass statement. But in Python 3, the metaclass statement isn’t in the locals, so this doesn’t work. This means the API must be changed, and a fixer needed to be written to change the class body statements to class decorators.

2. icalendar would use __str__ as the API to marshall iCalendar objects to iCalendar data. As iCalendar data is an UTF-8 byte string, that obviously doesn’t work in Python 3. The conclusion here is that you shouldn’t misuse Python-internal API’s like that, and the end result was that icalendar needed to first grow a proper API, and then support for Python 3 could be added.

This is one of the hard parts of supporting Python 3, and Python 2.8 can’t do anything about this.

Conclusion: Python 2.8 would not help.

Other changes

The change in behaviour of the division operator has a __future__ import already. So does the print-function. Python 2.6 supports both the old and new version of the exception-syntax. The removal of the long-type is mostly a non-issue.

Conclusion: Python 2.8 would not help.

Final conclusion!

In my insistent asking for things that 2.8 could actually do to easy the transition to Python 3, I have received only three suggestions. In both cases alternatives exist that you can use today without a new version of Python 2. six generally provides a solution that is just as good, and in some cases also much less ugly, than Python 2.8 could. And if you have Unicode problems, then the unicodenazi module is useful while adding Python 3 support.

So would Python 2.8 help? No, not significantly. Python 2.7 already has the future compatibility that you need to write code that runs under both Python 2 and Python 3. You can start adding Python 3 support today. It’s not that difficult.

Did I miss anything?

I can’t think of any other changes that can be done in a backwards compatible way. But maybe I missed something. In that case, please tell me!

About these ads

From → python

Comments are closed.

Follow

Get every new post delivered to your Inbox.

Join 1,285 other followers

%d bloggers like this: