Porting setuptools to py3k

I recently spent some time porting setuptools to the py3k-struni branch as a means of testing both 2to3 specifically and the porting process generally. What follows are my notes from the experience. Two things to keep in mind. First, the struni branch, though slated to become the “official” Python 3000 branch, is still very much in flux and currently has 30+ failing tests; needless to say, it’s not an ideal porting target. Second, I was attempting this without Python 2.6’s forward-compatibility mode, which is still mostly unwritten. As both of these situations change, I’ll keep trying to port more code to test the general readiness of the 2.x-to-3.x migration strategy.

Things to do in your Python 2 code:

  • Don’t write code like this:
    class install(_install):
      new_commands = [
        ('install_egg_info', lambda self: True),
        ('install_scripts',  lambda self: True),
      ]
      _nc = dict(new_commands)
      sub_commands = [
        cmd for cmd in _install.sub_commands if cmd[0] not in _nc
      ] + new_commands

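    That won’t work in Python 3000 because of changes to list comprehensions and class definitions: a list comprehension now runs in its own scope, and names bound in a class body (like _nc here) aren’t visible from inside that scope. Move the new_commands and _nc declarations out of the class body.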

  • Don’t rely on implicit relative imports. In Python 3000, all imports will be absolute by default; you should write one of
    from setuptools.dist import _get_unpatched
    # or
    from .dist import _get_unpatched

    instead of

    from dist import _get_unpatched

Stuff that needs to be easier:

  • The fact that __cmp__ methods are going away sucks, plain and simple. This required me to manually implement four additional comparison methods for every class that had previously relied on __cmp__. I hope their removal will be rethought and retracted.
  • The struni branch currently has three different string-ish types: bytes, str (previously unicode) and str8. Guido has said that str8 will eventually go away, but its presence in unexpected places (like modules’ __file__ attributes) made for some needlessly frustrating debugging. Ignoring str8, the new bytes type is going to be a serious obstacle for anyone wanting to move their codebase to Python 3. Take the following two lines:
    data = open(some_file).read() # read in text mode
    # and
    data = open(some_file, "rb").read() # read in binary mode

    The first returns a str, the second a bytes object; these two types have incompatible APIs, and the current state of the struni branch makes it impossible to write code that operates on both. For example, the signatures of the types’ split() methods are different, and the bytes type lacks a splitlines() method. These aren’t hypothetical differences: I’ve run into both problems while trying to fix several of the tests in the standard library.

  • On the subject of bytes, I ran into two additional bytes/str incompatibilities when porting setuptools. First, when you iterate over str instances, you get single-character strs back; when you iterate over bytes instances, you get integers. Combine this with code that switches based on type, and you end up banging your head against the table when your code starts kicking out errors, complaining that Python can’t iterate over the number 91. Secondly, I am absolutely sick of seeing “cannot concatenate str and bytes types” errors; my general tactic is to start throwing str() calls around until the error goes away, but that kind of shotgun debugging hurts my soul.
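
To make the two incompatibilities in the last item concrete, here is a small sketch. It assumes the bytes/str behaviour as it ended up in released Python 3, which may not match the struni branch exactly, and as_text is a hypothetical helper of mine rather than anything from setuptools or the standard library.

```python
data = b"[x]"
text = "[x]"

# Iterating over bytes yields integers; iterating over str yields
# one-character strs.
byte_items = list(data)
char_items = list(text)

# Mixing the two types in a concatenation raises TypeError.
try:
    text + data
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

def as_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes explicitly, pass str through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Normalising at the boundary avoids scattering str() calls around.
joined = as_text(b"abc") + as_text("def")

print(byte_items, char_items, joined)
```

Decoding explicitly at the point where data enters the program is one alternative to the shotgun approach; it does mean choosing an encoding, which here is simply assumed to be UTF-8.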

This needs to be easier. I hope Guido will release any notes he’s been taking while porting the standard library to use the new bytes and str types.

On the plus side, setuptools helped turn up a few bugs in 2to3, as well as some places where the translation could have been improved (and has been). I intend to repeat this experiment once the struni branch settles down and once 2.6’s py3k-compat mode works.
