python 3000


I recently spent some time porting setuptools to the py3k-struni branch as a means of testing both 2to3 specifically and the porting process generally. What follows are the notes from the experience. Two things to keep in mind. First, the struni branch, though slated to become the “official” Python 3000 branch, is still very much in flux and currently has 30+ failing tests; needless to say, it’s not an ideal porting target. Second, I was attempting this without Python 2.6’s forward compatibility mode, which is still mostly unwritten. As both of these situations change, I’ll keep trying to port more code to test the general readiness of the 2.x -> 3.x migration strategy.

Things to do in your Python 2 code:

  • Don’t write code like this:

    class install(_install):
      new_commands = [
        ('install_egg_info', lambda self: True),
        ('install_scripts',  lambda self: True),
      ]
      _nc = dict(new_commands)
      sub_commands = [
        cmd for cmd in _install.sub_commands if cmd[0] not in _nc
      ] + new_commands

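    That won’t work in Python 3000 because of changes to list comprehensions and class definitions: a list comprehension now runs in its own scope, and names bound in the class body (here, _nc) are not visible from inside it. Move the new_commands and _nc declarations out of the class body.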

  • Don’t rely on implicit relative imports. In Python 3000, all imports will be absolute by default; you should write one of

    from setuptools.dist import _get_unpatched
    # or
    from .dist import _get_unpatched

    instead of

    from dist import _get_unpatched

Stuff that needs to be easier:

  • The fact that __cmp__ methods are going away sucks, plain and simple. This required me to manually implement four additional comparison methods for every class that had previously relied on __cmp__. I hope their removal will be rethought and retracted.

  • The struni branch currently has three different string-ish types: bytes, str (previously unicode) and str8. Guido has said that str8 will eventually go away, but its presence in unexpected places (like modules’ __file__ attributes) made for some needlessly frustrating debugging. Ignoring str8, the new bytes type is going to be a serious obstacle for anyone wanting to move their codebase to Python 3. Take the following two lines:

    data = open(some_file).read() # read in text mode
    # and
    data = open(some_file, "rb").read() # read in binary mode

    The first returns a str, the second a bytes object; these two types have incompatible APIs, and the current state of the struni branch makes it impossible to write code that operates on both. For example, the signatures of the types’ split() methods are different, and the bytes type lacks a splitlines() method. These aren’t hypothetical differences: I’ve run into both problems while trying to fix several of the tests in the standard library.

  • On the subject of bytes, I ran into two additional bytes/str incompatibilities when porting setuptools. First, when you iterate over str instances, you get single-character strs back; when you iterate over bytes instances, you get integers. Combine this with code that switches based on type, and you end up banging your head against the table when your code starts kicking out errors, complaining that Python can’t iterate over the number 91. Secondly, I am absolutely sick of seeing “cannot concatenate str and bytes types” errors; my general tactic is to start throwing str() calls around until the error goes away, but that kind of shotgun debugging hurts my soul.
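The iteration and concatenation problems from the last item can be reproduced in a few lines. This sketch assumes the behaviour of Python 3 as it was eventually released, which may not match the struni branch described here in every detail; the variable names are illustrative.

```python
text = "[egg]"   # str: a sequence of characters
raw = b"[egg]"   # bytes: a sequence of small integers

# Iterating over str yields one-character strs; iterating over bytes yields ints.
first_text = next(iter(text))
first_raw = next(iter(raw))

# Slicing, unlike indexing or iterating, preserves the type of the sequence.
first_raw_slice = raw[:1]

# Mixing the two types raises TypeError.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# An explicit decode states the encoding instead of guessing with str().
joined = text + raw.decode("ascii")
```

Here first_raw is the integer 91, the code for "[", which is the kind of value that surfaces in the "can't iterate over a number" errors mentioned above.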

This needs to be easier. I hope Guido will release any notes he’s been taking while porting the standard library to use the new bytes and str types.
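As for the __cmp__ item above, the boilerplate can be reduced with functools.total_ordering, a decorator that was added to the standard library later (Python 2.7 and 3.2) and so was not available on the branch described here. A minimal sketch, with Version as a hypothetical class that would have defined __cmp__ in Python 2:

```python
import functools


@functools.total_ordering
class Version:
    """Hypothetical class that would have relied on __cmp__ in Python 2."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        # Single place that defines what is being compared.
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Defining __eq__ disables the inherited __hash__, so restore one.
        return hash(self._key())
```

The decorator derives the remaining ordering methods from __eq__ and __lt__, so only two comparisons are written by hand.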

On the plus side, setuptools helped turn up a few bugs in 2to3, as well as some places where the translation could have been improved (and has been). I intend to repeat this experiment once the struni branch settles down and once 2.6’s py3k-compat mode works.

I’m getting pretty sick of seeing blog posts and mailing list threads endlessly bemoaning that “the core developers…are causing a huge risk to the Python community by splitting it asunder for a period of years”. Gloom, doom, pox and peril, blah blah blah.

The language has two choices: either continue to bear the burden of what are now considered poor design decisions (e.g., four forms of raise, syntax ambiguities in except statements) or suck it up and let us try and fix some of these problems. It’s like going to the dentist: it may hurt, but if that minor toothache goes untreated and develops into an abscess, you will wish you were dead.

There are two parts to the transition plan: syntactic transition and semantic transition. For syntactic transition, Guido and I have sunk a lot of time into 2to3, which will translate your Python 2.x code into 3.0’s freshly-polished syntax. When it comes to adjusting your code’s semantics, Python 2.6 will feature a Python 3000 compatibility mode, which when enabled will warn you when you do something that will need to be changed before moving to 3.0. Are these tools perfect? No; that’s the price you pay for using a language as flexible as Python. Are they pretty damn good? Yes. Combined, 2to3 and Python 2.6 will make the vast majority of 2.x -> 3.0 transitions as painless as we can make them. For that last little remnant, the code we simply cannot deal with, that’s what your test suites are for. I have absolutely no pity for anyone trying to migrate to Python 3 without a test suite; you’re doing something fundamentally stupid and we will not bend over backwards to save your dumb ass.

As for the observation that pugs, the Perl 6 compiler, will be able to handle Perl 5 source as input and why oh why can’t Python do that, too: Perl 5-on-Perl 6 is a neat trick born in an intersection of necessity and opportunity. The necessity is there because Perl 6 is a fundamentally different language than Perl 5 (or at least it was the last time I looked; they may have changed their minds over the last week), and Perl’s DWIM mentality would make it prohibitively difficult to mechanically translate the old to the new. Also, the Perl 6 compiler can afford to have a Perl 5 runtime built in because there’s only one (serious) Perl 6 compiler, and so the developer and maintenance cost for this extra runtime is isolated within a single project.

Python simply can’t do that. There are four credible implementations of Python I know of (CPython, Jython, IronPython, PyPy), and we can’t ask each one of these efforts to please please won’t you embed a Python 2 runtime in your system? Not going to happen, ever. Given these circumstances, the best we can do is to have a syntax translator that will work across all implementations, and a semantics checker that’s spec-driven and as implementation-agnostic as possible.

If you think you can do better, show us the code. Talk is cheap.

  • PEP 3129, “Class Decorators”. This has already been accepted and implemented, thanks to Jack Diederich.

  • PEP 3133, “Introducing Roles”. Roles are a competing idea to PEP 3119’s Abstract Base Classes.

Direct any discussion to python-3000.

There’s…too many of them!

Following Guido’s announcement/reminder that all Python 3000-related PEPs* have to be in by the end of April, PEPs have been coming out of the woodwork:

  • PEP 3119 - Guido’s abstract base classes PEP.

  • PEP 3120 - Using UTF-8 as the default source encoding.

  • PEP 3121 - More flexible module initialization and finalization.

  • PEP 3122 - Change how the “main” module is delineated. (This PEP has already been rejected.)

  • PEP 3141 - A proposal for a hierarchy of numeric base classes, based on PEP 3119.

There’re also several pre-PEPs being kicked around in the mailing lists:

*: PEPs impacting the stdlib don’t have to meet this deadline.

The two PEPs I talked about in the last two entries have just gone live. The PEPs (3109, 3110) are more formal and more comprehensive versions of the blog entries, so if you read those posts, you’ll have the basic ideas.

Following up on my last post about catching exceptions in Python 3, here are some excerpts from the companion PEP I’m working on, which addresses “raise” statements.

There are simply too many forms to the raise statement in Python 2. Quoting from the reference manual:

If no expressions are present, raise re-raises the last exception that was active in the current scope…

Otherwise, raise evaluates the expressions to get three objects, using None as the value of omitted expressions. The first two objects are used to determine the type and value of the exception.

If the first object is an instance, the type of the exception is the class of the instance, the instance itself is the value, and the second object must be None.

If the first object is a class, it becomes the type of the exception. The second object is used to determine the exception value: If it is an instance of the class, the instance becomes the exception value. If the second object is a tuple, it is used as the argument list for the class constructor; if it is None, an empty argument list is used, and any other object is treated as a single argument to the constructor. The instance so created by calling the constructor is used as the exception value.

If a third object is present and not None, it must be a traceback object…and it is substituted instead of the current location as the place where the exception occurred… The three-expression form of raise is useful to re-raise an exception transparently in an except clause, but raise with no expressions should be preferred if the exception to be re-raised was the most recently active exception in the current scope.

That’s pretty complex, and it doesn’t even address string exceptions. Until I started digging around in the interpreter internals, I didn’t even know the three-object form was possible. Here’s what raise will look like in Python 3:

  1. raise (with no arguments) is used to re-raise the active exception in an except block.

  2. raise EXCEPTION is used to raise a new exception. This form has two sub-variants: EXCEPTION may be either an instance of BaseException or a subclass of BaseException (follows from PEP 352). If EXCEPTION is a subclass, it will be called with no arguments to obtain an exception instance.

    To raise anything else is an error.
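The two forms, and the error case, can be checked with a short sketch. It assumes Python 3 as eventually released; the function names are illustrative.

```python
def reraise_demo():
    """Form 1: a bare raise re-raises the exception being handled."""
    try:
        try:
            raise KeyError("missing")
        except KeyError:
            raise
    except KeyError as exc:
        return exc.args


def class_demo():
    """Form 2: raising a class instantiates it with no arguments."""
    try:
        raise ValueError
    except ValueError as exc:
        return exc.args


def non_exception_demo():
    """Raising anything that is not a BaseException is itself an error."""
    try:
        raise "a string exception"
    except TypeError:
        return True
    return False
```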

“But wait! That doesn’t allow me to supply a traceback!” Never fear, PEP 344 is here. It specifies that exceptions will grow a __traceback__ attribute, and this is how we’ll be able to raise exceptions with arbitrary tracebacks. What looked like this in Python 2

raise Type, Value, Traceback

will look like this in Python 3

e = Type(Value)
e.__traceback__ = Traceback
raise e

Or possibly this (per a suggestion from Guido):

raise Type(Value).set_traceback(Traceback)
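In Python 3 as eventually released, the method is spelled with_traceback() rather than set_traceback(). A minimal sketch of that form follows; capture_traceback is a hypothetical helper that exists only to produce a traceback object to attach.

```python
import sys


def capture_traceback():
    """Return the traceback of a deliberately triggered exception."""
    try:
        1 / 0
    except ZeroDivisionError:
        return sys.exc_info()[2]


tb = capture_traceback()

try:
    # with_traceback() sets __traceback__ and returns the exception itself,
    # so it can be used directly inside the raise statement.
    raise ValueError("bad value").with_traceback(tb)
except ValueError as err:
    caught = err


def traceback_chain(traceback):
    """Yield each entry in a traceback by following tb_next."""
    while traceback is not None:
        yield traceback
        traceback = traceback.tb_next
```

The supplied traceback ends up as part of the chain reachable from caught.__traceback__, with the frame that executed the raise added in front of it.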

I’m also relying on PEP 344 to replace Python 2’s raise Type, Instance variant. This is most often used to “cast” an exception instance from one type to another, such as this example from distutils.bcppcompiler:

try:
    self.spawn (['brcc32', '-fo', obj, src])
except DistutilsExecError, msg:
    raise CompileError, msg

PEP 344 introduces a raise ... from ... statement and a corresponding __cause__ attribute. Taking advantage of these new tools, the above Python 2 snippet translates to

try:
    self.spawn (['brcc32', '-fo', obj, src])
except DistutilsExecError as msg:
    raise CompileError from msg
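To see what the __cause__ attribute holds after such a translation, here is a self-contained sketch. The two exception classes and spawn() are hypothetical stand-ins for the distutils originals, and passing str(msg) to CompileError is my own addition so the new exception carries a message of its own.

```python
class DistutilsExecError(Exception):
    """Stand-in for the distutils exception of the same name."""


class CompileError(Exception):
    """Stand-in for the distutils exception of the same name."""


def spawn(cmd):
    # Hypothetical: always fails, to exercise the except clause below.
    raise DistutilsExecError("command %r failed" % cmd[0])


def compile_resource(obj, src):
    try:
        spawn(['brcc32', '-fo', obj, src])
    except DistutilsExecError as msg:
        # The original exception is recorded on the new one as __cause__.
        raise CompileError(str(msg)) from msg


try:
    compile_resource('app.res', 'app.rc')
except CompileError as err:
    caught = err
```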

While the main thrust of this work is to reduce the size of the language — the number of details and nuances you have to keep track of — there’s a more tangible benefit, as pointed out by A. M. Kuchling:

PEP 8 doesn’t express any preference between the two forms of raise statements:

raise ValueError, 'blah'
raise ValueError('blah')

I like the second form better, because if the exception arguments are long or include string formatting, you don’t need to use line continuation characters because of the containing parens.

Less line noise, a smaller language; what’s not to like?

Lately, I’ve been working on a PEP to change how Python 3’s “except” statements work. The highlights:

(Anyone wanting to discuss these should join the python-3000 list and comment there.)

  • The grammar for “except” statements will change from

    except_clause: 'except' [test [',' test]]

    in Python 2 to

    except_clause: 'except' [test ['as' NAME]]

    in Python 3. This is being done to eliminate a syntactic ambiguity where the parser can’t tell whether

    except EXPRESSION, EXPRESSION:

    should be interpreted as

    except TYPE, TYPE:

    or

    except TYPE, TARGET:

    Python 2 opts for the latter semantic, at the cost of requiring the former to be parenthesized.

    Converting Python 2-style “except” statements to Python 3 can be handled automatically (for the most part) by Guido van Rossum’s 2to3 utility.

  • As specified in PEP 352, the ability to treat exceptions as tuples will be removed, meaning this code will no longer work:

    except os.error, (errno, errstr):

    Because the automatic unpacking will no longer be possible by default, the ability to use tuples as “except” targets at all will be removed.

  • PEP 344 specifies that exception instances in Python 3 will possess a __traceback__ attribute. The Open Issues section of that PEP includes a paragraph on garbage collection difficulties caused by this attribute, namely an “exception -> traceback -> stack frame -> exception” reference cycle, whereby all locals are kept in scope until the next GC run. Python 3 will resolve this issue by making sure the target name is deleted at the end of the “except” suite, thus breaking the cycle.

    This will be done by having the compiler emit appropriate bytecode to translate

    try:
        try_body
    except E as N:
        except_body
    ...

    to this (in Python 2.5 terms):

    try:
        try_body
    except E, N:
        try:
            except_body
        finally:
            N = None
            del N
    ...

    An implementation of this has already been checked into the p3yk [sic] branch.
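The three changes above can be exercised together in a short sketch, assuming Python 3 as eventually released. The function names and the file path are illustrative only.

```python
import os


def parse_int(value):
    """Several exception types must be parenthesized before 'as'."""
    try:
        return int(value)
    except (TypeError, ValueError) as exc:
        return exc.args


def remove_file(path):
    """Exceptions no longer unpack like tuples; read attributes instead."""
    try:
        os.remove(path)
    except OSError as exc:
        return exc.errno, exc.strerror
    return None


def target_is_gone():
    """The 'as' target is deleted when the except suite finishes."""
    try:
        raise KeyError("missing")
    except KeyError as exc:
        message = exc.args[0]
    try:
        exc
    except NameError:  # UnboundLocalError is a subclass of NameError
        return message, True
    return message, False
```

Anything needed from the exception after the handler has to be copied to another name inside the suite, as message is here.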

To anyone planning to email me about how much you hate the syntax for return value annotations: don’t. Guido wants the -> arrow, so the arrow is what we’re getting. Your ideas for using returns or return or whatever else have already occurred to others — namely me — and were rejected months ago.

Guido’s the one you have to convince, not me, and he’s already made up his mind.

Based on this python-3000 thread and a number of off-list emails, I’m dropping my earlier objection to PEP 3107. I hadn’t been convinced that there was a sufficiently broad spectrum of use-cases for function annotations to justify changing Python’s syntax. A bunch of people came out of the woodwork with viable uses for annotations, which is what I was looking for. Accordingly, I’ll be working up a patch to PEP 3107 to include a “Use Cases” section.

Thanks to everyone who emailed or commented, especially Phillip J. Eby, who led the python-3000 effort to convince me.

A blogified version of a python-3000 post, in which the author of PEP 3107 revels in situational irony.

I was explaining function annotations to a friend this past weekend and found that, even though I had written a PEP on the subject and spent months debating the little details of “how are we going to make annotations work?”, I was hard-pressed to answer the question of “why are we doing this?”

The biggest problem I faced — then and now — is justifying the use-cases for annotations. Here’re the use-cases I could come up with off the top of my head: information for typecheckers; doc strings for parameters; extra information for IDEs; extra information for static analysis tools like pylint. These can all be addressed together:

Are the users clamoring for these things? Do these address real problems that users are having?

Not to my knowledge.

  1. Information for typecheckers

    In a recent python-ideas post, Guido van Rossum said that “Collin’s existing type annotation library … could be made more elegant by attaching the types directly to the arguments”. As far as I can tell, the only gains in elegance are that you don’t have to repeat the names of a function’s parameters in the typechecking decorator. None of my users have ever complained about this tiny bit of repetition, and I’ve never felt it an undue burden in my own usage.

    It could even be considered an advantage, since including the “annotations” in the typechecking decorator means all I have to do to remove typechecking from a function is delete a single line, rather than pick through a function’s declaration, removing the relevant bits.

  2. Doc strings for parameters

    def foo(a: 'the object to be frobnicated',
            b: 'this controls the level of frobnication' = 7,
            c: 'the name of a color, in Spanish, that should be applied '
               'to the frobnicated thing' = 'Rojo'
            ) -> 'Returns an integer between 9 and 13':
        ''' Frobnicate an object in a Spanish way '''
        ...

    What does this accomplish that can’t be achieved with any of the standard documentation syntaxes in existence today?

  3. Type information for IDEs

    I can see it being genuinely useful to be able to get parameter/return type information in a tooltip message. But IDLE can do this already, without annotations.

  4. Type information for static analysis tools

    Quoting Nick Coghlan, from August 2006: “annotations wouldn’t be useful for tools like pychecker … to be really useful for a tool like pychecker they’d have to be ubiquitous, and that’s really not Python any more”. Agreed.
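For comparison with item 1, here is roughly what a typechecker that reads annotations could look like, using PEP 3107 syntax as it shipped in Python 3. Both typecheck and frobnicate are hypothetical names invented for this sketch; it is not the API of the typechecking library mentioned above.

```python
import functools
import inspect


def typecheck(func):
    """Hypothetical decorator: check arguments against class annotations."""
    signature = inspect.signature(func)
    annotations = func.__annotations__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = annotations.get(name)
            # Only annotations that are classes are treated as types;
            # anything else (such as a doc string) is ignored.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError("%s must be %s" % (name, expected.__name__))
        return func(*args, **kwargs)

    return wrapper


@typecheck
def frobnicate(thing: str, level: int = 7) -> int:
    return len(thing) * level
```

Removing the check is still a one-line deletion of the decorator, but the type information stays behind in the signature, which is the trade-off weighed in item 1.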

I could say You Aren’t Going to Need It, but that gets the tense wrong; we’re getting along without annotations quite nicely here in the present. In short: I think that PEP 3107 should be rejected as an overly-specific, unnecessary addition to the language.

Anyone with thoughts on or responses to this article should post them to the python-3000 mailing list.
