There’s…too many of them!

Following Guido’s announcement/reminder that all Python 3000-related PEPs* have to be in by the end of April, PEPs have been coming out of the woodwork:

  • PEP 3119 - Guido’s abstract base classes PEP.

  • PEP 3120 - Using UTF-8 as the default source encoding

  • PEP 3121 - More flexible module Initialization and finalization.

  • PEP 3122 - Change how the “main” module is delineated. (This PEP has already been rejected.)

  • PEP 3141 - A proposal for a hierarchy of numeric base classes, based on PEP 3119.

There’re also several pre-PEPs being kicked around in the mailing lists:

*: PEPs impacting the stdlib don’t have to meet this deadline.

“I work for Google”.

That’s pretty damn ridiculous.

Working at Google feels very much like being back at college — admittedly, a college populated primarily by computer people; it’s like an IT version of Planet of the Apes.

Google takes extraordinary steps to integrate new hires into the corporate culture as fast and as comprehensively as possible. Nooglers spend the first two weeks in classes about the company’s core systems, a time when you’re expected to be absolutely unproductive and just learn. Every Noogler is assigned a mentor who serves as your go-to person for all the stupid questions you have: why won’t my computer boot? (My first day) How do I run a test suite? (My fifth day) What’s the command to request a code review? (Yesterday)

I’ve been assigned to the Build Tools group where I’ll be working on Mondrian, Google’s new change-review tool created by Guido van Rossum. I also want to get involved with the various testing-focused groups, something Neal Norwitz has been urging me to do.

This place is awesome.

Why do you crash all the time? You used to be so well-behaved, back in your 1.0.whatever days. Then you upgraded to 1.5 and you started segfaulting. Not every day, not even every week, but enough that it was a pain in the ass. Now, with the 2.0 branch, you crash every single goddamn day. Looking at the little TalkBack dialog that just popped up, you’ve crashed 10 times since I upgraded to 2.0.3 last week; I thought bug fix releases were supposed to, I don’t know, fix bugs.

Today you’ve segfaulted twice in two hours. This has got to stop.

Now I see that there’s a 3.0 release coming this year. New features are great fun to work on, I’m sure, and tracking down whatever bug keeps killing my browser sessions isn’t sexy, but seriously: how about putting in a little time to make the current release series stable? Please?

Two weeks or so ago, I brought up my unittest redesign on the new testing-in-python mailing list. A number of people were upset that in redesigning unittest, I had rejected nose and py.test; Titus Brown even wrote a few blog posts on the subject, in particular taking me to task for ignoring nose.

I’ll be honest: when I started redesigning unittest, I did ignore nose and py.test. I remembered looking at them a long time ago, when I was first getting frustrated with unittest, casting around for a better, more flexible alternative. py.test has no support for extensions and depends on the rest of the py library, so that’s out. nose has plugins, but my general impression was that it’s just a nice test discovery tool; since that wasn’t what I was looking for, I didn’t care. Thinking that perhaps the project has changed significantly since the last time I looked at it, I took another, closer look at nose’s infrastructure. Verdict: it’s still a nice test discovery tool, but since that’s still not what I’m looking for, I still don’t care.

And now we will have a brief intermezzo, and I will explain exactly why I’m redesigning unittest.

First of all, I didn’t start off with the intention of rewriting the whole module. I began by trying to change the existing design so that it would be easier to compose extensions. So I poked and I tweaked and prodded and twisted unittest until it was unrecognizable, until I was left with something that resembled the old version in name only. That is to say: this didn’t start out as a rewrite — it just ended up that way.

Now, what do I mean when I say “composing extensions”? Yes, unittest as-shipped allows you to extend its functionality by way of subclassing this bit and that bit, but the problem comes when trying to mash two extensions together: you can’t. You can’t put your unittest extensions — say, one that does refcount checking for C extensions or one that writes test results to a database — up on PyPI and have people be able to mix and match to create just the right testing environment for their project.

This all has one major design implication for your testing framework: extensions must operate without knowing anything about what other extensions might be running. The framework has to be designed so that extensions can operate by themselves just as well as they do with 15 others.

nose doesn’t come anywhere close to supporting this.

(Note: the following is based on my best understanding of nose’s codebase and on conversations with others. If I’ve gotten anything wrong, please let me know and I’ll gladly retract it.)

“That’s crap,” you say, “nose has plugins!” Ha. nose plugins don’t come anywhere close to achieving this level of independence. If I want to add a plugin to allow tests to be marked as TODO, there’s no way for this new kind of test-status to make its way into the various reporting plugins. As far as I can tell, just to get TODO tests not to show up as failures in the default console output, I’d have to:

  • Subclass nose.result.TextTestResult, overriding addError() so that it picks up the TODO-ness of the test.

  • Subclass nose.core.TextTestRunner, overriding _makeResult() so that it uses my TextTestResult subclass.

  • Subclass nose.core.TestProgram, overriding runTests() so that it uses my TextTestRunner subclass.

  • Replace nose.core.run() with a function that uses my TestProgram subclass.

    Of course, by the time my plugin is running and trying to do all this subclassing/replacing malarkey, nose.core.run() has already been called, so it’s too late.

By contrast, adding this kind of support to my unittest redesign is trivial. Omitting the TODO() decorator and exception classes (which you’d need for the nose version, too):

class TodoRunner(TestRunner):
  categories = ['todo pass', 'todo fail']

  def handle_exception(self, test, exc_info):
    exc_type = exc_info[0]
    if issubclass(exc_type, TodoPassed):
      self.log_exception('todo pass', test, exc_info)
    elif issubclass(exc_type, TodoFailed):
      self.log_exception('todo fail', test, exc_info)
    else:
      super(TodoRunner, self).handle_exception(test, exc_info)

  def was_successful(self):
    parent_success = super(TodoRunner, self).was_successful()
    return parent_success and not self.still_todo()

  def still_todo(self):
    return (self.exceptions['todo pass']
            or self.exceptions['todo fail'])

  def failure_label(self):
    if self.still_todo():
      return 'TODO'
    return super(TodoRunner, self).failure_label()

With those lines of code, all output extensions — console, database, XML, etc — will automatically recognize TODO tests and treat them as such. No fuss, no muss.

Now, all this isn’t to say that nose is crap. What I said earlier is still true: nose is a good test discovery tool. I even hope to borrow some of its discovery strategies for the new design. What nose is not, however, is an ultra-flexible test environment framework where extensions can be shared easily and openly, and that’s what I’m going for.

It’s been a busy six weeks since I last posted anything. Highlights:

  • I spent a week in San Francisco, interviewing with Bittorrent, YouTube and Google. All three days of interviews went well, with both Bittorrent and Google extending me offers. I’ve accepted Google’s invitation to come work in Mountain View, and my first day will be in just under three weeks. If things go as expected, I’ll be working in the Build Tools group with Guido van Rossum and Thomas Wouters.

  • I’ve been doing a lot of work on Guido’s 2to3 tool. 2to3 is designed to take a lot of the grunt work out of porting code from Python 2.x to 3.x. The goal is to allow automatic translation of 90% of Python 2.x source, allowing humans to concentrate on the trickier semantic/library changes.

    I’m working on a post devoted to 2to3 that will explain what it is, and what it can and can’t do. There’s been a lot of unease and uncertainty about the 2.x->3.x transition, and I’m hoping to clear some of that up.

  • I’ve also been spending some quality time in Python’s issue trackers, triaging older patches and bugs, including some of my own : )

The two PEPs I talked about in the last two entries have just gone live. The PEPs (3109, 3110) are more formal and more comprehensive versions of the blog entries, so if you read those posts, you’ll have the basic ideas.

Following up on my last post about catching exceptions in Python 3, here are some excerpts from the companion PEP I’m working on, which addresses “raise” statements.

There are simply too many forms to the raise statement in Python 2. Quoting from the reference manual:

If no expressions are present, raise re-raises the last exception that was active in the current scope…

Otherwise, raise evaluates the expressions to get three objects, using None as the value of omitted expressions. The first two objects are used to determine the type and value of the exception.

If the first object is an instance, the type of the exception is the class of the instance, the instance itself is the value, and the second object must be None.

If the first object is a class, it becomes the type of the exception. The second object is used to determine the exception value: If it is an instance of the class, the instance becomes the exception value. If the second object is a tuple, it is used as the argument list for the class constructor; if it is None, an empty argument list is used, and any other object is treated as a single argument to the constructor. The instance so created by calling the constructor is used as the exception value.

If a third object is present and not None, it must be a traceback object…and it is substituted instead of the current location as the place where the exception occurred… The three-expression form of raise is useful to re-raise an exception transparently in an except clause, but raise with no expressions should be preferred if the exception to be re-raised was the most recently active exception in the current scope.

That’s pretty complex, and it doesn’t even address string exceptions. Until I started digging around in the interpreter internals, I didn’t even know the three-object form was possible. Here’s what raise will look like in Python 3:

  1. raise (with no arguments) is used to re-raise the active exception in an except block.

  2. raise EXCEPTION is used to raise a new exception. This form has two sub-variants: EXCEPTION may be either an instance of BaseException or a subclass of BaseException (follows from PEP 352). If EXCEPTION is a subclass, it will be called with no arguments to obtain an exception instance.

    To raise anything else is an error.
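For reference, here is a minimal sketch of those forms as they behave on a released Python 3 interpreter. The function names are made up for illustration; only the raise statements themselves come from the list above.

```python
def reraise_example():
    """Form 1: a bare raise re-raises the exception being handled."""
    try:
        try:
            raise KeyError("missing")
        except KeyError:
            raise  # same exception object keeps propagating
    except KeyError as exc:
        return exc


def raise_class_example():
    """Form 2, class variant: the class is called with no arguments."""
    try:
        raise ValueError
    except ValueError as exc:
        return exc


def raise_non_exception_example():
    """Anything that is not a BaseException (sub)class or instance is an error."""
    try:
        raise "a string exception"
    except TypeError as exc:
        return exc
```

The last function shows what happens to old-style string exceptions: the raise itself fails with a TypeError.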

“But wait! That doesn’t allow me to supply a traceback!” Never fear, PEP 344 is here. It specifies that exceptions will grow a __traceback__ attribute, and this is how we’ll be able to raise exceptions with arbitrary tracebacks. What looked like this in Python 2

raise Type, Value, Traceback

will look like this in Python 3

e = Type(Value)
e.__traceback__ = Traceback
raise e

Or possibly this (per a suggestion from Guido):

raise Type(Value).set_traceback(Traceback)
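In Python 3 as released, the method is spelled with_traceback() rather than set_traceback(). A small sketch of attaching a previously captured traceback to a new exception follows; the helper names (capture_traceback, raise_with_traceback, traceback_chain) are hypothetical.

```python
import sys


def capture_traceback():
    """Trigger an error and hand back its traceback object."""
    try:
        1 / 0
    except ZeroDivisionError:
        return sys.exc_info()[2]


def raise_with_traceback(tb):
    """Raise a new exception that carries an existing traceback."""
    try:
        # with_traceback() sets __traceback__ and returns the exception
        raise RuntimeError("wrapped").with_traceback(tb)
    except RuntimeError as exc:
        return exc


def traceback_chain(exc):
    """Collect every traceback object linked from exc.__traceback__."""
    chain = []
    tb = exc.__traceback__
    while tb is not None:
        chain.append(tb)
        tb = tb.tb_next
    return chain


saved_tb = capture_traceback()
wrapped = raise_with_traceback(saved_tb)
```

The supplied traceback ends up inside the chain hanging off wrapped.__traceback__, with the frame that executed the raise added in front of it.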

I’m also relying on PEP 344 to replace Python 2’s raise Type, Instance variant. This is most often used to “cast” an exception instance from one type to another, such as this example from distutils.bcppcompiler:

try:
    self.spawn (['brcc32', '-fo', obj, src])
except DistutilsExecError, msg:
    raise CompileError, msg

PEP 344 introduces a raise ... from ... statement and a corresponding __cause__ attribute. Taking advantage of these new tools, the above Python 2 snippet translates to

try:
    self.spawn (['brcc32', '-fo', obj, src])
except DistutilsExecError as msg:
    raise CompileError from msg
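To see what the __cause__ attribute gives you, here is a self-contained sketch that runs on a released Python 3. ExecError, CompileError, spawn() and compile_resource() are stand-ins invented for the example, not the real distutils classes and functions.

```python
class ExecError(Exception):
    """Stand-in for DistutilsExecError."""


class CompileError(Exception):
    """Stand-in for distutils' CompileError."""


def spawn(cmd):
    # Pretend the external command always fails.
    raise ExecError("command failed: %s" % cmd[0])


def compile_resource():
    try:
        spawn(['brcc32', '-fo', 'out.res', 'in.rc'])
    except ExecError as err:
        # The new exception records the original one as its __cause__.
        raise CompileError(str(err)) from err


def run_and_inspect():
    try:
        compile_resource()
    except CompileError as exc:
        return exc
```

The caller sees a CompileError, and the original ExecError is still reachable through its __cause__ attribute.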

While the main thrust of this work is to reduce the size of the language — the number of details and nuances you have to keep track of — there’s a more tangible benefit, as pointed out by A. M. Kuchling:

PEP 8 doesn’t express any preference between the two forms of raise statements:

raise ValueError, 'blah'
raise ValueError('blah')

I like the second form better, because if the exception arguments are long or include string formatting, you don’t need to use line continuation characters because of the containing parens.

Less line noise, a smaller language; what’s not to like?

Lately, I’ve been working on a PEP to change how Python 3’s “except” statements work. The highlights:

(Anyone wanting to discuss these should join the python-3000 list and comment there.)

  • The grammar for “except” statements will change from

    except_clause: 'except' [test [',' test]]

    in Python 2 to

    except_clause: 'except' [test ['as' NAME]]

    in Python 3. This is being done to eliminate a syntactic ambiguity where the parser can’t tell whether

    except EXPRESSION, EXPRESSION:

    should be interpreted as

    except TYPE, TYPE:

    or

    except TYPE, TARGET:

    Python 2 opts for the latter semantic, at the cost of requiring the former to be parenthesized.

    Converting Python 2-style “except” statements to Python 3 can be handled automatically (for the most part) by Guido van Rossum’s 2to3 utility.

  • As specified in PEP 352, the ability to treat exceptions as tuples will be removed, meaning this code will no longer work:

    except os.error, (errno, errstr):

    Because the automatic unpacking will no longer be possible by default, the ability to use tuples as “except” targets at all will be removed.

  • PEP 344 specifies that exception instances in Python 3 will possess a __traceback__ attribute. The Open Issues section of that PEP includes a paragraph on garbage collection difficulties caused by this attribute, namely an “exception -> traceback -> stack frame -> exception” reference cycle, whereby all locals are kept in scope until the next GC run. Python 3 will resolve this issue by making sure the target name is deleted at the end of the “except” suite, thus breaking the cycle.

    This will be done by having the compiler emit appropriate bytecode to translate

    try:
        try_body
    except E as N:
        except_body
    ...

    to this (in Python 2.5 terms):

    try:
        try_body
    except E, N:
        try:
            except_body
        finally:
            N = None
            del N
    ...

    An implementation of this has already been checked into the p3yk [sic] branch.
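The three changes above can be observed on a released Python 3 interpreter with a short sketch like the following. The function names are hypothetical and exist only for this example.

```python
import errno


def catch_either(index):
    """Several exception types need a parenthesized tuple before 'as'."""
    try:
        return [10, 20][index]
    except (IndexError, TypeError) as err:
        return type(err).__name__


def os_error_fields():
    """Read errno/strerror as attributes instead of unpacking the exception."""
    try:
        raise OSError(errno.ENOENT, "No such file or directory")
    except OSError as err:
        return err.errno, err.strerror


def name_after_except():
    """The 'as' target is deleted when the except suite finishes."""
    try:
        raise ValueError("boom")
    except ValueError as err:
        message = str(err)  # copy out whatever is needed later
    try:
        err
    except NameError:  # UnboundLocalError is a subclass of NameError
        return message, "err was deleted"
    return message, "err still bound"
```

The practical consequence of the last function: anything you need from the exception after the handler has to be copied to another name inside the except suite.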

A long time ago, in a blog post a few pages back in the archives, I spent a few paragraphs bemoaning Python’s unittest module and how it can’t be readily extended, nor can its extensions be easily composed. I gave as examples an extension that allows you to mark tests as “todo” and an extension that did reference counting around each test case (for C modules). While writing the extensions themselves was a little harder than I would have liked, the biggest problem was composing them — using both at the same time. Specifically, you can’t compose them, not without writing all-new code to merge the two functionalities. Consider:

TODO support:
        140 lines (5 core classes, 4 support classes/funcs)

Refcounting support:
        117 lines (4 core classes)

Composition:
        197 lines (6 core classes, 4 support classes/funcs)
        105 lines (3 classes of entirely new/rewritten code)

(All code snippets can be found in this directory. Code related to the old unittest design is in the before/ subdir, that related to the new design is in after/.)

test_harness, my new unittest package, was designed with flexibility and extensibility in mind. Using the same todo/refcounting examples from above:

TODO support:
        61 lines (1 core class, 4 support classes/funcs)

Refcounting support:
        36 lines (1 core class)

Composition:
        5 lines (1 core class, 3 imports)

That’s right: todo and refcounting support, with results written to stdout in five lines. And one of those lines is blank.

Where the new design really shines is in output. Unlike the old design — where you’d have to rewrite everything — changing your logging scheme from to-console to XML means changing this

    from test_harness import TextRunner
    from refcounting import RefcountRunner
    from todo import TodoRunner, TODO

    class OurRunner(TextRunner, RefcountRunner, TodoRunner):
        pass

to this:

    from xmlrunner import XmlTestRunner
    from refcounting import RefcountRunner
    from todo import TodoRunner, TODO

    class OurRunner(XmlTestRunner, RefcountRunner, TodoRunner):
        pass

That’s a two-line change. That would have required a complete rewrite with the old system. Want both XML and to-console logging? Stick with the old unittest design and you’re looking at yet another rewrite. test_harness allows you to do this:

from test_harness import TextRunner
from xmlrunner import XmlTestRunner
from refcounting import RefcountRunner
from todo import TodoRunner, TODO

class OurRunner(TextRunner, XmlTestRunner, RefcountRunner, TodoRunner):
    pass

The biggest problem with the old unittest design is that, in trying to separate out the various concerns, it left the different components interconnected. TestCase objects depend on TestResult objects having certain methods; TestLoaders depend on your test case classes subclassing TestCase; TestRunners control which TestResult is used; etc. test_harness does away with this menagerie in favor of a single class: TestRunner. TestRunner objects are responsible for test suite iteration, running each individual test, collecting and categorizing any exceptions, and summarizing the results of the test run. Test loading/discovery is orthogonal to this process and as such is left to other packages, though rudimentary solutions are provided with the new package.
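Why can a bare class statement compose extensions? One plausible mechanism is cooperative multiple inheritance, where every extension overrides a hook and defers to super() for everything it does not handle. The toy below is only a sketch under that assumption, written for Python 3. It is not the test_harness code, and CountingRunner, the simplified method signatures and the sample tests are invented for illustration.

```python
class TestRunner:
    """Toy base runner: runs callables and files exceptions by category."""
    categories = ['fail']

    def __init__(self):
        self.exceptions = {}
        # Gather the categories declared by every class in the MRO.
        for cls in type(self).__mro__:
            for name in cls.__dict__.get('categories', ()):
                self.exceptions.setdefault(name, [])

    def run(self, tests):
        for test in tests:
            try:
                test()
            except Exception as exc:
                self.handle_exception(test, exc)

    def handle_exception(self, test, exc):
        self.log_exception('fail', test, exc)

    def log_exception(self, category, test, exc):
        self.exceptions[category].append((test.__name__, exc))


class TodoFailed(Exception):
    pass


class TodoRunner(TestRunner):
    """Extension 1: files TodoFailed under its own category."""
    categories = ['todo fail']

    def handle_exception(self, test, exc):
        if isinstance(exc, TodoFailed):
            self.log_exception('todo fail', test, exc)
        else:
            super().handle_exception(test, exc)


class CountingRunner(TestRunner):
    """Extension 2 (unrelated to the first): counts every logged exception."""

    def __init__(self):
        super().__init__()
        self.logged = 0

    def log_exception(self, category, test, exc):
        self.logged += 1
        super().log_exception(category, test, exc)


class OurRunner(CountingRunner, TodoRunner):
    pass


def broken():
    raise ValueError("nope")


def unfinished():
    raise TodoFailed("later")


def passing():
    pass


runner = OurRunner()
runner.run([broken, unfinished, passing])
```

Neither extension mentions the other, yet the TODO failure is both categorized by TodoRunner and counted by CountingRunner, because each override passes control along the method resolution order.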

The biggest gripes about unittest I heard while researching its problems are that you have to a) subclass TestCase, and b) use TestCase methods to indicate test success/failure. In test_harness, there is no requirement to subclass TestCase (nor is there a TestCase class to subclass). Also, the use of TestCase methods to signal failure (a consequence of the old TestCase/TestResult linkage) has been replaced with a test_harness.assertion submodule that contains functions like ok(), are_equal(), etc. Mapping old spellings to new:

    self.failUnless()               < = >    ok()
    self.assertEqual()              < = >    are_equal()
    self.failIfEqual()              < = >    are_not_equal()
    self.failUnlessAlmostEqual()    < = >    are_almost_equal()
    self.assertRaises()             < = >    raises()
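To give a feel for module-level assertion functions, here is a hypothetical sketch of three of them for Python 3. Only the names ok(), are_equal() and raises() come from the table; the signatures, messages and return values are guesses, not the actual test_harness.assertion code.

```python
def ok(value, message=None):
    """Fail unless value is true."""
    if not value:
        raise AssertionError(message or "%r is not true" % (value,))


def are_equal(first, second, message=None):
    """Fail unless first == second."""
    if not first == second:
        raise AssertionError(message or "%r != %r" % (first, second))


def raises(exc_type, func, *args, **kwargs):
    """Fail unless func(*args, **kwargs) raises exc_type; return the exception."""
    try:
        func(*args, **kwargs)
    except exc_type as exc:
        return exc
    raise AssertionError("%s not raised" % exc_type.__name__)


def test_division():
    # A test is just a function; no TestCase subclass is involved.
    are_equal(6 / 3, 2)
    ok(7 // 2 == 3)
    raises(ZeroDivisionError, lambda: 1 / 0)
```

Because failures are signalled by ordinary exceptions, a test written this way has no dependency on any particular result or runner class.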

Anyone interested is encouraged to play around with the new design. Comments to collinw at gmail point com

I made my first commit to Python today.

If that doesn’t get the girls, I don’t know what will.
