test_harness


If you’re in the Silicon Valley area, I’ll be speaking at the next BayPIGgies meeting about test_harness, my testing framework designed to (hopefully) replace unittest. The meeting is Thursday, 14 June at 7:30PM at Google. Check the BayPIGgies site for more details.

For those of a more European persuasion, I’ll be at EuroPython 2007 in Vilnius, Lithuania, giving much the same talk. Since the paper was accepted in the peer review track, I’ll have to science it up some, but the talk should stay pretty much the same. Except that it will be in Lithuanian.

Two weeks or so ago, I brought up my unittest redesign on the new testing-in-python mailing list. A number of people were upset that in redesigning unittest, I had rejected nose and py.test; Titus Brown even wrote a few blog posts on the subject, in particular taking me to task for ignoring nose.

I’ll be honest: when I started redesigning unittest, I did ignore nose and py.test. I remembered looking at them a long time ago, when I was first getting frustrated with unittest, casting around for a better, more flexible alternative. py.test has no support for extensions and depends on the rest of the py library, so that’s out. nose has plugins, but my general impression was that it’s just a nice test discovery tool; since that wasn’t what I was looking for, I didn’t care. Thinking that perhaps the project has changed significantly since the last time I looked at it, I took another, closer look at nose’s infrastructure. Verdict: it’s still a nice test discovery tool, but since that’s still not what I’m looking for, I still don’t care.

And now we will have a brief intermezzo, and I will explain exactly why I’m redesigning unittest.

First of all, I didn’t start off with the intention of rewriting the whole module. I began by trying to change the existing design so that it would be easier to compose extensions. So I poked and I tweaked and prodded and twisted unittest until it was unrecognizable, until I was left with something that resembled the old version in name only. That is to say: this didn’t start out as a rewrite — it just ended up that way.

Now, what do I mean when I say “composing extensions”? Yes, unittest as-shipped allows you to extend its functionality by way of subclassing this bit and that bit, but the problem comes when trying to mash two extensions together: you can’t. You can’t put your unittest extensions — say, one that does refcount checking for C extensions or one that writes test results to a database — up on PyPI and have people be able to mix and match to create just the right testing environment for their project.

This all has one major design implications for your testing framework: extensions must operate without knowing anything about what other extensions might be running. The framework has to be designed so that extensions can operate by themselves just as well as they do with 15 others.

nose doesn’t come anywhere close to supporting this.

(Note: the following is based on my best understanding of nose’s codebase and on conversations with others. If I’ve gotten anything wrong, please let me know and I’ll gladly retract it.)

“That’s crap,” you say, “nose has plugins!” Ha. nose plugins don’t come anywhere close to achieving this level of independence. If I want to add a plugin to allow tests to be marked as TODO, there’s no way for this new kind of test-status to make its way into the various reporting plugins. As far as I can tell, just to get TODO tests not to show up as failures in the default console output, I’d have to:

  • Subclass nose.result.TextTestResult, overriding addError() so that it picks up the TODO-ness of the test.

  • Subclass nose.core.TextTestRunner, overriding _makeResult() so that it uses my TextTestResult subclass.

  • Subclass nose.core.TestProgram, overriding runTests() so that it uses my TextTestRunner subclass.

  • Replace nose.core.run() with a function that uses my TestProgram subclass.

    Of course, by the time my plugin is running and trying to do all this subclassing/replacing malarkey, nose.core.run() has already been called, so it’s too late.

By contrast, adding this kind of support to my unittest redesign is trivial. Omitting the TODO() decorator and exception classes (which you’d need for the nose version, too):

class TodoRunner(TestRunner):
  categories = ['todo pass', 'todo fail']

  def handle_exception(self, test, exc_info):
    exc_type = exc_info[0]
    if issubclass(exc_type, TodoPassed):
      self.log_exception('todo pass', test, exc_info)
    elif issubclass(exc_type, TodoFailed):
      self.log_exception('todo fail', test, exc_info)
    else:
      super(TodoRunner, self).handle_exception(test, exc_info)

  def was_successful(self):
    parent_success = super(TodoRunner, self).was_successful()
    return parent_success and not self.still_todo()

  def still_todo(self):
    return self.exceptions['todo pass'] 
           or self.exceptions['todo fail']

  def failure_label(self):
    if self.still_todo():
      return 'TODO'
    return super(TodoRunner, self).failure_label()

With those lines of code, all output extensions — console, database, XML, etc — will automatically recognize TODO tests and treat them as such. No fuss, no muss.

Now, all this isn’t to say that nose is crap. What I said earlier is still true: nose is a good test discovery tool. I even hope to borrow some of its discovery strategies for the new design. What nose is not, however, is an ultra-flexible test environment framework where extensions can be shared easily and openly, and that’s what I’m going for.

A long time ago, in a blog post a few pages back in the archives, I spent a few paragraphs bemoaning Python’s unittest module and how it can’t be readily extended, nor can its extensions be easily composed. I gave as examples an extension that allows you to mark tests as “todo” and an extension that did reference counting around each test case (for C modules). While writing the extensions themselves was a little harder than I would have liked, the biggest problem was composing them — using both at the same time. Specifically, you can’t compose them, not without writing all-new code to merge the two functionalities. Consider:

TODO support:
        140 lines (5 core classes, 4 support classes/funcs)

Refcounting support:
        117 lines (4 core classes)

Composition:
        197 lines (6 core classes, 4 support classes/funcs)
        105 lines (3 classes of entirely new/rewritten code)

(All code snippets can be found in this directory. Code related to the old unittest design is in the before/ subdir, that related to the new design is in after/.)

test_harness, my new unittest package, was designed with flexibility and extensibility in mind. Using the same todo/refcounting examples from above:

TODO support:
        61 lines (1 core class, 4 support classes/funcs)

Refcounting support:
        36 lines (1 core class)

Composition:
        5 lines (1 core class, 3 imports)

That’s right: todo and refcounting support, with results written to stdout in five lines. And one of those lines is blank.

Where the new design really shines is in output. Unlike the old design — where you’d have to rewrite everything — changing your logging scheme from to-console to XML means changing this

    from test_harness import TextRunner
    from refcounting import RefcountRunner
    from todo import TodoRunner, TODO

    class OurRunner(TextRunner, RefcountRunner, TodoRunner):
        pass

to this:

    from xmlrunner import XmlTestRunner
    from refcounting import RefcountRunner
    from todo import TodoRunner, TODO

    class OurRunner(XmlTestRunner, RefcountRunner, TodoRunner):
        pass

That’s a two line change. That would have required a complete rewrite with the old system. Want both XML and to-console logging? Stick with the old unittest design and you’re looking at yet another rewrite. test_harness allows you to do this:

from test_harness import TextRunner
from xmlrunner import XmlTestRunner
from refcounting import RefcountRunner
from todo import TodoRunner, TODO

class OurRunner(TextRunner, XmlTestRunner, RefcountRunner, TodoRunner):
    pass

The biggest problem with the old unittest design is that, in trying to separate out the various concerns, it left the different components interconnected. TestCase objects depend on TestResult objects having certain methods; TestLoaders depend on your test case classes subclassing TestCase; TestRunners control which TestResult is used; etc. test_harness does away with this menagerie in favor of a single class: TestRunner. TestRunner objects are responsible for test suite iteration, running each individual test, collecting and categorizing any exceptions, and summarizing the results of the test run. Test loading/discovery is orthogonal to this process and as such is left to other packages, though rudimentary solutions are provided with the new package.

The biggest gripes about unittest I heard while researching unittest’s problems is that you a) have to subclass TestCase, and b) use TestCase methods to indicate test success/failure. In test_harness, there is no requirement to subclass TestCase (nor is there a TestCase class to subclass). Also, the usage of TestCase methods to signal failure — a consequence of the old TestCase/TestResult linkage — has been replaced with a test_harness.assertion submodule that contains functions like ok(), are_equal(), etc. Mapping old spellings to new:

    self.failUnless()               < = >    ok()
    self.assertEqual()              < = >    are_equal()
    self.failIfEqual()              < = >    are_not_equal()
    self.failUnlessAlmostEqual()    < = >    are_almost_equal()
    self.assertRaises()             < = >    raises()

Anyone interested is encouraged to play around with the new design. Comments to collinw at gmail point com