software engineering


In our last episode, I talked about how not to write Perl, drawing cases of poor design from Net::Google. Let’s play that game again, this time focusing on the do-s and don’t-s of accessor methods.

Don’t write your own accessors

Every time you write a method like this, an angel dies:

sub safe {
  my $self = shift;
  my $bool = shift;

  if (defined($bool)) {
    $self->{'_safe'} = ($bool) ? 1 : 0;
  }

  return $self->{'_safe'};
}

Seriously, there are 20 billion libraries on CPAN that are specifically designed to do this kind of stuff for you. To name only a handful:

Some of these are better than others, some I would recommend (Moose, Class::BuildMethods), some I wouldn’t (Class::MethodMaker), but the point is this: this is Perl, we’re supposed to be lazy. Writing your own accessors is not lazy. Computers generally — and reusable modules in particular — are supposed to free us from this kind of busy work. CPAN is your friend; stupid crap like hand-writing formulaic accessor methods is not.

Speaking of things that aren’t your friend…

AUTOLOAD is not an accessor system

Two of Net::Google’s sub-modules, Net::Google::Response and Result, eschew hand-written accessors in favor of something arguably worse: AUTOLOAD. Rather than writing out each accessor, you get this:

use constant RESPONSE_FIELDS => qw [
directoryCategories estimateIsExact startIndex
searchTime estimatedTotalResultsCount searchTips
searchComments searchQuery endIndex documentFiltering ];

sub AUTOLOAD {
  my $self = shift;

  $AUTOLOAD =~ s/.*:://;

  unless (grep/^($AUTOLOAD)$/,&RESPONSE_FIELDS) {
    carp 'Unknown attribute : '.$AUTOLOAD;
    return undef;
  }

  return $self->{'__'.$AUTOLOAD};
}

Ignoring the O(n) array-based test to see if an accessor is valid, this approach seems like a good idea at first glance: you save yourself from writing and maintaining a bunch of copy-and-pasted, formulaic accessor methods. And that’s true, it does; it’s a win in that regard. But here’s what you lose:

  • Half the point of using accessor methods instead of just using a hashref (the way things used to be done in the Perl world) is the compile-time safety it buys you. In the hashref-based object implementation, a typo in the name of an attribute is silently ignored; after all, the compiler doesn’t know that you mistyped something, it just thinks you’re, e.g., assigning to a new key in the hash. Using methods allows perl to do some amount of typo checking for you: now, a typo in an attribute name results in a compile-time error.

    Using AUTOLOAD takes us right back to the hash-based implementation. Because the difference between a valid method call and a typo is hidden away, tucked inside AUTOLOAD, what could have been compile-time errors (using real accessor methods) are pushed back to runtime errors. You’re not using the compiler to your full advantage.

  • Using AUTOLOAD means shooting introspection and reflection all to hell. It doesn’t matter that your AUTOLOAD function says x() is a valid method call; if you ask Perl whether or not an object can() x(), you’ll get back “no” every time. It doesn’t matter that your AUTOLOAD function allows calls of the pattern set_(\w+)(); if you try to iterate over every method in a given class, those methods will never, ever show up.

    Congratulations. Your classes are now less useful to other developers. Pat yourself on the back.

This doesn’t even touch on how using AUTOLOAD mucks up your class inheritance. For more about that, this post by Michael G. Schwern is a good starting point.

Accessors should not lose information

What I put into a getter method, I should get back out from the setter. Some mutation or modification of the input is acceptable, but not like this:

sub lr {
  my $self = shift;
  my @lang = @_;

  if ((scalar(@lang) > 1) && ($lang[0] eq '')) {
    $self->{'_lr'} = [];
    shift @lang;
  } 

  if (@lang) {
    push @{$self->{'_lr'}},@lang;
  }

  return join('',@{$self->{'_lr'}});
}

This method sets the language restrictions for a call to Google’s search API. Here’s the kicker, and why I chose this as an example:

>> use Net::Google::Search;
>>>
>>> my $search = Net::Google::Search->new();
>>> $search->lr(qw(lang_en lang_fr lang_de));
lang_enlang_frlang_de
>>>

You put in an array, you get a string back out. Getting the input array back is going to require some serious split() magic. (Hint: don’t forget about the codes for Chinese, which don’t follow the lang_([a-z]{2}) format).

Now, there’s actually a reason — though not a good one — why the getter emits a string like this: the call to the SOAP API method apparently expects a string. However, if you need this kind of mutation inside your application, that’s where the joining/splitting/processing should be done: inside the application. The people using your library should never care about — and should never be forced to grapple with — implementation details like this.

Validate your input

For all of Net::Google’s hand-written accessor methods, only one does any kind of input validation. You end up with methods like this one, from Net::Google::Search:

sub starts_at {
  my $self = shift;
  my $at   = shift;

  if (defined($at)) {
    $self->{'_starts_at'} = $at;
  }

  return $self->{'_starts_at'};
}

starts_at() is defined as accept an integer, but in reality it doesn’t care. You could pass it a string, a coderef, a blessed regex, and it would be none the wiser. You’ll only find out about the error much later, when some other part of the application asks starts_at() for an integer, gets back an arrayref and freaks the hell out.

That’s why doing input validation in the accessors is important: so you know that the data is invalid (i.e., there’s a bug) as soon as possible. Doing validation in the accessor makes it trivial to track down where the bad data came from, or at least how it got to your code. The further you move validation down the call-chain, the more call sites you have to examine to find out how starts_at() came by this bad data. More call sites == more work == bad.

Now, I sympathize somewhat with Net::Google on this one. It’s hard to do input validation when the final arbiter of “valid input” is a web service that you don’t control. Every time the web service changes — either to accept previously-invalid data or vice-versa — you have to re-release. That sucks. However, there’s still some minimal validation you can do. For example, Net::Google::Search’s lr() accessor doesn’t have to know about every possible language restriction, but it can make sure all arguments are of the right type (here: strings). In other cases, you might not be able to say with 100% certainty that every element in an arrayref is valid or invalid, but you can at least figure out whether they’re all integers.

It’s these little things that go a long way.

As I mentioned, I’ve recently taken over ownership of a project at work that makes heavy use of Google’s SOAP APIs, using Net::Google to handle the SOAP stuff. A major part of my initial marching orders were to fix one example of Net::Google’s curious definition of “error handling”.

There’s a long-standing problem with Google’s search API that it throws “502 Bad Gateway” errors around 20% of the time. Net::Google’s attitude toward this (and all other errors) is to simply carp and carry on. From our standpoint, we want a clean way of being able to differentiate what the error was: was the user’s API key invalid? Have they reached their query limit? Did we get yet another 502? Overriding Perl’s warning mechanism does not qualify as clean or pretty or…really, any positive adjective, so I downloaded Net::Google’s latest release and started reading.

I have only recently stopped screaming in terror.

It soon became clear that my task was larger than “throw errors when Google’s SOAP server craps out”. Much larger. So large that I’m now putting together a 2.0.0 release of this package. In the spirit of learning from mistakes, let’s work through this poor module and look at ways some of this stuff could be done better:

Do not ignore errors

I don’t care what language you’re using, On error resume next is never a good idea. In Net::Google’s case, any errors at the SOAP layer are trapped and reissued as warnings, continuing on its merry way as if nothing had happened. This includes things like:

  • “502 Bad Gateway” - for when the SOAP service is down
  • “Invalid authorization key” - when the user had given us a bad API key
  • “Daily limit of 1000 queries exceeded” - try again tomorrow

I want to be able to trap these and respond differently to each scenario. If I get a 502 error, I might sleep for a few seconds, then try again. If I get an “invalid authorization key”, I should inform the user so they can give me a valid key. The point is that I want to be able to respond to odd — one might say “exceptional” — behavior. Having these messages emitted as warnings makes that a lot harder.

Do not leave useless code lying around

From Net::Google::Search:

use constant RESTRICT_ENCODING => qw [ arabic gb ... a dozen more ... cyrillic utf8 ];

use constant RESTRICT_LANGUAGES => qw [ ar zh-CN ... a dozen more ... sv tr ];

use constant RESTRICT_COUNTRIES => qw [ AD ... dozens and dozens more ... ZA ZM ZR ];

use constant RESTRICT_TOPICS => qw [ unclesam linux mac bsd ];

use constant WATCH => "__estimatedTotalResultsCount";

Guess how many of these were used in the following code? That’s right, zero. None of them. Some of these arrays had several dozen elements, just taking up space to no effect. Several other submodules had similar sections, uselessly defining constants that would never be used in the code.

Worse, they took up brain cycles and time while I tried to figure out a) what they were for, then b) if I could safely remove them without breaking anything.

Do not modify constants

*sigh*. You’d think it would be obvious; these things are called constants for a reason. You don’t change them. You can’t change them. Ever. Yet what do I find in Net::Google::Service

use constant SERVICE_CACHE => {};

followed a little later by

&SERVICE_CACHE->{$service} = "$dir/Net/Google/Services/".&SERVICES->{$service};

That’s right: modifying a constant. What makes this even better The Daily WTF fodder is the comment above this line, complaining about how using the & sigil is a work-around for a bug in perl 5.00502. In the spirit of the old “It hurts when I do this”-”Don’t do that” joke, here’s my advice about this whole mess: don’t do it. I have trouble imagining the thought process that concluded, “You know, a hash just won’t do”.

“Invalid input” does not mean “guess”

There’s an old saying ’round these parts, “in the face of ambiguity, refuse the temptation to guess“. When it comes to programming, you can swap out “ambiguity” for any number of things and generate valid maxims all day long. In this situation, I’ll take “an error”: in the face of an error, refuse the temptation to guess. Here’s one such offender, taken from Net::Google::Search:

if (int($max) < 1) {
    carp "'$max' must be a int greater than 0";
    $max = 1;
}

Bonus points: notice that the warnings produced look like “‘-5′ must be an int greater than 0″. a) the quote marks around -5 make it look like a string, and b) a much more helpful message would be “max_results must be an integer greater than 0″, i.e., don’t just tell me how I screwed up, tell me what I screwed up, too.

Documenting it doesn’t make it so

Unmodified:

=head2 $obj->starts_at($at)

Returns an int. Default is 0.

Returns undef if there was an error.

=cut

sub starts_at {
  my $self = shift;
  my $at   = shift;

  if (defined($at)) {
    $self->{'_starts_at'} = $at;
  }

  return $self->{'_starts_at'};
}

Can someone please explain to me how this method “[r]eturns undef if there was an error”? No? I can’t either. Nor can I figure out what kind of errors could possibly arise in this code.

If you want a piece of code to do X, it helps to actually make it do X. If your docs say one thing and the code does another, the code is generally going to prevail when push comes to shove. It’s important to make sure your code does what you want it to; that’s where tests come in. Maybe this accessor method at one time had some kind of input validation and would in fact return undef if that validation failed, and just maybe that code accidentally got wiped out at some point. A test suite would let you know that.

My, what a nice segue for the next point..

Good tests are your friends

As shipped, Net::Google 1.0.1 has…let me count them…8 tests (12 if you include duplicates). And those 8 aren’t exactly worth much. Here’s one from t/002-cache.t:

my $search = $google->search();
isa_ok($search, 'Net::Google::Search');

$search->query(QUERY);
$search->max_results(MAXRESULTS);
$search->filter(1);

my $results = $search->results();
is(ref($results), 'ARRAY', 'Got results for '.QUERY);

That’s the full extent of the testing for the interface to Google’s search API. Is $results tested to make sure it has elements? Why waste time! Are the elements of $results tested to make sure they’re filled out? Poppycock! Do we ever make sure that the accessors on $search work correctly? Hells no!

I know testing isn’t fun and isn’t glamorous. No-one ever got laid or got their picture in the paper for working on a test suite. That doesn’t mean you shouldn’t do it, though, ’cause you know what else doesn’t get you laid? Writing poorly-specified, untested software that’s incredibly difficult for other people to modify because they have no idea what it was supposed to do in the first place. Call it karma.

“Method X exists” is not documentation

One last documentation gem before we go:

=head2 $obj->filter($bool)

Returns true or false. Returns undef if there was an error.

=cut

Thank $diety the docs don’t tell me what this property controls or does; it’s not like I may need to use it. Dear reader: if you want me to use your software, please tell me what it does. If I have to read the code to figure out what an accessor method controls, you lose; I’ll use someone else’s package. I just don’t care that much.


Next time we’ll go over what Net::Google teaches us about accessor methods. There was so much material here, it took up its own post.

I’ve recently taken over ownership of a new project at work, a suite of web-based tools for search engine optimization. In the course of patching Net:Google, the module we use for talking to Google’s SOAP API I found myself staring down a number of classes, some with AUTOLOAD-based accessor systems, some with accessors based on the tried-and-true Class::Accessor::CopyNPaste method. In neither case was I pleased. I needed a real accessor generator.

A CPAN search turned up Class::BuildMethods and Class::MethodMaker as promising options. My quick review follows:

  • Both share the same declarative style: “give me methods x, y and z, with these defaults, using these functions for validation”.

    From Class::MethodMaker:

    package Foo::Bar;
    
    use Class::MethodMaker
        [ scalar => [{default => 10}, 'x'],
          scalar => [{default => 6}, 'y'],
        ];

    From Class::BuildMethods:

    package Foo::Bar;
    
    use Class::BuildMethods
        x => {default => 10},
        y => {default => 6};
  • Where Class::BuildMethods can only generate accessors for scalars (requiring arrays and hashes to be passed in by reference), Class::MethodMaker can generate special accessors for array and hash properties:

    use Class::MethodMaker
        [ array => [qw(x y)],
          hash => 'z',
        ];

    There’s a downside to this, though. When dealing with array accessors (and possibly hash accessors, too; I haven’t tested them), Class::MethodMaker gets too clever for my tastes: in scalar context, an array getter will return an arrayref, while in array context you get an array. The docs don’t make any mention this (that I was able to find), and I was seriously confused as to why

    is_deeply($self->x, [], 'Correct default (x)')

    kept complaining that an arrayref was not equal to “Correct default (x)”.

  • Class::MethodMaker offers considerably more flexibility and power than does Class::BuildMethods. While BuildMethods only allows you to validate the input to an accessor (i.e., accept or die), MethodMaker permits more general input-processing capabilities.

    A feature I found more useful is MethodMaker’s ability to generate a proper new() method for you. With this in hand, I was able to define an entire class like so:

    package Result;
    
    use Class::MethodMaker
        [ scalar => [qw/title URL snippet cachedSize/],
          scalar => [qw/hostName directoryCategory/],
          new => 'new'
        ];
  • Both modules (as of this writing) come with the same bug: neither allow you to give an accessor a default value of 0 or “”. While tracking down the cause of the bug in Class::BuildMethods was trivial, and a bug report has already been filed, fixing Class::MethodMaker, on the other hand,…

  • While Ovid’s code in Class::BuildMethods is very clear and easy to understand, Class::MethodMaker’s author seems to have picked the most complicated implementation possible. This is serious hate-the-end-user stuff. In what appears to be an effort to avoid writing any piece of code more than once, Class::MethodMaker’s internals are built-up by running several files through a custom-written preprocessor that makes absolutely no sense to anyone but its author and comes with zero documentation. It took me about 45 seconds to fix BuildMethods; after at least 30 minutes spent delving about in the core of Class::MethodMaker, I’m still no closer to coming up with a patch.

Conclusion: Class::MethodMaker comes across as overly powerful and complicated, with way more features, nooks and crannies than I’ll ever need, and an implementation that makes the Baby Jesus — not to mention my old CSCI 101 professor — cry. Class::BuildMethods is simple, elegant and gets the job done, though the lack of a new() emitter is a tick mark against it. Verdict: I’m going with Class::BuildMethods for now, plus submitting a patch for a new() method generator.

As part of the push to get an initial release of svnmock out the door, I’ve spent the last two weeks or so working on a website for the project. The biggest part of this effort has been writing documentation, particularly a tutorial, to explain this silly package.

By and large, the documentation has been fairly easy to write. I’m pretty good at, and enjoy, the very precise, technical style needed to write exacting specifications. I also look at spec writing as an opportunity to double-check the test suite; for every claim about how a given piece of code operates, I add a footnote pointing to the area of the test suite that verifies that claim. True, this makes things take longer than they otherwise would, but I’ve found several untested bits of code this way.

The tutorial section, on the other hand, has been torture. Whereas spec writing involves merely throwing down everything I know about the code in the most lawyer-like language I can summon, crafting a tutorial requires writing about the project as if I knew nothing about it and were exploring it for the first time. It can get frustrating, trying to distill an entire software package into simple, little words.

A similar effort has been underway for typecheck, as well. I’m pushing to have the project ready for publication on comp.lang.python by 0.4, our next minor-version release. We’ve recently brought David Wheeler on board, and he’s given me some good insights as to things we can do better.

One of the big problems he’s pointed out so far is the package’s documentation. It sucks, and you’ll get no argument from me about that. The docs have long consisted of a single, pages-long document that skims the package’s functionality. This format is left over from Iain Lowe’s initial effort, and while it may have worked in the past, the current version of the package is too large, too wide-ranging, to fit in a single page.

My current effort has involved breaking the current document into a sectioned-off tutorial and reference manual. As a result, it’s much better organised now, with a hint of “flow” and a dash of “coherence”. Also, I’m spending a lot of time working on more realisitic examples; as it is, a lot of the demos use the int and float types, which makes for simple examples, but they’re not very realistic.

I’m hoping to get the initial draft out the door some time today, then open the floor for comments and revisions. Mmm…open source.