best practices


In our last episode, I talked about how not to write Perl, drawing cases of poor design from Net::Google. Let’s play that game again, this time focusing on the do-s and don’t-s of accessor methods.

Don’t write your own accessors

Every time you write a method like this, an angel dies:

sub safe {
  my $self = shift;
  my $bool = shift;

  if (defined($bool)) {
    $self->{'_safe'} = ($bool) ? 1 : 0;
  }

  return $self->{'_safe'};
}

Seriously, there are 20 billion libraries on CPAN that are specifically designed to do this kind of stuff for you. To name only a handful:

Some of these are better than others, some I would recommend (Moose, Class::BuildMethods), some I wouldn’t (Class::MethodMaker), but the point is this: this is Perl, we’re supposed to be lazy. Writing your own accessors is not lazy. Computers generally — and reusable modules in particular — are supposed to free us from this kind of busy work. CPAN is your friend; stupid crap like hand-writing formulaic accessor methods is not.

Speaking of things that aren’t your friend…

AUTOLOAD is not an accessor system

Two of Net::Google’s sub-modules, Net::Google::Response and Result, eschew hand-written accessors in favor of something arguably worse: AUTOLOAD. Rather than writing out each accessor, you get this:

use constant RESPONSE_FIELDS => qw [
directoryCategories estimateIsExact startIndex
searchTime estimatedTotalResultsCount searchTips
searchComments searchQuery endIndex documentFiltering ];

sub AUTOLOAD {
  my $self = shift;

  $AUTOLOAD =~ s/.*:://;

  unless (grep/^($AUTOLOAD)$/,&RESPONSE_FIELDS) {
    carp 'Unknown attribute : '.$AUTOLOAD;
    return undef;
  }

  return $self->{'__'.$AUTOLOAD};
}

Ignoring the O(n) array-based test to see if an accessor is valid, this approach seems like a good idea at first glance: you save yourself from writing and maintaining a bunch of copy-and-pasted, formulaic accessor methods. And that’s true, it does; it’s a win in that regard. But here’s what you lose:

  • Half the point of using accessor methods instead of just using a hashref (the way things used to be done in the Perl world) is the compile-time safety it buys you. In the hashref-based object implementation, a typo in the name of an attribute is silently ignored; after all, the compiler doesn’t know that you mistyped something, it just thinks you’re, e.g., assigning to a new key in the hash. Using methods allows perl to do some amount of typo checking for you: now, a typo in an attribute name results in a compile-time error.

    Using AUTOLOAD takes us right back to the hash-based implementation. Because the difference between a valid method call and a typo is hidden away, tucked inside AUTOLOAD, what could have been compile-time errors (using real accessor methods) are pushed back to runtime errors. You’re not using the compiler to your full advantage.

  • Using AUTOLOAD means shooting introspection and reflection all to hell. It doesn’t matter that your AUTOLOAD function says x() is a valid method call; if you ask Perl whether or not an object can() x(), you’ll get back “no” every time. It doesn’t matter that your AUTOLOAD function allows calls of the pattern set_(\w+)(); if you try to iterate over every method in a given class, those methods will never, ever show up.

    Congratulations. Your classes are now less useful to other developers. Pat yourself on the back.

This doesn’t even touch on how using AUTOLOAD mucks up your class inheritance. For more about that, this post by Michael G. Schwern is a good starting point.

Accessors should not lose information

What I put into a getter method, I should get back out from the setter. Some mutation or modification of the input is acceptable, but not like this:

sub lr {
  my $self = shift;
  my @lang = @_;

  if ((scalar(@lang) > 1) && ($lang[0] eq '')) {
    $self->{'_lr'} = [];
    shift @lang;
  } 

  if (@lang) {
    push @{$self->{'_lr'}},@lang;
  }

  return join('',@{$self->{'_lr'}});
}

This method sets the language restrictions for a call to Google’s search API. Here’s the kicker, and why I chose this as an example:

>> use Net::Google::Search;
>>>
>>> my $search = Net::Google::Search->new();
>>> $search->lr(qw(lang_en lang_fr lang_de));
lang_enlang_frlang_de
>>>

You put in an array, you get a string back out. Getting the input array back is going to require some serious split() magic. (Hint: don’t forget about the codes for Chinese, which don’t follow the lang_([a-z]{2}) format).

Now, there’s actually a reason — though not a good one — why the getter emits a string like this: the call to the SOAP API method apparently expects a string. However, if you need this kind of mutation inside your application, that’s where the joining/splitting/processing should be done: inside the application. The people using your library should never care about — and should never be forced to grapple with — implementation details like this.

Validate your input

For all of Net::Google’s hand-written accessor methods, only one does any kind of input validation. You end up with methods like this one, from Net::Google::Search:

sub starts_at {
  my $self = shift;
  my $at   = shift;

  if (defined($at)) {
    $self->{'_starts_at'} = $at;
  }

  return $self->{'_starts_at'};
}

starts_at() is defined as accept an integer, but in reality it doesn’t care. You could pass it a string, a coderef, a blessed regex, and it would be none the wiser. You’ll only find out about the error much later, when some other part of the application asks starts_at() for an integer, gets back an arrayref and freaks the hell out.

That’s why doing input validation in the accessors is important: so you know that the data is invalid (i.e., there’s a bug) as soon as possible. Doing validation in the accessor makes it trivial to track down where the bad data came from, or at least how it got to your code. The further you move validation down the call-chain, the more call sites you have to examine to find out how starts_at() came by this bad data. More call sites == more work == bad.

Now, I sympathize somewhat with Net::Google on this one. It’s hard to do input validation when the final arbiter of “valid input” is a web service that you don’t control. Every time the web service changes — either to accept previously-invalid data or vice-versa — you have to re-release. That sucks. However, there’s still some minimal validation you can do. For example, Net::Google::Search’s lr() accessor doesn’t have to know about every possible language restriction, but it can make sure all arguments are of the right type (here: strings). In other cases, you might not be able to say with 100% certainty that every element in an arrayref is valid or invalid, but you can at least figure out whether they’re all integers.

It’s these little things that go a long way.

As I mentioned, I’ve recently taken over ownership of a project at work that makes heavy use of Google’s SOAP APIs, using Net::Google to handle the SOAP stuff. A major part of my initial marching orders were to fix one example of Net::Google’s curious definition of “error handling”.

There’s a long-standing problem with Google’s search API that it throws “502 Bad Gateway” errors around 20% of the time. Net::Google’s attitude toward this (and all other errors) is to simply carp and carry on. From our standpoint, we want a clean way of being able to differentiate what the error was: was the user’s API key invalid? Have they reached their query limit? Did we get yet another 502? Overriding Perl’s warning mechanism does not qualify as clean or pretty or…really, any positive adjective, so I downloaded Net::Google’s latest release and started reading.

I have only recently stopped screaming in terror.

It soon became clear that my task was larger than “throw errors when Google’s SOAP server craps out”. Much larger. So large that I’m now putting together a 2.0.0 release of this package. In the spirit of learning from mistakes, let’s work through this poor module and look at ways some of this stuff could be done better:

Do not ignore errors

I don’t care what language you’re using, On error resume next is never a good idea. In Net::Google’s case, any errors at the SOAP layer are trapped and reissued as warnings, continuing on its merry way as if nothing had happened. This includes things like:

  • “502 Bad Gateway” - for when the SOAP service is down
  • “Invalid authorization key” - when the user had given us a bad API key
  • “Daily limit of 1000 queries exceeded” - try again tomorrow

I want to be able to trap these and respond differently to each scenario. If I get a 502 error, I might sleep for a few seconds, then try again. If I get an “invalid authorization key”, I should inform the user so they can give me a valid key. The point is that I want to be able to respond to odd — one might say “exceptional” — behavior. Having these messages emitted as warnings makes that a lot harder.

Do not leave useless code lying around

From Net::Google::Search:

use constant RESTRICT_ENCODING => qw [ arabic gb ... a dozen more ... cyrillic utf8 ];

use constant RESTRICT_LANGUAGES => qw [ ar zh-CN ... a dozen more ... sv tr ];

use constant RESTRICT_COUNTRIES => qw [ AD ... dozens and dozens more ... ZA ZM ZR ];

use constant RESTRICT_TOPICS => qw [ unclesam linux mac bsd ];

use constant WATCH => "__estimatedTotalResultsCount";

Guess how many of these were used in the following code? That’s right, zero. None of them. Some of these arrays had several dozen elements, just taking up space to no effect. Several other submodules had similar sections, uselessly defining constants that would never be used in the code.

Worse, they took up brain cycles and time while I tried to figure out a) what they were for, then b) if I could safely remove them without breaking anything.

Do not modify constants

*sigh*. You’d think it would be obvious; these things are called constants for a reason. You don’t change them. You can’t change them. Ever. Yet what do I find in Net::Google::Service

use constant SERVICE_CACHE => {};

followed a little later by

&SERVICE_CACHE->{$service} = "$dir/Net/Google/Services/".&SERVICES->{$service};

That’s right: modifying a constant. What makes this even better The Daily WTF fodder is the comment above this line, complaining about how using the & sigil is a work-around for a bug in perl 5.00502. In the spirit of the old “It hurts when I do this”-”Don’t do that” joke, here’s my advice about this whole mess: don’t do it. I have trouble imagining the thought process that concluded, “You know, a hash just won’t do”.

“Invalid input” does not mean “guess”

There’s an old saying ’round these parts, “in the face of ambiguity, refuse the temptation to guess“. When it comes to programming, you can swap out “ambiguity” for any number of things and generate valid maxims all day long. In this situation, I’ll take “an error”: in the face of an error, refuse the temptation to guess. Here’s one such offender, taken from Net::Google::Search:

if (int($max) < 1) {
    carp "'$max' must be a int greater than 0";
    $max = 1;
}

Bonus points: notice that the warnings produced look like “‘-5′ must be an int greater than 0″. a) the quote marks around -5 make it look like a string, and b) a much more helpful message would be “max_results must be an integer greater than 0″, i.e., don’t just tell me how I screwed up, tell me what I screwed up, too.

Documenting it doesn’t make it so

Unmodified:

=head2 $obj->starts_at($at)

Returns an int. Default is 0.

Returns undef if there was an error.

=cut

sub starts_at {
  my $self = shift;
  my $at   = shift;

  if (defined($at)) {
    $self->{'_starts_at'} = $at;
  }

  return $self->{'_starts_at'};
}

Can someone please explain to me how this method “[r]eturns undef if there was an error”? No? I can’t either. Nor can I figure out what kind of errors could possibly arise in this code.

If you want a piece of code to do X, it helps to actually make it do X. If your docs say one thing and the code does another, the code is generally going to prevail when push comes to shove. It’s important to make sure your code does what you want it to; that’s where tests come in. Maybe this accessor method at one time had some kind of input validation and would in fact return undef if that validation failed, and just maybe that code accidentally got wiped out at some point. A test suite would let you know that.

My, what a nice segue for the next point..

Good tests are your friends

As shipped, Net::Google 1.0.1 has…let me count them…8 tests (12 if you include duplicates). And those 8 aren’t exactly worth much. Here’s one from t/002-cache.t:

my $search = $google->search();
isa_ok($search, 'Net::Google::Search');

$search->query(QUERY);
$search->max_results(MAXRESULTS);
$search->filter(1);

my $results = $search->results();
is(ref($results), 'ARRAY', 'Got results for '.QUERY);

That’s the full extent of the testing for the interface to Google’s search API. Is $results tested to make sure it has elements? Why waste time! Are the elements of $results tested to make sure they’re filled out? Poppycock! Do we ever make sure that the accessors on $search work correctly? Hells no!

I know testing isn’t fun and isn’t glamorous. No-one ever got laid or got their picture in the paper for working on a test suite. That doesn’t mean you shouldn’t do it, though, ’cause you know what else doesn’t get you laid? Writing poorly-specified, untested software that’s incredibly difficult for other people to modify because they have no idea what it was supposed to do in the first place. Call it karma.

“Method X exists” is not documentation

One last documentation gem before we go:

=head2 $obj->filter($bool)

Returns true or false. Returns undef if there was an error.

=cut

Thank $diety the docs don’t tell me what this property controls or does; it’s not like I may need to use it. Dear reader: if you want me to use your software, please tell me what it does. If I have to read the code to figure out what an accessor method controls, you lose; I’ll use someone else’s package. I just don’t care that much.


Next time we’ll go over what Net::Google teaches us about accessor methods. There was so much material here, it took up its own post.