November 2006
Monthly Archive
Wed 29 Nov 2006
In our last episode, I talked about how not to write Perl, drawing cases of poor design from Net::Google. Let’s play that game again, this time focusing on the do-s and don’t-s of accessor methods.
Don’t write your own accessors
Every time you write a method like this, an angel dies:
sub safe {
my $self = shift;
my $bool = shift;
if (defined($bool)) {
$self->{'_safe'} = ($bool) ? 1 : 0;
}
return $self->{'_safe'};
}
Seriously, there are 20 billion libraries on CPAN that are specifically designed to do this kind of stuff for you. To name only a handful:
Some of these are better than others, some I would recommend (Moose, Class::BuildMethods), some I wouldn’t (Class::MethodMaker), but the point is this: this is Perl, we’re supposed to be lazy. Writing your own accessors is not lazy. Computers generally — and reusable modules in particular — are supposed to free us from this kind of busy work. CPAN is your friend; stupid crap like hand-writing formulaic accessor methods is not.
Speaking of things that aren’t your friend…
AUTOLOAD is not an accessor system
Two of Net::Google’s sub-modules, Net::Google::Response and Result, eschew hand-written accessors in favor of something arguably worse: AUTOLOAD. Rather than writing out each accessor, you get this:
use constant RESPONSE_FIELDS => qw [
directoryCategories estimateIsExact startIndex
searchTime estimatedTotalResultsCount searchTips
searchComments searchQuery endIndex documentFiltering ];
sub AUTOLOAD {
my $self = shift;
$AUTOLOAD =~ s/.*:://;
unless (grep/^($AUTOLOAD)$/,&RESPONSE_FIELDS) {
carp 'Unknown attribute : '.$AUTOLOAD;
return undef;
}
return $self->{'__'.$AUTOLOAD};
}
Ignoring the O(n) array-based test to see if an accessor is valid, this approach seems like a good idea at first glance: you save yourself from writing and maintaining a bunch of copy-and-pasted, formulaic accessor methods. And that’s true, it does; it’s a win in that regard. But here’s what you lose:
-
Half the point of using accessor methods instead of just using a hashref (the way things used to be done in the Perl world) is the compile-time safety it buys you. In the hashref-based object implementation, a typo in the name of an attribute is silently ignored; after all, the compiler doesn’t know that you mistyped something, it just thinks you’re, e.g., assigning to a new key in the hash. Using methods allows perl to do some amount of typo checking for you: now, a typo in an attribute name results in a compile-time error.
Using AUTOLOAD takes us right back to the hash-based implementation. Because the difference between a valid method call and a typo is hidden away, tucked inside AUTOLOAD, what could have been compile-time errors (using real accessor methods) are pushed back to runtime errors. You’re not using the compiler to your full advantage.
-
Using AUTOLOAD means shooting introspection and reflection all to hell. It doesn’t matter that your AUTOLOAD function says x() is a valid method call; if you ask Perl whether or not an object can() x(), you’ll get back “no” every time. It doesn’t matter that your AUTOLOAD function allows calls of the pattern set_(\w+)(); if you try to iterate over every method in a given class, those methods will never, ever show up.
Congratulations. Your classes are now less useful to other developers. Pat yourself on the back.
This doesn’t even touch on how using AUTOLOAD mucks up your class inheritance. For more about that, this post by Michael G. Schwern is a good starting point.
Accessors should not lose information
What I put into a getter method, I should get back out from the setter. Some mutation or modification of the input is acceptable, but not like this:
sub lr {
my $self = shift;
my @lang = @_;
if ((scalar(@lang) > 1) && ($lang[0] eq '')) {
$self->{'_lr'} = [];
shift @lang;
}
if (@lang) {
push @{$self->{'_lr'}},@lang;
}
return join('',@{$self->{'_lr'}});
}
This method sets the language restrictions for a call to Google’s search API. Here’s the kicker, and why I chose this as an example:
>> use Net::Google::Search;
>>>
>>> my $search = Net::Google::Search->new();
>>> $search->lr(qw(lang_en lang_fr lang_de));
lang_enlang_frlang_de
>>>
You put in an array, you get a string back out. Getting the input array back is going to require some serious split() magic. (Hint: don’t forget about the codes for Chinese, which don’t follow the lang_([a-z]{2}) format).
Now, there’s actually a reason — though not a good one — why the getter emits a string like this: the call to the SOAP API method apparently expects a string. However, if you need this kind of mutation inside your application, that’s where the joining/splitting/processing should be done: inside the application. The people using your library should never care about — and should never be forced to grapple with — implementation details like this.
Validate your input
For all of Net::Google’s hand-written accessor methods, only one does any kind of input validation. You end up with methods like this one, from Net::Google::Search:
sub starts_at {
my $self = shift;
my $at = shift;
if (defined($at)) {
$self->{'_starts_at'} = $at;
}
return $self->{'_starts_at'};
}
starts_at() is defined as accept an integer, but in reality it doesn’t care. You could pass it a string, a coderef, a blessed regex, and it would be none the wiser. You’ll only find out about the error much later, when some other part of the application asks starts_at() for an integer, gets back an arrayref and freaks the hell out.
That’s why doing input validation in the accessors is important: so you know that the data is invalid (i.e., there’s a bug) as soon as possible. Doing validation in the accessor makes it trivial to track down where the bad data came from, or at least how it got to your code. The further you move validation down the call-chain, the more call sites you have to examine to find out how starts_at() came by this bad data. More call sites == more work == bad.
Now, I sympathize somewhat with Net::Google on this one. It’s hard to do input validation when the final arbiter of “valid input” is a web service that you don’t control. Every time the web service changes — either to accept previously-invalid data or vice-versa — you have to re-release. That sucks. However, there’s still some minimal validation you can do. For example, Net::Google::Search’s lr() accessor doesn’t have to know about every possible language restriction, but it can make sure all arguments are of the right type (here: strings). In other cases, you might not be able to say with 100% certainty that every element in an arrayref is valid or invalid, but you can at least figure out whether they’re all integers.
It’s these little things that go a long way.
Mon 20 Nov 2006
As I mentioned, I’ve recently taken over ownership of a project at work that makes heavy use of Google’s SOAP APIs, using Net::Google to handle the SOAP stuff. A major part of my initial marching orders were to fix one example of Net::Google’s curious definition of “error handling”.
There’s a long-standing problem with Google’s search API that it throws “502 Bad Gateway” errors around 20% of the time. Net::Google’s attitude toward this (and all other errors) is to simply carp and carry on. From our standpoint, we want a clean way of being able to differentiate what the error was: was the user’s API key invalid? Have they reached their query limit? Did we get yet another 502? Overriding Perl’s warning mechanism does not qualify as clean or pretty or…really, any positive adjective, so I downloaded Net::Google’s latest release and started reading.
I have only recently stopped screaming in terror.
It soon became clear that my task was larger than “throw errors when Google’s SOAP server craps out”. Much larger. So large that I’m now putting together a 2.0.0 release of this package. In the spirit of learning from mistakes, let’s work through this poor module and look at ways some of this stuff could be done better:
Do not ignore errors
I don’t care what language you’re using, On error resume next is never a good idea. In Net::Google’s case, any errors at the SOAP layer are trapped and reissued as warnings, continuing on its merry way as if nothing had happened. This includes things like:
- “502 Bad Gateway” - for when the SOAP service is down
- “Invalid authorization key” - when the user had given us a bad API key
- “Daily limit of 1000 queries exceeded” - try again tomorrow
I want to be able to trap these and respond differently to each scenario. If I get a 502 error, I might sleep for a few seconds, then try again. If I get an “invalid authorization key”, I should inform the user so they can give me a valid key. The point is that I want to be able to respond to odd — one might say “exceptional” — behavior. Having these messages emitted as warnings makes that a lot harder.
Do not leave useless code lying around
From Net::Google::Search:
use constant RESTRICT_ENCODING => qw [ arabic gb ... a dozen more ... cyrillic utf8 ];
use constant RESTRICT_LANGUAGES => qw [ ar zh-CN ... a dozen more ... sv tr ];
use constant RESTRICT_COUNTRIES => qw [ AD ... dozens and dozens more ... ZA ZM ZR ];
use constant RESTRICT_TOPICS => qw [ unclesam linux mac bsd ];
use constant WATCH => "__estimatedTotalResultsCount";
Guess how many of these were used in the following code? That’s right, zero. None of them. Some of these arrays had several dozen elements, just taking up space to no effect. Several other submodules had similar sections, uselessly defining constants that would never be used in the code.
Worse, they took up brain cycles and time while I tried to figure out a) what they were for, then b) if I could safely remove them without breaking anything.
Do not modify constants
*sigh*. You’d think it would be obvious; these things are called constants for a reason. You don’t change them. You can’t change them. Ever. Yet what do I find in Net::Google::Service…
use constant SERVICE_CACHE => {};
followed a little later by
&SERVICE_CACHE->{$service} = "$dir/Net/Google/Services/".&SERVICES->{$service};
That’s right: modifying a constant. What makes this even better The Daily WTF fodder is the comment above this line, complaining about how using the & sigil is a work-around for a bug in perl 5.00502. In the spirit of the old “It hurts when I do this”-”Don’t do that” joke, here’s my advice about this whole mess: don’t do it. I have trouble imagining the thought process that concluded, “You know, a hash just won’t do”.
“Invalid input” does not mean “guess”
There’s an old saying ’round these parts, “in the face of ambiguity, refuse the temptation to guess“. When it comes to programming, you can swap out “ambiguity” for any number of things and generate valid maxims all day long. In this situation, I’ll take “an error”: in the face of an error, refuse the temptation to guess. Here’s one such offender, taken from Net::Google::Search:
if (int($max) < 1) {
carp "'$max' must be a int greater than 0";
$max = 1;
}
Bonus points: notice that the warnings produced look like “‘-5′ must be an int greater than 0″. a) the quote marks around -5 make it look like a string, and b) a much more helpful message would be “max_results must be an integer greater than 0″, i.e., don’t just tell me how I screwed up, tell me what I screwed up, too.
Documenting it doesn’t make it so
Unmodified:
=head2 $obj->starts_at($at)
Returns an int. Default is 0.
Returns undef if there was an error.
=cut
sub starts_at {
my $self = shift;
my $at = shift;
if (defined($at)) {
$self->{'_starts_at'} = $at;
}
return $self->{'_starts_at'};
}
Can someone please explain to me how this method “[r]eturns undef if there was an error”? No? I can’t either. Nor can I figure out what kind of errors could possibly arise in this code.
If you want a piece of code to do X, it helps to actually make it do X. If your docs say one thing and the code does another, the code is generally going to prevail when push comes to shove. It’s important to make sure your code does what you want it to; that’s where tests come in. Maybe this accessor method at one time had some kind of input validation and would in fact return undef if that validation failed, and just maybe that code accidentally got wiped out at some point. A test suite would let you know that.
My, what a nice segue for the next point..
Good tests are your friends
As shipped, Net::Google 1.0.1 has…let me count them…8 tests (12 if you include duplicates). And those 8 aren’t exactly worth much. Here’s one from t/002-cache.t:
my $search = $google->search();
isa_ok($search, 'Net::Google::Search');
$search->query(QUERY);
$search->max_results(MAXRESULTS);
$search->filter(1);
my $results = $search->results();
is(ref($results), 'ARRAY', 'Got results for '.QUERY);
That’s the full extent of the testing for the interface to Google’s search API. Is $results tested to make sure it has elements? Why waste time! Are the elements of $results tested to make sure they’re filled out? Poppycock! Do we ever make sure that the accessors on $search work correctly? Hells no!
I know testing isn’t fun and isn’t glamorous. No-one ever got laid or got their picture in the paper for working on a test suite. That doesn’t mean you shouldn’t do it, though, ’cause you know what else doesn’t get you laid? Writing poorly-specified, untested software that’s incredibly difficult for other people to modify because they have no idea what it was supposed to do in the first place. Call it karma.
“Method X exists” is not documentation
One last documentation gem before we go:
=head2 $obj->filter($bool)
Returns true or false. Returns undef if there was an error.
=cut
Thank $diety the docs don’t tell me what this property controls or does; it’s not like I may need to use it. Dear reader: if you want me to use your software, please tell me what it does. If I have to read the code to figure out what an accessor method controls, you lose; I’ll use someone else’s package. I just don’t care that much.
Next time we’ll go over what Net::Google teaches us about accessor methods. There was so much material here, it took up its own post.
Fri 17 Nov 2006
Since I’m forever reinventing stuff that SVK already does, I thought I’d ask before coding up this idea:
A while ago, I got tired of reading through commit messages and not know in what directory or branch they occurred in. Trying to make things a bit easier, I started prefixing each commit message with the path the change is occurring in. For example, if I’m changing these files
trunk/lib/Foo/Bar/Baz.pm
trunk/lib/Foo/Bar.pm
trunk/lib/Foo/Qux.pm
I’d use trunk/lib/Foo/ as the path.
I’ve now gotten pretty tired of having to type this in for every commit, and I’m looking for a better way. Ideally, there would be a way of specifying — either per-user or system wide — that SVK should filter all commits through a given module. That module would be passed a list of all files involved in the commit, the draft commit message, etc., and it could then generate a new commit message. I would use this to have SVK figure out my little path scheme, saving my fingers from having to do that extra typing. Someone else who might benefit from making it easier to generate strictly-formatted commit messages: Subversion.
Why not do this on the repository side with a commit-hook, you ask? Easy: you can’t. Read the section in the SVN manual on hook scripts and scan for the big red warning about trying to use hook scripts to modify a transaction.
I’m pretty sure that SVK doesn’t already have a facility like this in place, but I’ve thought that before, as my pile of SVK gadgets will attest. So I’m asking this time.
Thu 16 Nov 2006
I’ve started working on a functional programming cookbook to illustrate practical uses for my functional package. So far I’ve got three examples up, with more on the way:
-
Since functional’s shipped version of compose() only allows you to compose two functions, the cookbook includes a version of compose() that accepts multiple functions.
-
I tend to use a heavily-functional style when writing __hash__() implementations, so I’ve included an example of how to do that using functional.
-
Last but not least is a join() function that works like Python’s "".join(...) idiom, except that it automatically stringifies its arguments.
I’ve got some more examples in the works, including an explanation of when you should use the built-in reduce() over foldl() and vice-versa. Keep watching the cookbook for these and future Python functional programming goodies.
Wed 15 Nov 2006
When I posted the first version of svk-init, my handy-dandy SVK initialisation script, I mentioned that I wanted to have the script figure out the short name for me, based on the repository URL. Promises made, promises kept:
#!/usr/bin/perl
use warnings;
use strict;
my($source, $short) = @ARGV;
unless(defined $short)
{
if($source =~ /^[^:]+://(.+)$/)
{
my @path = split('/', $1);
for(my $i = 1; $i < @path; $i++)
{
if($path[$i] eq 'branches' || $path[$i] eq 'tags' || $path[$i] eq 'trunk')
{
$short = $path[$i-1];
last;
}
}
# Fallback: use the last part of the path
$short ||= $path[-1];
}
}
unless(defined $short)
{
die('You need to provide a short name for the repository');
}
system("svk mirror $source //$short/main");
system("svk sync //$short/main");
system("svk cp //$short/main //$short/local "
."-m 'Creating //$short/local'");
system("svk co //$short/local ~/src/$short");
This allows me to give svk-init a repository URL of
http://svn.python.org/projects/python/trunk/
and have svk-init correctly come up with “python” as the short name.
Relatedly, SVK’s author, Chia-liang Kao mentioned in the comments that the first version of svk-init could be compressed to svk cp $source ~/src/$short, plus answering a few questions. I tried it out, and sure enough, it does indeed do the job. I’m going to stick with svk-init for two reasons: 1) svk cp uses a different mirror naming scheme, and 2) svk cp requires you to answer 3-4 questions, meaning I can’t start it up and forget it about it like I do svk-init.
Tue 14 Nov 2006
I’ve recently taken over ownership of a new project at work, a suite of web-based tools for search engine optimization. In the course of patching Net:Google, the module we use for talking to Google’s SOAP API I found myself staring down a number of classes, some with AUTOLOAD-based accessor systems, some with accessors based on the tried-and-true Class::Accessor::CopyNPaste method. In neither case was I pleased. I needed a real accessor generator.
A CPAN search turned up Class::BuildMethods and Class::MethodMaker as promising options. My quick review follows:
-
Both share the same declarative style: “give me methods x, y and z, with these defaults, using these functions for validation”.
From Class::MethodMaker:
package Foo::Bar;
use Class::MethodMaker
[ scalar => [{default => 10}, 'x'],
scalar => [{default => 6}, 'y'],
];
From Class::BuildMethods:
package Foo::Bar;
use Class::BuildMethods
x => {default => 10},
y => {default => 6};
-
Where Class::BuildMethods can only generate accessors for scalars (requiring arrays and hashes to be passed in by reference), Class::MethodMaker can generate special accessors for array and hash properties:
use Class::MethodMaker
[ array => [qw(x y)],
hash => 'z',
];
There’s a downside to this, though. When dealing with array accessors (and possibly hash accessors, too; I haven’t tested them), Class::MethodMaker gets too clever for my tastes: in scalar context, an array getter will return an arrayref, while in array context you get an array. The docs don’t make any mention this (that I was able to find), and I was seriously confused as to why
is_deeply($self->x, [], 'Correct default (x)')
kept complaining that an arrayref was not equal to “Correct default (x)”.
-
Class::MethodMaker offers considerably more flexibility and power than does Class::BuildMethods. While BuildMethods only allows you to validate the input to an accessor (i.e., accept or die), MethodMaker permits more general input-processing capabilities.
A feature I found more useful is MethodMaker’s ability to generate a proper new() method for you. With this in hand, I was able to define an entire class like so:
package Result;
use Class::MethodMaker
[ scalar => [qw/title URL snippet cachedSize/],
scalar => [qw/hostName directoryCategory/],
new => 'new'
];
-
Both modules (as of this writing) come with the same bug: neither allow you to give an accessor a default value of 0 or “”. While tracking down the cause of the bug in Class::BuildMethods was trivial, and a bug report has already been filed, fixing Class::MethodMaker, on the other hand,…
-
While Ovid’s code in Class::BuildMethods is very clear and easy to understand, Class::MethodMaker’s author seems to have picked the most complicated implementation possible. This is serious hate-the-end-user stuff. In what appears to be an effort to avoid writing any piece of code more than once, Class::MethodMaker’s internals are built-up by running several files through a custom-written preprocessor that makes absolutely no sense to anyone but its author and comes with zero documentation. It took me about 45 seconds to fix BuildMethods; after at least 30 minutes spent delving about in the core of Class::MethodMaker, I’m still no closer to coming up with a patch.
Conclusion: Class::MethodMaker comes across as overly powerful and complicated, with way more features, nooks and crannies than I’ll ever need, and an implementation that makes the Baby Jesus — not to mention my old CSCI 101 professor — cry. Class::BuildMethods is simple, elegant and gets the job done, though the lack of a new() emitter is a tick mark against it. Verdict: I’m going with Class::BuildMethods for now, plus submitting a patch for a new() method generator.
Wed 8 Nov 2006
Tired of typing the same formulaic SVK commands over and over again whenever I need to mirror another repository to my system (especially in the wake of several successive laptop failures), I whipped up this little bash script. I have christened it svk-init.
#!/bin/sh
source=$1;
short=$2;
svk mirror $source //$short/main
svk sync //$short/main
svk cp //$short/main //$short/local -m "Creating //$short/local"
svk co //$short/local ~/src/$short
The first parameter is the URL to mirror from, the second is a short name for the project. Usage goes something like this:
$ ./svk-init svn+ssh://somehost/var/svn/functional functional
When the script completes, I’ll be left with a working copy in my ~/src/functional/ directory.
Eventually, I’d like to make the short name optional, having the script deduce one for me if the second parameter is left unspecified.