Wed 29 Nov 2006
Lessons from Net::Google - All accessors edition
Posted by Collin under best practices , net::google , perlIn our last episode, I talked about how not to write Perl, drawing cases of poor design from Net::Google. Let’s play that game again, this time focusing on the do-s and don’t-s of accessor methods.
Don’t write your own accessors
Every time you write a method like this, an angel dies:
sub safe {
my $self = shift;
my $bool = shift;
if (defined($bool)) {
$self->{'_safe'} = ($bool) ? 1 : 0;
}
return $self->{'_safe'};
}
Seriously, there are 20 billion libraries on CPAN that are specifically designed to do this kind of stuff for you. To name only a handful:
- Class::Accessor
- Class::Accessor::Class
- Class::BuildMethods
- Class::Data::Accessor
- Class::MakeMethods
- Class::Meta::AccessorBuilder::Affordance
- Class::MethodMaker
- Moose
- Object::Accessor
Some of these are better than others, some I would recommend (Moose, Class::BuildMethods), some I wouldn’t (Class::MethodMaker), but the point is this: this is Perl, we’re supposed to be lazy. Writing your own accessors is not lazy. Computers generally — and reusable modules in particular — are supposed to free us from this kind of busy work. CPAN is your friend; stupid crap like hand-writing formulaic accessor methods is not.
Speaking of things that aren’t your friend…
AUTOLOAD is not an accessor system
Two of Net::Google’s sub-modules, Net::Google::Response and Result, eschew hand-written accessors in favor of something arguably worse: AUTOLOAD. Rather than writing out each accessor, you get this:
use constant RESPONSE_FIELDS => qw [
directoryCategories estimateIsExact startIndex
searchTime estimatedTotalResultsCount searchTips
searchComments searchQuery endIndex documentFiltering ];
sub AUTOLOAD {
my $self = shift;
$AUTOLOAD =~ s/.*:://;
unless (grep/^($AUTOLOAD)$/,&RESPONSE_FIELDS) {
carp 'Unknown attribute : '.$AUTOLOAD;
return undef;
}
return $self->{'__'.$AUTOLOAD};
}
Ignoring the O(n) array-based test to see if an accessor is valid, this approach seems like a good idea at first glance: you save yourself from writing and maintaining a bunch of copy-and-pasted, formulaic accessor methods. And that’s true, it does; it’s a win in that regard. But here’s what you lose:
-
Half the point of using accessor methods instead of just using a hashref (the way things used to be done in the Perl world) is the compile-time safety it buys you. In the hashref-based object implementation, a typo in the name of an attribute is silently ignored; after all, the compiler doesn’t know that you mistyped something, it just thinks you’re, e.g., assigning to a new key in the hash. Using methods allows perl to do some amount of typo checking for you: now, a typo in an attribute name results in a compile-time error.
Using AUTOLOAD takes us right back to the hash-based implementation. Because the difference between a valid method call and a typo is hidden away, tucked inside AUTOLOAD, what could have been compile-time errors (using real accessor methods) are pushed back to runtime errors. You’re not using the compiler to your full advantage.
-
Using AUTOLOAD means shooting introspection and reflection all to hell. It doesn’t matter that your AUTOLOAD function says
x()is a valid method call; if you ask Perl whether or not an objectcan()x(), you’ll get back “no” every time. It doesn’t matter that your AUTOLOAD function allows calls of the patternset_(\w+)(); if you try to iterate over every method in a given class, those methods will never, ever show up.Congratulations. Your classes are now less useful to other developers. Pat yourself on the back.
This doesn’t even touch on how using AUTOLOAD mucks up your class inheritance. For more about that, this post by Michael G. Schwern is a good starting point.
Accessors should not lose information
What I put into a getter method, I should get back out from the setter. Some mutation or modification of the input is acceptable, but not like this:
sub lr {
my $self = shift;
my @lang = @_;
if ((scalar(@lang) > 1) && ($lang[0] eq '')) {
$self->{'_lr'} = [];
shift @lang;
}
if (@lang) {
push @{$self->{'_lr'}},@lang;
}
return join('',@{$self->{'_lr'}});
}
This method sets the language restrictions for a call to Google’s search API. Here’s the kicker, and why I chose this as an example:
>> use Net::Google::Search; >>> >>> my $search = Net::Google::Search->new(); >>> $search->lr(qw(lang_en lang_fr lang_de)); lang_enlang_frlang_de >>>
You put in an array, you get a string back out. Getting the input array back is going to require some serious split() magic. (Hint: don’t forget about the codes for Chinese, which don’t follow the lang_([a-z]{2}) format).
Now, there’s actually a reason — though not a good one — why the getter emits a string like this: the call to the SOAP API method apparently expects a string. However, if you need this kind of mutation inside your application, that’s where the joining/splitting/processing should be done: inside the application. The people using your library should never care about — and should never be forced to grapple with — implementation details like this.
Validate your input
For all of Net::Google’s hand-written accessor methods, only one does any kind of input validation. You end up with methods like this one, from Net::Google::Search:
sub starts_at {
my $self = shift;
my $at = shift;
if (defined($at)) {
$self->{'_starts_at'} = $at;
}
return $self->{'_starts_at'};
}
starts_at() is defined as accept an integer, but in reality it doesn’t care. You could pass it a string, a coderef, a blessed regex, and it would be none the wiser. You’ll only find out about the error much later, when some other part of the application asks starts_at() for an integer, gets back an arrayref and freaks the hell out.
That’s why doing input validation in the accessors is important: so you know that the data is invalid (i.e., there’s a bug) as soon as possible. Doing validation in the accessor makes it trivial to track down where the bad data came from, or at least how it got to your code. The further you move validation down the call-chain, the more call sites you have to examine to find out how starts_at() came by this bad data. More call sites == more work == bad.
Now, I sympathize somewhat with Net::Google on this one. It’s hard to do input validation when the final arbiter of “valid input” is a web service that you don’t control. Every time the web service changes — either to accept previously-invalid data or vice-versa — you have to re-release. That sucks. However, there’s still some minimal validation you can do. For example, Net::Google::Search’s lr() accessor doesn’t have to know about every possible language restriction, but it can make sure all arguments are of the right type (here: strings). In other cases, you might not be able to say with 100% certainty that every element in an arrayref is valid or invalid, but you can at least figure out whether they’re all integers.
It’s these little things that go a long way.
November 30th, 2006 at 20:14
I agree that using AUTOLOAD() for accessors is a bad idea, but there’s a simple solution to avoid breaking can() — provide your own can().