

So Tyler says to me, he says:

I went to the Nashville PHP group last night. The conversation turned to which languages are on the rise, and I threw Python into the mix. Problem is, I had very little ammo to arm myself with. Got a list of bullet points as to why Python is better?

Well, yes and no.

In terms of functionality, there’s very little difference between Perl 5, Python, PHP and Ruby. The reasons to choose one over the other are typically very domain-specific (hence subtle and of little use when fighting religious wars): Perl 5 makes text munging simple by having, e.g., regular expressions as first-class citizens; PHP makes web applications more natural because, well, that’s what it was designed to do.

I have nothing really positive (or negative) to say about Ruby. I can’t think of any special niche that it fills. Anonymous blocks? Perl 5 has them. Pure OO? Python has it. call/cc? If you think you need continuations, you probably don’t. You could argue that Ruby serves a purpose by combining all these things, but the number of people who sincerely need a pure OO language with anonymous blocks and continuations is probably around five.

The one negative thing I can think of with respect to Perl 5 and PHP is that it’s hard to do dependency injection-based testing in these languages. It’s so hard in Java, for example, that even Google invented a tool to make Java DI easier. Python, on the other hand, makes this dead-simple, making it so much easier to test your code from all perspectives. Hell, it’s so easy in Python, I didn’t even know there was a name for it until I came to Google. I don’t know how easy DI is in Ruby, but if it’s not Python-easy, Ruby loses.

That’s one criterion for programming languages that I don’t see discussed much: ranking languages by how easy the code is to test. One frequent example is mocking a global resource like a time source. C, C++ and Java all require you to come up with unnatural function signatures or link against special libraries when testing in order to gain control over time. It’s easier in Perl 5, but it still requires a good deal of specialized knowledge of how namespaces and module lookups work. Assuming the target library does something like import time at the top, here’s how you take control of a given module’s time source in Python:

>>> import some_module
>>> class StubTime:
...     def time(self):
...         return 3634634
...
>>> some_module.time = StubTime()

Done. No specialized knowledge of interpreter details, no crazy setup, just done. If mocking global resources isn’t that easy in PHP, Ruby or any other language, I have little use for it beyond toy projects. Testing is where I feel Python really stands out.
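For comparison, here is a minimal Perl 5 sketch of the same trick. It shows the namespace knowledge you need. It assumes the code under test calls the built-in time(). Some::Module is a hypothetical stand-in defined inline. The override has to be installed through CORE::GLOBAL inside a BEGIN block. It only affects code compiled after that point, and it affects all such code, not just one module.

```perl
use strict;
use warnings;

# Install the override at compile time, before the code under test is
# compiled; calls to time compiled earlier keep using the built-in.
BEGIN {
    no warnings 'once';
    *CORE::GLOBAL::time = sub () { 3634634 };
}

# Hypothetical module under test. In real code this would be a
# `use Some::Module;` line placed after the BEGIN block above.
package Some::Module;
sub timestamp { return time }

package main;
print Some::Module::timestamp(), "\n";    # 3634634
```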

On Monday, Google announced that they were effectively end-of-lifing their SOAP API, back-dated to 5 December. “Announce” isn’t quite the right word, but English doesn’t have a verb for “posted a notice on the API’s site without any heads-up to developers”.

Since several of our applications at work depend on that SOAP API, this threw a bit of a wrench into our day. As a work-around, I spent yesterday coding up a SOAP server that allows applications that use the Google API to continue working. The tool acts as a gateway: it translates SOAP calls into requests to Google’s web interface, scrapes the result pages and packages the search results back up into SOAP. It supports doGetCachedPage(), doSpellingSuggestion() and doGoogleSearch() calls, though search results lack some of the data that the SOAP API provided. I used Net::Google to test the gateway, and as far as I can tell everything works fine.

Most of my development time was spent wrestling first with SOAP::Lite, trying to figure out exactly what incantations I needed to get a bare-bones SOAP server working, then with the perl regexes needed to scrape the result pages (grrr…special cases…grrr). In the end, we had a releasable product within 24 hours of reading about the discontinuation of the API, so I’m pretty happy with myself, especially since I’d never done server-side SOAP stuff before.

We’ve released the current 0.1 version of the code, with plans for a faster 0.2 soon.

In our last episode, I talked about how not to write Perl, drawing cases of poor design from Net::Google. Let’s play that game again, this time focusing on the dos and don’ts of accessor methods.

Don’t write your own accessors

Every time you write a method like this, an angel dies:

sub safe {
  my $self = shift;
  my $bool = shift;

  if (defined($bool)) {
    $self->{'_safe'} = ($bool) ? 1 : 0;
  }

  return $self->{'_safe'};
}

Seriously, there are 20 billion libraries on CPAN that are specifically designed to do this kind of stuff for you.

Some of these are better than others, some I would recommend (Moose, Class::BuildMethods), some I wouldn’t (Class::MethodMaker), but the point is this: this is Perl, we’re supposed to be lazy. Writing your own accessors is not lazy. Computers generally — and reusable modules in particular — are supposed to free us from this kind of busy work. CPAN is your friend; stupid crap like hand-writing formulaic accessor methods is not.
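To show how little code this takes, here is the safe() accessor from above regenerated by a module. It uses Class::Struct. That module is swapped in only because it ships with Perl and needs no install; the modules recommended above (Moose, Class::BuildMethods) are better choices for real classes. Unlike the hand-written version, this sketch stores whatever value it is given and does not squash it to 1 or 0.

```perl
use strict;
use warnings;
use Class::Struct;

# Generates Search::new() plus a combined getter/setter, Search::safe().
struct('Search' => { safe => '$' });

my $search = Search->new(safe => 1);
print $search->safe, "\n";    # 1
$search->safe(0);
print $search->safe, "\n";    # 0
```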

Speaking of things that aren’t your friend…

AUTOLOAD is not an accessor system

Two of Net::Google’s sub-modules, Net::Google::Response and Result, eschew hand-written accessors in favor of something arguably worse: AUTOLOAD. Rather than writing out each accessor, you get this:

use constant RESPONSE_FIELDS => qw [
directoryCategories estimateIsExact startIndex
searchTime estimatedTotalResultsCount searchTips
searchComments searchQuery endIndex documentFiltering ];

sub AUTOLOAD {
  my $self = shift;

  $AUTOLOAD =~ s/.*:://;

  unless (grep/^($AUTOLOAD)$/,&RESPONSE_FIELDS) {
    carp 'Unknown attribute : '.$AUTOLOAD;
    return undef;
  }

  return $self->{'__'.$AUTOLOAD};
}

Ignoring the O(n) array-based test to see if an accessor is valid, this approach seems like a good idea at first glance: you save yourself from writing and maintaining a bunch of copy-and-pasted, formulaic accessor methods. And that’s true, it does; it’s a win in that regard. But here’s what you lose:

  • Half the point of using accessor methods instead of just using a hashref (the way things used to be done in the Perl world) is the safety they buy you. In the hashref-based object implementation, a typo in the name of an attribute is silently ignored. After all, perl doesn’t know that you mistyped something; it just thinks you’re, e.g., assigning to a new key in the hash. Using methods allows perl to do some amount of typo checking for you: now, a typo in an attribute name dies loudly with a “Can’t locate object method” error. Perl resolves method calls at runtime, so this happens when the call executes, not at compile time, but it still blows up right at the site of the typo.

    Using AUTOLOAD takes us right back to the hash-based implementation. The difference between a valid method call and a typo is hidden away inside AUTOLOAD. So what would have been a fatal error is downgraded to whatever AUTOLOAD decides to do: in Net::Google’s case, a carp and an undef return value. You’re not using perl to your full advantage.

  • Using AUTOLOAD means shooting introspection and reflection all to hell. It doesn’t matter that your AUTOLOAD function says x() is a valid method call; if you ask Perl whether or not an object can() x(), you’ll get back “no” every time. It doesn’t matter that your AUTOLOAD function allows calls of the pattern set_(\w+)(); if you try to iterate over every method in a given class, those methods will never, ever show up.

    Congratulations. Your classes are now less useful to other developers. Pat yourself on the back.
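Both points can be seen side by side in a small, self-contained sketch. AutoThing is a simplified AUTOLOAD class in the style criticised above, not Net::Google’s actual code. RealThing defines the same attribute as a real method.

```perl
use strict;
use warnings;

# Simplified AUTOLOAD-based accessors.
package AutoThing;
our $AUTOLOAD;
sub new { return bless { __x => 42 }, shift }
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';    # object destruction also lands here
    return $self->{"__$name"};       # any name "works", typos included
}

# The same attribute as a real method.
package RealThing;
sub new { return bless { x => 42 }, shift }
sub x   { return $_[0]{x} }

package main;

my $auto = AutoThing->new;
my $real = RealThing->new;

print "auto x: ", $auto->x, "\n";                                # 42
print "auto can x? ", ($auto->can('x') ? 'yes' : 'no'), "\n";    # no
print "real can x? ", ($real->can('x') ? 'yes' : 'no'), "\n";    # yes

# A typo: the AUTOLOAD class quietly returns undef, the real class dies.
print "auto typo: ", (defined $auto->y ? 'value' : 'undef'), "\n";    # undef
print "real typo: ", (eval { $real->y; 1 } ? 'ok' : 'died'), "\n";    # died
```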

This doesn’t even touch on how using AUTOLOAD mucks up your class inheritance. For more about that, this post by Michael G. Schwern is a good starting point.

Accessors should not lose information

What I put into a setter method, I should get back out from the getter. Some mutation or modification of the input is acceptable, but not like this:

sub lr {
  my $self = shift;
  my @lang = @_;

  if ((scalar(@lang) > 1) && ($lang[0] eq '')) {
    $self->{'_lr'} = [];
    shift @lang;
  } 

  if (@lang) {
    push @{$self->{'_lr'}},@lang;
  }

  return join('',@{$self->{'_lr'}});
}

This method sets the language restrictions for a call to Google’s search API. Here’s the kicker, and why I chose this as an example:

>>> use Net::Google::Search;
>>>
>>> my $search = Net::Google::Search->new();
>>> $search->lr(qw(lang_en lang_fr lang_de));
lang_enlang_frlang_de
>>>

You put in an array, you get a string back out. Getting the input array back is going to require some serious split() magic. (Hint: don’t forget about the codes for Chinese, which don’t follow the lang_([a-z]{2}) format).
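Here is roughly what that magic looks like: a regex match rather than a plain split(). This is a sketch under an assumption. Every code is taken to be either lang_xx or lang_xx-YY (as in lang_zh-CN), and anything that doesn’t fit would be silently dropped, which is exactly the fragility the library forces on its callers.

```perl
use strict;
use warnings;

# Recover the list from lr()'s joined string, assuming codes look like
# lang_xx or lang_xx-YY.
sub split_lr {
    my ($joined) = @_;
    return $joined =~ /(lang_[a-z]{2}(?:-[A-Z]{2})?)/g;
}

print join(' ', split_lr('lang_enlang_zh-CNlang_de')), "\n";    # lang_en lang_zh-CN lang_de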

Now, there’s actually a reason (though not a good one) why the getter emits a string like this: the SOAP API method apparently expects a string. However, if the library needs this kind of mutation, that’s where the joining/splitting/processing should be done: inside the library, at the point where it builds the SOAP request. The people using your library should never care about (and should never be forced to grapple with) implementation details like this.
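A sketch of that separation follows. Sketch::Search and _lr_for_request() are hypothetical names, not Net::Google’s API. The public accessor hands back the same list it was given, and the joined string is produced only where the request is assembled.

```perl
use strict;
use warnings;

package Sketch::Search;    # hypothetical stand-in for Net::Google::Search

sub new { return bless { _lr => [] }, shift }

# Public accessor: a list goes in, the same list comes back out.
sub lr {
    my $self = shift;
    $self->{_lr} = [@_] if @_;    # replaces (rather than appends to) the list
    return @{ $self->{_lr} };
}

# Internal helper: the string the SOAP call wants is built here, when the
# request is assembled, and never leaks into the public interface.
sub _lr_for_request {
    my $self = shift;
    return join '', @{ $self->{_lr} };
}

package main;

my $search = Sketch::Search->new;
my @langs  = $search->lr(qw(lang_en lang_fr lang_de));
print "@langs\n";                        # lang_en lang_fr lang_de
print $search->_lr_for_request, "\n";    # lang_enlang_frlang_de
```

One design difference from the original: calling lr() with no arguments is always a plain getter here. Clearing the list would need its own method rather than the original’s “empty first argument” convention.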

Validate your input

Of all of Net::Google’s hand-written accessor methods, only one does any kind of input validation. You end up with methods like this one, from Net::Google::Search:

sub starts_at {
  my $self = shift;
  my $at   = shift;

  if (defined($at)) {
    $self->{'_starts_at'} = $at;
  }

  return $self->{'_starts_at'};
}

starts_at() is documented as accepting an integer, but in reality it doesn’t care. You could pass it a string, a coderef or a blessed regex, and it would be none the wiser. You’ll only find out about the error much later, when some other part of the application asks starts_at() for an integer, gets back an arrayref and freaks the hell out.

That’s why doing input validation in the accessors is important: so you know that the data is invalid (i.e., there’s a bug) as soon as possible. Doing validation in the accessor makes it trivial to track down where the bad data came from, or at least how it got to your code. The further you move validation down the call-chain, the more call sites you have to examine to find out how starts_at() came by this bad data. More call sites == more work == bad.

Now, I sympathize somewhat with Net::Google on this one. It’s hard to do input validation when the final arbiter of “valid input” is a web service that you don’t control. Every time the web service changes — either to accept previously-invalid data or vice-versa — you have to re-release. That sucks. However, there’s still some minimal validation you can do. For example, Net::Google::Search’s lr() accessor doesn’t have to know about every possible language restriction, but it can make sure all arguments are of the right type (here: strings). In other cases, you might not be able to say with 100% certainty that every element in an arrayref is valid or invalid, but you can at least figure out whether they’re all integers.
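That minimal, service-agnostic check might look like the sketch below. assert_int_arrayref() is a hypothetical helper, not part of Net::Google. It can’t know which values the web service will accept, but it does insist on an arrayref of integers and dies at the call site otherwise.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(reftype);

# Hypothetical helper: the web service decides which values are valid,
# but we can at least demand an arrayref whose elements are all integers.
sub assert_int_arrayref {
    my ($name, $value) = @_;
    croak "$name must be an array reference"
        unless ref $value && reftype($value) eq 'ARRAY';
    for my $elem (@$value) {
        croak "$name must contain only integers"
            unless defined $elem && !ref $elem && $elem =~ /\A-?[0-9]+\z/;
    }
    return 1;
}

assert_int_arrayref('ids', [0, 10, 20]);    # passes silently
eval { assert_int_arrayref('ids', [0, 'ten']) };
print "rejected: $@";                        # rejected: ids must contain only integers at ...
```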

It’s these little things that go a long way.

As I mentioned, I’ve recently taken over ownership of a project at work that makes heavy use of Google’s SOAP APIs, using Net::Google to handle the SOAP stuff. A major part of my initial marching orders was to fix one example of Net::Google’s curious definition of “error handling”.

There’s a long-standing problem with Google’s search API: it throws “502 Bad Gateway” errors around 20% of the time. Net::Google’s attitude toward this (and all other errors) is to simply carp and carry on. From our standpoint, we want a clean way to tell which error occurred. Was the user’s API key invalid? Have they reached their query limit? Did we get yet another 502? Overriding Perl’s warning mechanism does not qualify as clean or pretty or…really, any positive adjective, so I downloaded Net::Google’s latest release and started reading.

I have only recently stopped screaming in terror.

It soon became clear that my task was larger than “throw errors when Google’s SOAP server craps out”. Much larger. So large that I’m now putting together a 2.0.0 release of this package. In the spirit of learning from mistakes, let’s work through this poor module and look at ways some of this stuff could be done better:

Do not ignore errors

I don’t care what language you’re using: On Error Resume Next is never a good idea. In Net::Google’s case, any errors at the SOAP layer are trapped and reissued as warnings, and the module continues on its merry way as if nothing had happened. This includes things like:

  • “502 Bad Gateway” - for when the SOAP service is down
  • “Invalid authorization key” - when the user had given us a bad API key
  • “Daily limit of 1000 queries exceeded” - try again tomorrow

I want to be able to trap these and respond differently to each scenario. If I get a 502 error, I might sleep for a few seconds, then try again. If I get an “invalid authorization key”, I should inform the user so they can give me a valid key. The point is that I want to be able to respond to odd — one might say “exceptional” — behavior. Having these messages emitted as warnings makes that a lot harder.
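Throwing exception objects instead of warnings makes that kind of response straightforward. In the sketch below, My::Google::Error and do_search() are hypothetical; Net::Google provides neither. The fake search fails with a 502 on its first call. The caller retries on 502s and lets every other error type (a bad key, an exhausted quota) propagate.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical exception class.
package My::Google::Error;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub throw   { my $class = shift; die $class->new(@_) }
sub type    { return $_[0]{type} }
sub message { return $_[0]{message} }

package main;

# Hypothetical stand-in for the SOAP call: a 502 the first time, then success.
my $calls = 0;
sub do_search {
    $calls++;
    My::Google::Error->throw(type => 'bad_gateway', message => '502 Bad Gateway')
        if $calls == 1;
    return ['a result'];
}

# Retry on 502s; anything else (invalid key, quota exceeded) goes to the caller.
sub search_with_retry {
    my ($max_tries) = @_;
    for my $attempt (1 .. $max_tries) {
        my $results = eval { do_search() };
        return $results if $results;

        my $err = $@;
        die $err unless blessed($err) && $err->isa('My::Google::Error');
        die $err unless $err->type eq 'bad_gateway';
        # Real code would sleep a few seconds here before retrying.
    }
    die "Gave up after $max_tries attempts\n";
}

my $results = search_with_retry(3);
print scalar(@$results), " result(s) after $calls call(s)\n";    # 1 result(s) after 2 call(s)
```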

Do not leave useless code lying around

From Net::Google::Search:

use constant RESTRICT_ENCODING => qw [ arabic gb ... a dozen more ... cyrillic utf8 ];

use constant RESTRICT_LANGUAGES => qw [ ar zh-CN ... a dozen more ... sv tr ];

use constant RESTRICT_COUNTRIES => qw [ AD ... dozens and dozens more ... ZA ZM ZR ];

use constant RESTRICT_TOPICS => qw [ unclesam linux mac bsd ];

use constant WATCH => "__estimatedTotalResultsCount";

Guess how many of these were used in the following code? That’s right, zero. None of them. Some of these arrays had several dozen elements, just taking up space to no effect. Several other submodules had similar sections, uselessly defining constants that would never be used in the code.

Worse, they took up brain cycles and time while I tried to figure out a) what they were for, then b) if I could safely remove them without breaking anything.

Do not modify constants

*sigh*. You’d think it would be obvious; these things are called constants for a reason. You don’t change them. You can’t change them. Ever. Yet what do I find in Net::Google::Service?

use constant SERVICE_CACHE => {};

followed a little later by

&SERVICE_CACHE->{$service} = "$dir/Net/Google/Services/".&SERVICES->{$service};

That’s right: modifying a constant. Strictly speaking, it modifies the hash behind a constant reference, since use constant only makes the reference itself read-only. What makes this even better fodder for The Daily WTF is the comment above this line, complaining about how using the & sigil is a work-around for a bug in perl 5.00502. In the spirit of the old “It hurts when I do this.” “Don’t do that.” joke, here’s my advice about this whole mess: don’t do it. I have trouble imagining the thought process that concluded, “You know, a hash just won’t do”.
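Here is a short demonstration of why the trick “works” at all, followed by what the code actually wanted. The cache key and path below are made up for illustration.

```perl
use strict;
use warnings;

# use constant freezes the reference, not the hash it points to,
# so writing into it is legal (just misleading).
use constant SERVICE_CACHE => {};
SERVICE_CACHE->{search} = 'cached';
print "cache: ", SERVICE_CACHE->{search}, "\n";    # cache: cached

# What was actually needed: an ordinary file-scoped hash.
my %service_cache;
$service_cache{search} = '/some/dir/Net/Google/Services/search.wsdl';    # hypothetical path
```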

“Invalid input” does not mean “guess”

There’s an old saying ’round these parts, “in the face of ambiguity, refuse the temptation to guess”. When it comes to programming, you can swap out “ambiguity” for any number of things and generate valid maxims all day long. In this situation, I’ll take “an error”: in the face of an error, refuse the temptation to guess. Here’s one such offender, taken from Net::Google::Search:

if (int($max) < 1) {
    carp "'$max' must be a int greater than 0";
    $max = 1;
}

Bonus points: notice that the warnings produced look like “'-5' must be a int greater than 0”. a) The quote marks around -5 make it look like a string, and b) a much more helpful message would be “max_results must be an integer greater than 0”. In other words, don’t just tell me how I screwed up, tell me what I screwed up, too.
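Refusing to guess, and naming the culprit, might look like the sketch below. Sketch::Search is a hypothetical stand-in, not Net::Google’s code. The accessor croaks on bad input and leaves the stored value untouched, instead of silently substituting 1.

```perl
use strict;
use warnings;

package Sketch::Search;    # hypothetical stand-in for Net::Google::Search
use Carp qw(croak);

sub new { return bless {}, shift }

# Refuse bad input instead of guessing, and name the attribute at fault.
sub max_results {
    my $self = shift;
    if (@_) {
        my $max = shift;
        croak 'max_results must be an integer greater than 0, got '
            . (defined $max ? "$max" : 'undef')
            unless defined $max && !ref $max && $max =~ /\A[0-9]+\z/ && $max > 0;
        $self->{_max_results} = $max;
    }
    return $self->{_max_results};
}

package main;

my $search = Sketch::Search->new;
$search->max_results(10);
eval { $search->max_results(-5) };
print $@;    # max_results must be an integer greater than 0, got -5 at ...
```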

Documenting it doesn’t make it so

Unmodified:

=head2 $obj->starts_at($at)

Returns an int. Default is 0.

Returns undef if there was an error.

=cut

sub starts_at {
  my $self = shift;
  my $at   = shift;

  if (defined($at)) {
    $self->{'_starts_at'} = $at;
  }

  return $self->{'_starts_at'};
}

Can someone please explain to me how this method “[r]eturns undef if there was an error”? No? I can’t either. Nor can I figure out what kind of errors could possibly arise in this code.

If you want a piece of code to do X, it helps to actually make it do X. If your docs say one thing and the code does another, the code is generally going to prevail when push comes to shove. It’s important to make sure your code does what you want it to; that’s where tests come in. Maybe this accessor method at one time had some kind of input validation and would in fact return undef if that validation failed, and just maybe that code accidentally got wiped out at some point. A test suite would let you know that.

My, what a nice segue for the next point.

Good tests are your friends

As shipped, Net::Google 1.0.1 has…let me count them…8 tests (12 if you include duplicates). And those 8 aren’t exactly worth much. Here’s one from t/002-cache.t:

my $search = $google->search();
isa_ok($search, 'Net::Google::Search');

$search->query(QUERY);
$search->max_results(MAXRESULTS);
$search->filter(1);

my $results = $search->results();
is(ref($results), 'ARRAY', 'Got results for '.QUERY);

That’s the full extent of the testing for the interface to Google’s search API. Is $results tested to make sure it has elements? Why waste time! Are the elements of $results tested to make sure they’re filled out? Poppycock! Do we ever make sure that the accessors on $search work correctly? Hells no!
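Even the accessors can be pinned to their documentation with a few lines of Test::More. The sketch below runs against Stub::Search, a hypothetical stand-in written to match starts_at()’s POD (“Returns an int. Default is 0.”). Pointed at the real class, tests like these would catch the docs and the code drifting apart.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in that follows the documented behaviour.
package Stub::Search;
sub new { return bless { _starts_at => 0 }, shift }
sub starts_at {
    my $self = shift;
    $self->{_starts_at} = shift if @_;
    return $self->{_starts_at};
}

package main;

my $search = Stub::Search->new;
is($search->starts_at,     0,  'starts_at defaults to 0');
is($search->starts_at(10), 10, 'setter returns the new value');
is($search->starts_at,     10, 'getter returns what the setter stored');

done_testing();
```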

I know testing isn’t fun and isn’t glamorous. No-one ever got laid or got their picture in the paper for working on a test suite. That doesn’t mean you shouldn’t do it, though, ’cause you know what else doesn’t get you laid? Writing poorly-specified, untested software that’s incredibly difficult for other people to modify because they have no idea what it was supposed to do in the first place. Call it karma.

“Method X exists” is not documentation

One last documentation gem before we go:

=head2 $obj->filter($bool)

Returns true or false. Returns undef if there was an error.

=cut

Thank $deity the docs don’t tell me what this property controls or does; it’s not like I may need to use it. Dear reader: if you want me to use your software, please tell me what it does. If I have to read the code to figure out what an accessor method controls, you lose; I’ll use someone else’s package. I just don’t care that much.


Next time we’ll go over what Net::Google teaches us about accessor methods. There was so much material here, it took up its own post.

When I posted the first version of svk-init, my handy-dandy SVK initialisation script, I mentioned that I wanted to have the script figure out the short name for me, based on the repository URL. Promises made, promises kept:

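#!/usr/bin/perl

use warnings;
use strict;

my($source, $short) = @ARGV;

die "usage: svk-init REPOSITORY_URL [SHORT_NAME]\n" unless defined $source;

unless(defined $short)
{
  # m{} keeps the slashes in "://" from ending the pattern early
  if($source =~ m{^[^:]+://(.+)$})
  {
    my @path = split('/', $1);

    # The short name is the directory just above branches/, tags/ or trunk/
    for(my $i = 1; $i < @path; $i++)
    {
        if($path[$i] eq 'branches' || $path[$i] eq 'tags' || $path[$i] eq 'trunk')
        {
            $short = $path[$i-1];
            last;
        }
    }

    # Fallback: use the last part of the path
    $short ||= $path[-1];
  }
}

unless(defined $short)
{
  die('You need to provide a short name for the repository');
}

# List-form system() skips the shell; stop at the first failing step
sub run { system(@_) == 0 or die "'@_' failed: $?\n" }

run('svk', 'mirror', $source, "//$short/main");
run('svk', 'sync', "//$short/main");
run('svk', 'cp', "//$short/main", "//$short/local",
    '-m', "Creating //$short/local");
run('svk', 'co', "//$short/local", "$ENV{HOME}/src/$short");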

This allows me to give svk-init a repository URL of

http://svn.python.org/projects/python/trunk/

and have svk-init correctly come up with “python” as the short name.


Relatedly, SVK’s author, Chia-liang Kao, mentioned in the comments that the first version of svk-init could be compressed to svk cp $source ~/src/$short, plus answering a few questions. I tried it out, and sure enough, it does indeed do the job. I’m going to stick with svk-init for two reasons: 1) svk cp uses a different mirror naming scheme, and 2) svk cp requires you to answer 3-4 questions, meaning I can’t start it up and forget about it like I do svk-init.

I’ve recently taken over ownership of a new project at work, a suite of web-based tools for search engine optimization. In the course of patching Net::Google, the module we use for talking to Google’s SOAP API, I found myself staring down a number of classes, some with AUTOLOAD-based accessor systems, some with accessors based on the tried-and-true Class::Accessor::CopyNPaste method. In neither case was I pleased. I needed a real accessor generator.

A CPAN search turned up Class::BuildMethods and Class::MethodMaker as promising options. My quick review follows:

  • Both share the same declarative style: “give me methods x, y and z, with these defaults, using these functions for validation”.

    From Class::MethodMaker:

    package Foo::Bar;
    
    use Class::MethodMaker
        [ scalar => [{default => 10}, 'x'],
          scalar => [{default => 6}, 'y'],
        ];

    From Class::BuildMethods:

    package Foo::Bar;
    
    use Class::BuildMethods
        x => {default => 10},
        y => {default => 6};
  • Where Class::BuildMethods can only generate accessors for scalars (requiring arrays and hashes to be passed in by reference), Class::MethodMaker can generate special accessors for array and hash properties:

    use Class::MethodMaker
        [ array => [qw(x y)],
          hash => 'z',
        ];

    There’s a downside to this, though. When dealing with array accessors (and possibly hash accessors, too; I haven’t tested them), Class::MethodMaker gets too clever for my tastes: in scalar context, an array getter will return an arrayref, while in list context you get a list. The docs don’t make any mention of this (that I was able to find), and I was seriously confused as to why

    is_deeply($self->x, [], 'Correct default (x)')

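    kept complaining that an arrayref was not equal to “Correct default (x)”. The culprit: the arguments to is_deeply() are evaluated in list context. So $self->x returned an empty list, [] slid into the “got” slot, and the test name slid into the “expected” slot. Writing scalar($self->x) forces the arrayref form.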

  • Class::MethodMaker offers considerably more flexibility and power than does Class::BuildMethods. While BuildMethods only allows you to validate the input to an accessor (i.e., accept or die), MethodMaker permits more general input-processing capabilities.

    A feature I found more useful is MethodMaker’s ability to generate a proper new() method for you. With this in hand, I was able to define an entire class like so:

    package Result;
    
    use Class::MethodMaker
        [ scalar => [qw/title URL snippet cachedSize/],
          scalar => [qw/hostName directoryCategory/],
          new => 'new'
        ];
  • Both modules (as of this writing) come with the same bug: neither allows you to give an accessor a default value of 0 or “”. Tracking down the cause of the bug in Class::BuildMethods was trivial, and a bug report has already been filed. Fixing Class::MethodMaker, on the other hand,…

  • While Ovid’s code in Class::BuildMethods is very clear and easy to understand, Class::MethodMaker’s author seems to have picked the most complicated implementation possible. This is serious hate-the-end-user stuff. In what appears to be an effort to avoid writing any piece of code more than once, Class::MethodMaker’s internals are built up by running several files through a custom-written preprocessor that makes absolutely no sense to anyone but its author and comes with zero documentation. It took me about 45 seconds to fix BuildMethods; after at least 30 minutes spent delving into the core of Class::MethodMaker, I’m still no closer to coming up with a patch.
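The is_deeply() surprise from the array-accessor bullet is plain Perl context rules at work, and it can be reproduced without Class::MethodMaker. In this sketch, get_x() is a hypothetical accessor mimicking the described behaviour: an arrayref in scalar context, a list in list context. count_args() just reports how many arguments a call actually received.

```perl
use strict;
use warnings;

# Hypothetical context-sensitive accessor with an empty default.
my @x;
sub get_x { return wantarray ? @x : \@x }

# Subroutine arguments are evaluated in list context, just like is_deeply()'s.
sub count_args { return scalar @_ }

print count_args(get_x(), [], 'Correct default (x)'), "\n";            # 2: get_x() vanished
print count_args(scalar(get_x()), [], 'Correct default (x)'), "\n";    # 3
```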

Conclusion: Class::MethodMaker comes across as overly powerful and complicated, with way more features, nooks and crannies than I’ll ever need, and an implementation that makes the Baby Jesus — not to mention my old CSCI 101 professor — cry. Class::BuildMethods is simple, elegant and gets the job done, though the lack of a new() emitter is a tick mark against it. Verdict: I’m going with Class::BuildMethods for now, plus submitting a patch for a new() method generator.

Continuing my “Extending SVK for fun and profit” series, I present the mergeproject macro, which builds upon the mymerge command I talked about last time.

As I mentioned in the mymerge article, I name my local and mirrored repositories //$project_name/local/ and //$project_name/main/, respectively. In addition, I follow the convention of giving my checkout paths equally imaginative names, like /home/collin/src/$project_name/.

When last we left our heroes, I had managed to cut the command to sync my local repository to the mirrored repository down from a monstrous

svk sm -I //$project_name/local //$project_name/main

to a more lazy-coder-friendly

svk mm $project_name

That’s good, but we can go further.

Since all of my project checkouts follow the same naming conventions, and since most of my svk mm commands are issued from within the project’s checkout directory, there’s no reason for me to type $project_name each time. Some File::Spec incantations should be more than enough to figure this out for me.

After some digging around through SVK’s internals, I present you…mergeproject:

package SVK::Command::Mergeproject;
use strict;
use SVK::Version;  our $VERSION = $SVK::VERSION;

use base qw( SVK::Command::Mymerge );

use SVK::Util qw(splitdir catdir);
use SVK::I18N qw(loc);
use Cwd;

sub parse_arg {
    my $self = shift;
    my @arg = @_;
    return if @arg != 0;

    my $pwd = Cwd::cwd();
    my @dirs = splitdir($pwd);
    for(my $i = 0; $i < @dirs; $i++) {
        my $dir = catdir(@dirs[0..$i]);

        # See if the directory is a valid checkout path
        # If it's not, an error will be raised and $@ will be set.
        # If the directory is a valid checkout path, pass only the
        #  directory name -- ie, not the full path -- up to
        #  Mymerge, which will handle the rest.
        eval { $self->{xd}->find_repos_from_co($dir, 0) };
        unless ($@) {
            return $self->SUPER::parse_arg($dirs[$i]);
        }
    }

    die loc("Unable to find a checkout path while traversing %1\n",
                $pwd);
}

We use Cwd::cwd() to grab the absolute path to the current directory, then use SVK::Util::splitdir() (SVK::Util autoloads all the useful bits of File::Spec for us) to break the path into individual directory names. We then iterate over the list of directory names, building up longer and longer paths with SVK::Util::catdir(). For example, given the current working directory of /home/collin/src/svnmock/trunk/, we’d look in the following succession of directories for SVK checkouts:

/
/home/
/home/collin/
/home/collin/src/
/home/collin/src/svnmock/
/home/collin/src/svnmock/trunk/

stopping once SVK::XD::find_repos_from_co() reports that we have indeed found one. (In the above example, we’d end up stopping at /home/collin/src/svnmock/, the first directory that SVK can map to a repository.) The second argument of 0 to find_repos_from_co() tells SVK that we’re only interested in whether the checkout maps to a repository.
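The path-building half of that loop can be tried on its own with core File::Spec, minus the SVK lookup. candidate_dirs() is a hypothetical name for this sketch, and it assumes Unix-style paths; note that catdir() drops the trailing slashes shown in the list above.

```perl
use strict;
use warnings;
use File::Spec;

# Successive ancestor paths, shortest first, as mergeproject probes them.
sub candidate_dirs {
    my ($path) = @_;
    my @dirs = File::Spec->splitdir($path);
    return map { File::Spec->catdir(@dirs[0 .. $_]) } 0 .. $#dirs;
}

print "$_\n" for candidate_dirs('/home/collin/src/svnmock/trunk');
# /
# /home
# /home/collin
# /home/collin/src
# /home/collin/src/svnmock
# /home/collin/src/svnmock/trunk
```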

Once we’ve found a valid checkout path, the last directory in the series (the one that actually holds the checkout) is assumed to be the project name and so is passed up to mymerge.

Let’s recap: we went from this:

svk sm -I //$project_name/local //$project_name/main

to this

svk mm $project_name

to now this (using the mp alias for mergeproject)

svk mp

Hooray, laziness!

If you’re interested in doing something similar, put this code in /usr/lib/perl5/site_perl/*/SVK/Command/Mergeproject.pm or wherever your SVK command modules happen to be. If you want to use a shortcut (I use mp), you’ll need to add a line to the %alias hash in SVK::Command, something like “mp mergeproject”.

All my projects use Subversion for revision control, but on my laptop, I use SVK so I can keep working and committing even when away from an Internet connection. (Also: SVK’s merge support and branch tracking beat SVN’s hands down.)

(This isn’t an SVK tutorial. For that, you should check out Ron Bieber’s excellent series of SVK tutorials.)

One part of SVK’s everyday workflow is having a mirrored repository, which represents the remote SVN repository, and a local repository, which you normally commit to. You then merge between these repositories: from local to mirrored to push changes to the main repository, and from mirrored to local to sync with the main repository.

For each project I work on, I name the mirrored and local repositories //$project_name/main and //$project_name/local, respectively. This means every time I want to push changes up to the main SVN repository, I type svk sm -I //$project_name/local //$project_name/main. That is smerge (automatically merge all changes between the two branches), with -I for incremental commits.

Because that command never changes, and because I’m lazy, I wrote a mymerge “macro” to save myself the trouble of all that typing. Now, instead of that big, long command, I type svk mm $project_name. Much better.

mymerge works by subclassing SVK’s SVK::Command::Smerge class and overriding the parse_arg() method. It does some monkeying around with the arguments, then hands control off to smerge.

package SVK::Command::Mymerge;
use strict;
use SVK::Version;  our $VERSION = $SVK::VERSION;

use base qw( SVK::Command::Smerge );

sub options { () }

sub parse_arg {
    my $self = shift;
    my @arg = @_;
    return if $#arg < 0;

    my $depot = $arg[0];

    $self->{incremental} = 1;
    return $self->SUPER::parse_arg("//$depot/local",
                                   "//$depot/main");
}

The $self->{incremental} = 1; assignment is the same as supplying the -I flag to smerge in the original example. The last line does the important argument-mucking before passing control up to smerge.

If you’re interested in doing something similar, put this code in /usr/lib/perl5/site_perl/*/SVK/Command/Mymerge.pm or wherever your SVK command modules happen to be. If you want to use a shortcut (I use mm), you’ll need to add a line to the %alias hash in SVK::Command, something like “mm mymerge”.

After quite a long recharging-the-batteries sabbatical from work on community2.com, coding has recommenced. At the moment, I’m going back through my unfinished branches in the SVN repository, seeing what still needs to be merged and whatnot.

Today’s major change: I’ve eliminated eCore’s dependence on the “gods” usergroup. The name “gods” was an unfortunate relic of the engine’s original design (way back in 1998-1999), which had a lot of role playing/Dungeons and Dragons-style overtones to its terminology (to wit: the names of the experience levels on Everything2, another eCore site). All the hardcoded references to “gods” have been scrubbed in favour of an admin_group key in the System Settings settings node. Note that if you don’t set this key, the system will default to using “gods” as the name of the group; this was done to preserve backwards compatibility.
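The fallback amounts to a lookup with a default. Here is a hypothetical sketch in which a plain hashref stands in for the System Settings node; eCore’s real settings API is not shown here.

```perl
use strict;
use warnings;

# Hypothetical: $settings stands in for the System Settings node.
sub admin_group_name {
    my ($settings) = @_;
    my $name = $settings->{admin_group};
    # Unset or empty key: fall back to "gods" for backwards compatibility.
    return (defined $name && length $name) ? $name : 'gods';
}

print admin_group_name({ admin_group => 'admins' }), "\n";    # admins
print admin_group_name({}), "\n";                              # gods
```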

The change looks to be holding; at least, I was able to nuke the “gods” group and nothing broke. For an eCore site, “nothing broke” is generally a pretty good sign.

One of the problems with writing code that interacts with a database (for example) is that it’s hard to test; do you have your test suite create, populate and destroy whole databases to test your code against? That’s an awful lot of work. Then there’s the question about testing those tricky edge cases, like certain error conditions: if you’re creating new databases, how do you simulate, say, your database server suffering a catastrophic failure? I can think of one solution, but it would get pretty expensive pretty quickly.

Fortunately, since this is a common enough problem (testing database interaction), smart people have already solved it. In the perl community, for example, DBD::Mock (full disclosure: I work on DBD::Mock some) is one fairly well-known way of testing interaction with database systems. It allows you to say, “Make sure the following SQL statements are executed in the following order, and return these results”. It can also be used to simulate what happens when the connection to the database dies mid-query, for example.
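Here is a minimal sketch of that style of test. It assumes DBI and DBD::Mock are installed; the users table and its rows are made up for illustration. A canned result set is queued for the next statement. Afterwards, the recorded history shows which SQL the code actually ran.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Mock:', '', '', { RaiseError => 1, PrintError => 0 });

# Queue a canned result set: first row is column names, the rest are data.
$dbh->{mock_add_resultset} = [
    [ 'id', 'name'  ],
    [ 1,    'alice' ],
    [ 2,    'bob'   ],
];

my $sth = $dbh->prepare('SELECT id, name FROM users');
$sth->execute;
my $rows = $sth->fetchall_arrayref;

# DBD::Mock records every statement, so a test can check the SQL it saw.
my $history = $dbh->{mock_all_history};
print scalar(@$rows), " rows; last SQL: ", $history->[-1]->statement, "\n";
```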

Once you move away from databases to other, still complex systems, though, the range of testing options becomes more limited. In my case, I wanted to write several applications and support libraries on top of the Python bindings for Subversion, a popular revision control system. When it came time to write tests for the Subversion-facing code, I faced a dilemma: was I back to creating full-blown SVN repositories for each test case?

My solution was svnmock. The svnmock package does for Subversion’s Python API what DBD::Mock does for perl’s DBI: it makes testing easy. svnmock allows the test writer to say, like with DBD::Mock, “I want the following series of function calls, with this set of parameters and this return value”. While this may seem fairly low-level, it is trivial to write macro-like constructs on top of the current set of primitives. One of my favourite features of svnmock is that it allows you to specify that the return value from api_func_1() must be used as a parameter to api_func_2().

svnmock’s project website may be found here.