Playing with PERL | The Spark Between

Warning: The following content is of a ridiculously nerdy nature, and probably unsuited for most of the viewing audience. That said, I had a lot of fun writing it, so here it is 🙂

My roommate Dave IMed me the other day with a problem: He needed a program in 30 minutes that would search through a text file for any occurrences of a list of CAPITALIZED words, and convert them to lowercase, wherever they occurred.

I received his message 20 minutes later, set to work, and in the last 9 minutes, developed a script for him, along with an extensive test case. PERL, of course, was born to solve this problem.

Here’s the test case I made up, and the correct output, to give an idea of what I needed to do:

input.txt

There once was a Chicken named EGgS.
It lived STRINGS in a barn.
The CHICKEN was afraid of EGGSNITCHERS.
Chicken likes Eggs served on STRINGS

output.txt

There once was a chicken named eggs.
It lived strings in a barn.
The chicken was afraid of EGGSNITCHERS.
chicken likes eggs served on strings

(I later realized this missed one rather important case… BUG: see if you can figure out what it is, and what the error in the first three drafts below is)

Here’s the first draft, which works (nearly…see bug note above) correctly and was submitted within the prescribed 30 minutes :-):

munge1.pl

#!/usr/bin/env perl
use strict;
my @patterns = (
    "chicken",
    "eggS",
    "strings"
);

# Make sure user used lowercase
map { tr/[A-Z]/[a-z]/; } @patterns;

my $input_file = $ARGV[0] or die "Usage: go.pl <input_file>\n";
open (FH, $input_file) or die "Could not read from file: $input_file\n";

while (my $line = <FH>) {
    foreach (@patterns) {
        $line =~ s/^$_(\W)/$_$1/i;
        $line =~ s/(\W)$_$/$1$_/i;
        $line =~ s/(\W)$_(\W)/$1$_$2/i;
    }
    print $line;
}

close FH;

exit 0;

But then I thought to myself… “Self, this is PERL. Surely there is a shorter way?”
Removing some “useless” error-checking and file parsing code in favor of a shell-out, I came up with this:

munge2.pl

#!/usr/bin/env perl
my @patterns = (
    "chicken",
    "eggS",
    "strings"
);

map { tr/[A-Z]/[a-z]/; } @patterns;

map {
    foreach $a (@patterns) {
        s/^$a(\W)/$a$1/i;
        s/(\W)$a$/$1$a/i;
        s/(\W)$a(\W)/$1$a$2/i;
    }
    print;
} `cat $ARGV[0]`;

Better, shorter, PERL-ier 🙂

But still not really PERL. I mean, come on. There were 3 entire statements there. Laaame.

So I played a bit, and moved the first map around to compact two statements into one (admittedly, the map is just there to make sure the user’s list of TERMS to lowercase is *actually* lowercase, but I wanted to keep that bit of functionality):

munge3.pl

map { tr/[A-Z]/[a-z]/; } (@patterns = ("chicken","eggS","strings"));
map { foreach $a (@patterns) { s/^$a(\W)/$a$1/i; s/(\W)$a$/$1$a/i; s/(\W)$a(\W)/$1$a$2/i; } print; } `cat $ARGV[0]`;

Nice. But still not PERL-y. 😉
Then it hit me: why am I wasting an entire statement to create an array that I will only use in one other statement? Oh, and while we’re at it, let’s cut those 3 regexps down to 1, courtesy of a good insight from Jason. (Use \b to match a word boundary on each side of the term; its counterpart \B matches only where there is no boundary.) AND, in a throw-back to my early days of escaping URL strings in CGI (like: s/(\W)/sprintf("%%%02x",ord($1))/eg) let’s move the lowercase-ifying inside the regexp as well, eliminating the first map altogether.
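For anyone who hasn't met these regexp features before, here is a tiny sketch of the two tricks in isolation. The sample strings and variable names are made up for illustration; only the URL-escaping substitution comes from the paragraph above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# /e evaluates the replacement as Perl code; /g repeats it for every match.
my $url = "a b&c";    # made-up sample input
(my $escaped = $url) =~ s/(\W)/sprintf("%%%02x", ord($1))/eg;
print "$escaped\n";   # a%20b%26c

# \b matches at a word boundary; \B matches only where there is none.
my $text = "EGGS EGGSNITCHERS";    # made-up sample input
(my $whole  = $text) =~ s/\bEGGS\b/lc $&/eg;   # whole word only
(my $prefix = $text) =~ s/\bEGGS\B/lc $&/eg;   # only when more word follows
print "$whole\n";     # eggs EGGSNITCHERS
print "$prefix\n";    # EGGS eggsNITCHERS
```

The second half is why the term needs \b on both ends: with \B on the right, the standalone word is skipped and the one buried in EGGSNITCHERS gets mangled instead.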

munge.pl

map { foreach $b ("chicken","eggS","strings") { s/\b$b\b/lc $b/ieg; } print; } `cat $ARGV[0]`;

Ahhh… that is PERL :-). One line of file-munging goodness. Use only as directed:

/usr/bin/env perl munge.pl input.txt > output.txt
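If you'd rather read the idea than squint at it, here is a rough unrolled sketch of the same substitution. The helper name lowercase_terms is hypothetical (it is not in any of the drafts), and the \Q...\E quoting is an extra precaution I'm assuming you'd want if a term ever contained regexp metacharacters.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original drafts): lowercase every
# whole-word, case-insensitive occurrence of each term in a line.
sub lowercase_terms {
    my ($line, @terms) = @_;
    for my $term (@terms) {
        # \Q...\E quotes any regexp metacharacters in the term;
        # /i ignores case, /g handles repeats, /e runs lc() on each match.
        $line =~ s/\b\Q$term\E\b/lc $term/ieg;
    }
    return $line;
}

my @terms = ("chicken", "eggS", "strings");
print lowercase_terms("The CHICKEN was afraid of EGGSNITCHERS.\n", @terms);
print lowercase_terms("Chicken likes Eggs served on STRINGS\n", @terms);
```

Run against the last two lines of input.txt, this should print the matching two lines of output.txt.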

Note: I know this is not good coding practice; it was just fun to reduce it to as short a program as possible. And there’s something to be said for brevity, as well. (… is the soul of wit…)

If you’re concerned about the error-checking of said code, tack this on the end 😀

or die "Caught teh 3rrorz!!1"; # 😉