Monday, March 12, 2007

PowerShell versus Perl

I've just been preparing some data for a post that I've been meaning to put up for a few months now. The data comes from an IIS log file, and I need to pull three fields out of it: the time of the HTTP request, the HTTP service (uri-stem), and the duration of the request (time-taken). In the past I've always used Perl for this sort of task, taking advantage of its regular expression syntax to extract the data elements I want. For a 50MB sample log file (all logging options turned on) this takes approximately 12 seconds.

The Perl I've just used to test this follows:


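use strict;
use warnings;
use File::DosGlob 'glob';

# Process every IIS log file in the current directory.
my @logfiles = glob "ex*.log";

for my $logfile (@logfiles) {
    open(my $in, '<', $logfile) or die "Cannot read $logfile: $!";

    # The output file is the log file name with "ex" removed.
    (my $outfile = $logfile) =~ s/ex//g;
    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";

    while (<$in>) {
        # $1 = timestamp, $2 = .asmx service name, $3 = trailing time-taken value
        if (m"^(\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d).*/(.*\.asmx).*\s(\d+)$") {
            my $file = lc($2);
            print $out "$1\t$file\t$3\n";
        }
    }
    close($in);
    close($out);
}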


Now, since I've recently started using PowerShell to extract and manipulate data for analysis, I thought I'd try the same thing with that. Note that I'm just a beginner at this, so I could be doing it the wrong way, but here's what I tried:


Get-Content ex070223.log |
    ForEach-Object {
        # The named groups (occured, service, duration) are read back from $matches.
        if ($_ -match '(?<occured>^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d).*/(?<service>.*\.asmx).*\s(?<duration>\d+$)') {
            $matches["occured"] + "," + $matches["service"] + ", " + $matches["duration"]
        }
    }
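
To see what the pattern picks out, here is a minimal sketch that runs it against a single line. The log line is made up for illustration (the real field layout depends on which logging options are turned on), and the variable names are my own choice.

```powershell
# Hypothetical log line, invented for illustration only.
$line = '2007-02-23 10:15:32 W3SVC1 10.0.0.1 GET /services/Orders.asmx - 80 - 10.0.0.2 200 0 0 312'

$pattern = '(?<occured>^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d).*/(?<service>.*\.asmx).*\s(?<duration>\d+$)'

if ($line -match $pattern) {
    # After a successful -match, each named group can be looked up by name in $matches.
    $occured  = $matches['occured']    # 2007-02-23 10:15:32
    $service  = $matches['service']    # Orders.asmx
    $duration = $matches['duration']   # 312
    "$occured`t$service`t$duration"
}
```

Because the `.*/` part is greedy, the service group starts after the last slash in front of the `.asmx` name, so only the file name is captured and not its directory.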

The regular expression syntax is very powerful, and I like the named matches. I guess this is a straight .NET runtime feature, but I'm easily impressed. However, the time it takes to complete is abominable! It took 20 minutes where the Perl program took 12 seconds.
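
One variation that might be worth trying is to skip the pipeline and read the file line by line through the .NET classes directly, building the regular expression once up front. The following is only a sketch of that idea and I have not timed it. The function name Convert-IisLog and its parameters are hypothetical, and it writes tab-separated output like the Perl version.

```powershell
# Hypothetical helper: the name Convert-IisLog and its parameters are illustrative only.
function Convert-IisLog {
    param(
        [string]$Path,     # full path to the input IIS log
        [string]$OutPath   # full path to the tab-separated output file
    )

    # Build the regex once. Unlike -match, a .NET Regex is case-sensitive by default.
    $pattern = '(?<occured>^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d).*/(?<service>.*\.asmx).*\s(?<duration>\d+$)'
    $options = [System.Text.RegularExpressions.RegexOptions]::Compiled
    $regex   = New-Object System.Text.RegularExpressions.Regex($pattern, $options)

    $reader = New-Object System.IO.StreamReader($Path)
    $writer = $null
    try {
        $writer = New-Object System.IO.StreamWriter($OutPath)
        # ReadLine() returns $null at end of file.
        while ($null -ne ($line = $reader.ReadLine())) {
            $m = $regex.Match($line)
            if ($m.Success) {
                $fields = $m.Groups['occured'].Value,
                          $m.Groups['service'].Value.ToLower(),
                          $m.Groups['duration'].Value
                $writer.WriteLine($fields -join "`t")
            }
        }
    }
    finally {
        $reader.Dispose()
        if ($writer) { $writer.Dispose() }
    }
}
```

It would be called along the lines of `Convert-IisLog -Path C:\logs\ex070223.log -OutPath C:\logs\070223.log` (paths made up). Full paths are the safer choice here, since the .NET classes resolve relative paths against the process working directory rather than the current PowerShell location.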

Still, the power available from the command line is impressive...
