[SlugBug] perl help required - helix server log parsing

Bill Best bill at commedia.org.uk
Fri Jun 18 12:10:51 BST 2004


hi

i'm running helix universal server for webcasting and have a problem 
getting stats from the server logfiles as they are not output in Common 
Log Format.

i've obtained a perl script from RealNetworks which is supposed to parse 
the rmaccess.log file and create a standard format log file that can be 
interpreted by webalizer.

however, when i invoke webalizer from the CLI thus:

webalizer -c /etc/webalizer.conf

it bombs out with multiple instances of:

> Skipping bad record (xxxxx)
> Skipping bad record (xxxxx)
> No valid records found!

where xxxxx is a record number.  therefore the script isn't working as 
it ought.

the input log file from helix server is of the form:

192.168.1.45 - - [18/Jun/2004:03:25:45 +0000]  "GET 
ramgen/encoder/svt.rm HTTP/1.0" 200 0 [Mozilla/4.0 (compatible;MSIE 
6.0;Windows NT 5.1)] [] [UNKNOWN] 137 0 0 0 0 1
192.168.1.45 - - [18/Jun/2004:03:25:51 +0000]  "GET 
ramgen/encoder/svt.rm HTTP/1.0" 200 0 
[WinNT_5.1_6.0.11.868_RealPlayer_R12ESD_es_UNK] 
[00000000-0000-0000-0000-000000000000] [UNKNOWN] 137 0 0 0 0 2

and Common Log Format typically looks something like this:

192.168.1.252 - - [13/Jun/2004:04:11:43 +0100] "GET /robots.txt 
HTTP/1.0" 200 10206 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
192.168.1.252 - - [13/Jun/2004:04:11:44 +0100] "GET 
/article/articleview/363/1/16/ HTTP/1.0" 200 28058 "-" "msnbot/0.11 
(+http://search.msn.com/msnbot.htm)"

but what i'm now getting after using the script (below) looks like this:

192.168.1.45 - - [18/Jun/2004:03:25:45 +0000]  "GET 
ramgen/encoder/svt.rm HTTP/1.0" 200 0 [Mozilla/4.0 (compatible;MSIE 
6.0;Windows NT 5.1)] [] [UNKNOWN] 137 0 0 0 0 1
192.168.1.45 - - [18/Jun/2004:03:25:51 +0000]  "GET 
ramgen/encoder/svt.rm HTTP/1.0" 200 0 
[WinNT_5.1_6.0.11.868_RealPlayer_R12ESD_es_UNK] 
[00000000-0000-0000-0000-000000000000] [UNKNOWN] 137 0 0 0 0 2

which actually doesn't look a great deal different but when i run diff 
on the input/output file it does report a small number of differences.

so, in time-honoured fashion, does anybody have any ideas?

cheers

bb

> #!/usr/bin/perl -w
> 
> use strict;
> use Fcntl;
> 
> my $path = "/home/helix/Logs/rmaccess.log";
> 
> sysopen(HELIXLOG, $path, O_RDONLY)
> or die "Couldn't open $path for reading: $!\n";
> 
> while (<HELIXLOG>) {
> chomp;
> if (m/^(\d+\.\d+\.\d+\.\d+) \-.*?\- \[(.*?)\] \"(.*?\.rm.*?)\" (\d+) (\d+) \[(.*?)\] .*$/) {
> 
> my $ip_number = $1;
> my $date = $2;
> my $url = $3;
> my $response = $4;
> my $numbytes = $5;
> my $client_info = $6;
> 
> print "$ip_number - - \[$date\] \"$url\" $response $numbytes \"-\" \"$client_info\"\n";
> }
> }
> 
> close HELIXLOG;


More information about the SlugBug mailing list