[SlugBug] perl help required - helix server log parsing
Bill Best
bill at commedia.org.uk
Fri Jun 18 12:10:51 BST 2004
hi
i'm running helix universal server for webcasting and have a problem
getting stats from the server logfiles as they are not output in Common
Log Format.
i've obtained a perl script from RealNetworks which is supposed to parse
the rmaccess.log file and create a standard format log file that can be
interpreted by webalizer.
however, when i invoke webalizer from the CLI thus:
webalizer -c /etc/webalizer.conf
it bombs out with multiple instances of:
> Skipping bad record (xxxxx)
> Skipping bad record (xxxxx)
> No valid records found!
where xxxxx is a record number. so the converted log evidently isn't in
a format that webalizer accepts, and the script isn't working as it ought.
the input log file from helix server is of the form:
192.168.1.45 - - [18/Jun/2004:03:25:45 +0000] "GET ramgen/encoder/svt.rm HTTP/1.0" 200 0 [Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.1)] [] [UNKNOWN] 137 0 0 0 0 1
192.168.1.45 - - [18/Jun/2004:03:25:51 +0000] "GET ramgen/encoder/svt.rm HTTP/1.0" 200 0 [WinNT_5.1_6.0.11.868_RealPlayer_R12ESD_es_UNK] [00000000-0000-0000-0000-000000000000] [UNKNOWN] 137 0 0 0 0 2
and Common Log Format (here the "combined" variant, with referrer and
user agent fields on the end) typically looks something like this:
192.168.1.252 - - [13/Jun/2004:04:11:43 +0100] "GET /robots.txt HTTP/1.0" 200 10206 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
192.168.1.252 - - [13/Jun/2004:04:11:44 +0100] "GET /article/articleview/363/1/16/ HTTP/1.0" 200 28058 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
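A quick way to see which records have the layout shown in the example just above is to test them against a pattern for it. The sketch below is only an approximation inferred from that example (host, ident, user, date, request, status, bytes, quoted referrer, quoted user agent); it is not webalizer's own parser, and the helper name looks_combined is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate shape of a "combined" record, inferred from the example above:
# host ident user [date] "request" status bytes "referrer" "user-agent"
# This is an assumption for eyeballing logs, not webalizer's real rule set.
my $combined = qr/^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) "[^"]*" "[^"]*"$/;

# Hypothetical helper: returns 1 if the line has the combined shape, else 0.
sub looks_combined {
    my ($line) = @_;
    return $line =~ $combined ? 1 : 0;
}

my @samples = (
    # the robots.txt record from the web server log
    '192.168.1.252 - - [13/Jun/2004:04:11:43 +0100] "GET /robots.txt HTTP/1.0" 200 10206 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    # the first helix record, unconverted
    '192.168.1.45 - - [18/Jun/2004:03:25:45 +0000] "GET ramgen/encoder/svt.rm HTTP/1.0" 200 0 [Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.1)] [] [UNKNOWN] 137 0 0 0 0 1',
);

for my $line (@samples) {
    printf "%-4s %s\n", (looks_combined($line) ? 'ok' : 'BAD'), substr($line, 0, 60);
}
```

The web server record passes and the raw helix record fails, because its bracketed fields sit where the quoted referrer and user agent are expected.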
but what i'm now getting after using the script (below) looks like this:
192.168.1.45 - - [18/Jun/2004:03:25:45 +0000] "GET ramgen/encoder/svt.rm HTTP/1.0" 200 0 [Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.1)] [] [UNKNOWN] 137 0 0 0 0 1
192.168.1.45 - - [18/Jun/2004:03:25:51 +0000] "GET ramgen/encoder/svt.rm HTTP/1.0" 200 0 [WinNT_5.1_6.0.11.868_RealPlayer_R12ESD_es_UNK] [00000000-0000-0000-0000-000000000000] [UNKNOWN] 137 0 0 0 0 2
which doesn't look any different from the input to my eye, although when
i run diff on the input and output files it does report a small number
of differences.
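To pin down exactly where the two files part company, something like the following could report the first record that differs. It is a minimal sketch: the helper name first_difference is hypothetical, and the two arrays stand in for the lines of the input and output files, which would normally be read from disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes two array refs of records and returns the
# 1-based number of the first record that differs, or 0 if they are identical.
# A record missing from the shorter list counts as a difference.
sub first_difference {
    my ($in, $out) = @_;
    my $max = @$in > @$out ? @$in : @$out;    # length of the longer list
    for my $i (0 .. $max - 1) {
        my $left  = defined $in->[$i]  ? $in->[$i]  : '<missing>';
        my $right = defined $out->[$i] ? $out->[$i] : '<missing>';
        return $i + 1 if $left ne $right;
    }
    return 0;
}

# Stand-in data; in practice these would be the lines of the two log files.
my @input  = ('record one', 'record two [agent]', 'record three');
my @output = ('record one', 'record two "agent"', 'record three');

my $n = first_difference(\@input, \@output);
print $n ? "first difference at record $n\n" : "files are identical\n";
```

With the stand-in data this reports record 2, the only one whose text changed.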
so, in time-honoured fashion, does anybody have any ideas?
cheers
bb
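> #!/usr/bin/perl -w
>
> use strict;
> use Fcntl;    # provides the O_RDONLY constant used by sysopen
>
> my $path = "/home/helix/Logs/rmaccess.log";
>
> sysopen(HELIXLOG, $path, O_RDONLY)
>     or die "Couldn't open $path for reading: $!\n";
>
> while (<HELIXLOG>) {
>     chomp;
>     # capture: client IP, timestamp, request (must contain ".rm"),
>     # status code, byte count, and the first bracketed field (client info)
>     if (m/^(\d+\.\d+\.\d+\.\d+) \-.*?\- \[(.*?)\] \"(.*?\.rm.*?)\" (\d+) (\d+) \[(.*?)\] .*$/) {
>
>         my $ip_number   = $1;
>         my $date        = $2;
>         my $url         = $3;
>         my $response    = $4;
>         my $numbytes    = $5;
>         my $client_info = $6;
>
>         # write the record to STDOUT with "-" as the referrer and the
>         # client info as the quoted user agent
>         print "$ip_number - - \[$date\] \"$url\" $response $numbytes \"-\" \"$client_info\"\n";
>     }
>     # records that do not match the pattern are dropped silently
> }
>
> close HELIXLOG;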
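
For comparison, here is a minimal sketch that applies the same pattern to the first sample record held in a string, so the converted form can be seen without touching the real log. The sub name convert_record is made up for illustration; the pattern and output layout are copied from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the script's pattern to one record and returns
# the converted line, or undef if the record does not match.
sub convert_record {
    my ($line) = @_;
    if ($line =~ m/^(\d+\.\d+\.\d+\.\d+) \-.*?\- \[(.*?)\] \"(.*?\.rm.*?)\" (\d+) (\d+) \[(.*?)\] .*$/) {
        return "$1 - - [$2] \"$3\" $4 $5 \"-\" \"$6\"";
    }
    return;
}

# First sample record from the helix log, as one unwrapped line.
my $sample = '192.168.1.45 - - [18/Jun/2004:03:25:45 +0000] "GET ramgen/encoder/svt.rm HTTP/1.0" 200 0 [Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.1)] [] [UNKNOWN] 137 0 0 0 0 1';

my $converted = convert_record($sample);
print defined $converted ? "$converted\n" : "no match\n";
# prints:
# 192.168.1.45 - - [18/Jun/2004:03:25:45 +0000] "GET ramgen/encoder/svt.rm HTTP/1.0" 200 0 "-" "Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.1)"
```

If the file handed to webalizer still shows the bracketed fields rather than the quoted ones, it is probably not the text this pattern produces, since the script prints to standard output and does not write a file itself.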