[SlugBug] text munging question

Jonathan jonathan at sirtis.org.uk
Wed Jun 16 15:42:12 BST 2004


pault wrote:

> hi all
> 
> i have a number of irc log files which i'd like to tidy up.
> 
> one of the problems is that these cover a period when my connection was flaky
> [take a bow blueyonder]. so i ended up with multiple connections to the same
> channel. as i had logging enabled i have lots of examples of the same line
> being repeated two or three times. i would like to remove all the duplicate
> lines.
> 
> i am open to suggestions as how this may best be accomplished [perl? regex?
> dunno?]

Sorry, didn't finish that last email before I accidentally sent it!

As I was saying, running the file through uniq (cat file | uniq, or 
just uniq file) would remove any adjacent duplicate lines.
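To show what "adjacent" means in practice, here is a rough sketch. The file sample.log and its contents are made up for the example, and the awk line is an alternative I'm suggesting, not something tested against real IRC logs.

```shell
#!/bin/sh
# sample.log is a made-up file, not one of the real IRC logs.
printf '%s\n' '<pault> hello' '<pault> hello' '<jon> hi' '<pault> hello' > sample.log

# uniq only collapses duplicates that sit next to each other,
# so the final "<pault> hello" is kept.
uniq sample.log > adjacent.out

# awk prints a line only the first time it is seen, which drops
# every later repeat while keeping the original order.
awk '!seen[$0]++' sample.log > all.out
```

Bear in mind the awk version would also throw away genuine repeats (someone saying the same thing twice) unless each line carries a timestamp.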

If you are dealing with logs that have timestamps in them, you might 
want to take a look at some of the Apache utilities which merge log 
files together. That would allow you to keep things in time order, and 
once sorted that way you could run the result through uniq.

There's one for Apache logs available at: http://mergelog.sourceforge.net/

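Or a shell script that sorts a single Apache log by its timestamp is:

#!/bin/sh
# Assumes the timestamp is field 4, e.g. [16/Jun/2004:15:42:12
if [ ! -f "$1" ]; then
  echo "Usage: $0 <logfile>"
  exit 1
fi
echo "Sorting $1"
# Keys in order: year, month name, day, hour, minute, second
sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n \
  -k 4.17,4.18n -k 4.20,4.21n "$1" > "$1.sorted"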


Maybe you could figure out the format of your IRC logs and adapt that?
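
As a starting point, here is a minimal sketch of that adaptation. It assumes a log layout of "YYYY-MM-DD HH:MM:SS <nick> message", which is purely a guess on my part; the file names and sample lines are invented too, so the sort keys will need changing to match whatever your IRC client really writes.

```shell
#!/bin/sh
# Hypothetical log layout: "YYYY-MM-DD HH:MM:SS <nick> message".
# Both files and their contents are invented for this example.
cat > irc-a.log <<'EOF'
2004-06-16 15:40:01 <pault> anyone about?
2004-06-16 15:42:12 <jon> yes
EOF
cat > irc-b.log <<'EOF'
2004-06-16 15:40:01 <pault> anyone about?
2004-06-16 15:41:30 <pault> connection dropped again
EOF

# Sort on date, then time. Lines with equal keys fall back to a
# whole-line comparison, so identical lines end up adjacent and
# uniq can drop the repeats.
sort -k1,1 -k2,2 irc-a.log irc-b.log | uniq > irc-merged.log
```

I've used uniq here instead of sort -u because, with -k keys given, -u compares only the keys and would discard different lines that share a timestamp. Different lines logged in the same second may also come out in a different order than they were said.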

Regards,

Jonathan


