[SlugBug] text munging question
Jonathan
jonathan at sirtis.org.uk
Wed Jun 16 15:42:12 BST 2004
pault wrote:
> hi all
>
> i have a number of irc log files which i'd like to tidy up.
>
> one of the problems is that these cover a period when my connection was flaky
> [take a bow blueyonder]. so i ended up with multiple connections to the same
> channel. as i had logging enabled i have lots of examples of the same line
> being repeated two or three times. i would like to remove all the duplicate
> lines.
>
> i am open to suggestions as how this may best be accomplished [perl? regex?
> dunno?]
Sorry, didn't finish that last email before I accidentally sent it!
As I was saying, cat logfile | uniq would remove any adjacent duplicate
lines; duplicates that are not next to each other are left alone.
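A minimal sketch of that behaviour, using a made-up sample log (the nicks and messages below are invented for illustration, not taken from real logs). uniq can read the file itself, so the cat is optional:

```shell
#!/bin/sh
# Hypothetical sample log: some lines repeat back to back, and the
# first line also reappears later on, separated by other lines.
log=$(mktemp)
printf '%s\n' \
  '<pault> hi all' \
  '<pault> hi all' \
  '<jon> hello' \
  '<jon> hello' \
  '<jon> hello' \
  '<pault> hi all' > "$log"

# uniq collapses each run of identical adjacent lines into one line.
deduped=$(uniq "$log")
printf '%s\n' "$deduped"

rm -f "$log"
```

The final '<pault> hi all' survives because it is not adjacent to the earlier copies, which is why sorting first matters when the repeats are interleaved.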
If you are dealing with logs that have timestamps in them, you might
want to take a look at some of the Apache utilities which join log files
together. That would allow you to keep things in time order, and once
sorted that way, you could then run the result through uniq.
There's one for Apache available at: http://mergelog.sourceforge.net/
Or a shell script that does it is:
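#!/bin/sh
# Sort an Apache log by the timestamp in field 4, e.g. [16/Jun/2004:15:42:12
if [ ! -f "$1" ]; then
    echo "Usage: $0 <logfile>"
    exit 1
fi
echo "Sorting $1"
# Keys in order: year, month name, day, hour, minute, second
sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n \
    -k 4.17,4.18n -k 4.20,4.21n "$1" > "$1.sorted"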
Maybe you could figure out the format of your IRC logs and adapt that?
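As a rough sketch of such an adaptation, suppose each IRC log line starts with a stamp like [2004-06-16 15:42:12]. That format is purely an assumption for the example; real clients log in many different layouts, and the sample lines are invented. With a stamp in that shape, a plain text sort on the first two fields gives time order, and uniq can then drop the repeats:

```shell
#!/bin/sh
# Assumed (hypothetical) line format: [YYYY-MM-DD HH:MM:SS] <nick> message
log=$(mktemp)
printf '%s\n' \
  '[2004-06-16 15:42:12] <pault> hi all' \
  '[2004-06-16 15:41:03] <jon> morning' \
  '[2004-06-16 15:42:12] <pault> hi all' \
  '[2004-06-16 15:41:03] <jon> morning' \
  '[2004-06-16 15:43:30] <jon> hello' > "$log"

# Fields 1-2 hold the date and time; stamps in this shape sort correctly
# as plain text. Lines with equal keys fall back to a whole-line
# comparison, so identical lines end up adjacent and uniq removes them.
tidy=$(LC_ALL=C sort -t ' ' -k 1,2 "$log" | uniq)
printf '%s\n' "$tidy"

rm -f "$log"
```

One trade-off to be aware of: different lines that share the same second come out in alphabetical order rather than in the order they were said. If the stamps carry only a time and no date, this approach would also mix up lines from different days.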
Regards,
Jonathan