[SlugBug] Re: data formatting query

Thu Dec 8 20:41:27 GMT 2005

And Lo! The Great Prophet Bill Best uttered these words of wisdom:
>

Oh goody - a long time since I've seen a scripting question :-)

>
[[ converting this...]]
>
> Date: 081205 Code: 3  Origin: R0E306
>
> Date: 081205 Code: 5  Origin: R0E306
>
> Date: 061205 Code: 3  Origin: R0E306

[[ to this...]]

> Date: 051205
> Date: 061205
> Date: 081205

Several ways to do this, but this seems to be the quickest with my
knowledge:

	awk '{ print $1, $2 }' < inputfile | grep -v '^ *$' | sort | uniq

:-)

This may even work (untested):
	awk '{ print $1, $2 }' < inputfile | sort -u | tail -n +2

Blank lines will get uniq'd to a single line and placed right at the top
of the output. And 'tail -n +2' basically says tail the input starting from
line 2 :-) (If the file has no blank lines at all, leave the tail off, or it
will eat the first date.)

You could use 'cut' rather than awk, or 'sort -u' instead of 'sort | uniq',
or write the entire thing as an awk script. Note that this script assumes
the dates in the file are already sorted (or at least that duplicate dates
sit next to each other), so you just want to remove duplicate dates:

	## This awk script is completely untested!
	BEGIN {
		last=""
		OFS=" "
	}

	/^Date:/ {
		if ($2 != last) {
			print $1, $2
			last = $2
		}
	}
	## End of script.
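
If the dates are not already grouped together, a variation on the same idea
is to remember every date seen so far rather than just the last one. This is
only a sketch: the function name unique_dates is made up for illustration,
and it assumes the "Date: ddmmyy ..." layout from the sample above. Dates
come out in order of first appearance, not sorted.

```shell
# Hypothetical helper: reads the log on stdin, prints each date once.
unique_dates() {
    # seen[$2]++ evaluates to 0 the first time a date shows up, so
    # !seen[$2]++ is true only for the first occurrence of each date.
    # Blank lines never match /^Date:/, so they are skipped for free.
    awk '/^Date:/ && !seen[$2]++ { print $1, $2 }'
}

# Example run against a few lines shaped like the sample data.
printf 'Date: 081205 Code: 3  Origin: R0E306\n\nDate: 081205 Code: 5  Origin: R0E306\n\nDate: 061205 Code: 3  Origin: R0E306\n' | unique_dates
```

The trade-off is memory: awk keeps one array entry per distinct date, which
is no problem for a log of this sort.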

By the way, I'm assuming the dates in the file are all in the same month?
If not, then a simple sort won't work - the format ddmmyy isn't ASCII
sortable: 010705 (1 July '05) will be placed above 310505 (31 May '05). The
date format 'yymmdd' or even better 'yyyymmdd' allows simple ASCII sorts of
dates without having to resort to convoluted or custom sort rules/routines.
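
If you are stuck with ddmmyy and the file does span months, one possible
workaround is to have awk stick a temporary yymmdd key on the front of each
line, sort on that, and then cut the key off again. A rough sketch follows.
The function name chrono_unique_dates is invented for the example, and it
assumes every date is exactly six digits and all in the same century.

```shell
# Hypothetical helper: reads the log on stdin, prints the unique dates
# in chronological order.
chrono_unique_dates() {
    # 1. awk: build a yymmdd key from the ddmmyy date and print
    #    "key Date: ddmmyy" for every Date: line.
    # 2. sort -u: order by the key and drop duplicate lines.
    # 3. cut: throw the key away, keeping fields 2 onwards.
    awk '/^Date:/ {
            d = $2
            print substr(d, 5, 2) substr(d, 3, 2) substr(d, 1, 2), $1, $2
         }' |
        sort -u |
        cut -d' ' -f2-
}

# 1 July '05 and 31 May '05, deliberately given in the wrong order.
printf 'Date: 010705 Code: 3\nDate: 310505 Code: 5\n\nDate: 010705 Code: 1\n' | chrono_unique_dates
```

This is the usual decorate/sort/undecorate trick. It is still more fiddly
than storing the dates as yymmdd in the first place.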

Cheers,

Chris...

-- 
\ Chris Johnson                 \ NP: KAJIURA Yuki - HIMITSU
 \ cej at nightwolf.org.uk          \  
  \ http://cej.nightwolf.org.uk/  \ 
   \ http://redclaw.org.uk/        ~---------------------------------------