AWK/BASH: How to remove duplicate rows from file with known field range?
I was wondering if there is a way to use bash/awk to remove duplicate rows based on a known field range. For example:
Easy Going USA:22 May 1926
Easy Going Gordon USA:6 August 1925
Easy Life USA:20 May 1944
Easy Listening USA:14 January 2002
Easy Listening USA:10 October 2002
Easy Listening USA:27 January 2004
Easy Living USA:7 July 1937
Easy Living USA:16 July 1937
Easy Living USA:4 September 2009
I would like to remove duplicate movie titles. The movie title will always be from $1 through $(NF-3). Ideally I would like to keep the first occurrence (earliest date), but if that's not possible then it doesn't matter which one is kept.
Thanks,
Tomek
#!/bin/bash
awk 'BEGIN{
    # Map month names to two-digit numbers for mktime().
    m=split("January|February|March|April|May|June|July|August|September|October|November|December",d,"|")
    for(o=1;o<=m;o++){
        months[d[o]]=sprintf("%02d",o)
    }
}
{
    # Strip the "USA:" prefix so $(NF-2) holds only the day.
    sub(/.*:/,"",$(NF-2))
    t=mktime($(NF)" "months[$(NF-1)]" "$(NF-2)" 0 0 0")
    time[t]=$(NF-2) FS $(NF-1) FS $(NF)
    # Blank the date fields; what remains in $0 is the title.
    $(NF-2)=$(NF-1)=$(NF)=""
    gsub(/ +$/,"")
    # Remember the smallest timestamp seen for each title.
    if (!($0 in array)){array[$0]=99999999999999}
    if ( t <= array[$0] ){ array[$0]=t }
}
END{
    for(i in array){ print "->",i,time[array[i]] }
} ' file
Output:
$ ./shell.sh
-> Easy Living 7 July 1937
-> Easy Going Gordon 6 August 1925
-> Easy Listening 14 January 2002
-> Easy Going 22 May 1926
-> Easy Life 20 May 1944
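Note that mktime() is not part of POSIX awk (gawk provides it), and the output above drops the country. As a rough sketch of the same "earliest date wins" idea without mktime(), each date can be turned into a YYYYMMDD number and compared numerically. The function name earliest_per_title and the array names are made up for illustration, and the sketch assumes every line ends in "COUNTRY:DAY MONTH YEAR" with English month names, as in the sample.

```shell
# earliest_per_title: hypothetical helper name, not from the original answers.
# Reads lines on stdin and prints, for each title, the line with the earliest
# date, keeping titles in the order they first appear.
earliest_per_title() {
  awk '
  BEGIN {
    n = split("January February March April May June July August September October November December", name, " ")
    for (i = 1; i <= n; i++) month[name[i]] = i
  }
  {
    # The title is fields 1..NF-3; build it without modifying $0.
    key = $1
    for (i = 2; i <= NF - 3; i++) key = key " " $i

    # The day is whatever follows the last ":" in field NF-2 ("USA:22" -> 22).
    day = $(NF-2)
    sub(/.*:/, "", day)

    # YYYYMMDD as a number, so a plain numeric comparison orders the dates.
    stamp = $NF * 10000 + month[$(NF-1)] * 100 + day

    if (!(key in best)) order[++count] = key
    if (!(key in best) || stamp < best[key]) {
      best[key] = stamp
      line[key] = $0
    }
  }
  END {
    for (i = 1; i <= count; i++) print line[order[i]]
  }'
}
```

It would be called as earliest_per_title < file. Because it prints the original line, the country and date format are kept as they were in the input.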
awk '
    {
        line = $0
        $(NF-2) = $(NF-1) = $NF = ""
        if ( ! ($0 in movies))
            movies[$0] = line
    }
    END {
        for (m in movies) print movies[m]
    }
' movies.txt
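This keeps the first line seen for each title, but the for (m in movies) loop
visits keys in an unspecified order, so the original line ordering is not
preserved. You might want to sort the output.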
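If the input order matters, one possible variation is to print each line the moment its title is first seen instead of collecting everything for the END block. This is only a sketch: the function name dedupe_titles and the seen array are illustrative names, and it relies on the earliest date coming first for each title in the input, as it does in the sample data.

```shell
# dedupe_titles: hypothetical helper name, not from the original answers.
# Reads lines on stdin and prints the first line seen for each title,
# in input order.
dedupe_titles() {
  awk '
  {
    # The title is fields 1..NF-3; build it without modifying $0.
    key = $1
    for (i = 2; i <= NF - 3; i++) key = key " " $i

    # seen[key]++ evaluates to 0 (false) only the first time a title appears.
    if (!seen[key]++) print
  }'
}
```

Usage would be dedupe_titles < movies.txt. No sorting step is needed afterwards, since lines come out in the order they went in.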
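This could be a quick answer, as long as titles never contain a colon. It deduplicates on everything before the first colon (the title plus the country):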
sort -t':' -k1,1 -u your-file