Shell script: removing duplicate records
I am having trouble removing duplicate entries (I am not good with shell scripting!). Here is the situation: an application creates a flat text file. Each line is one record, and each field is separated by the delimiter "~|" (quotes excluded). So a record looks like:
Field1~|Field2~|Field3~|Field4~|Field5~|Field6~|Field7~|
Some of the records are duplicates. Whether a record is a duplicate is decided by the value of one field: Field2. How do I write a shell script (or awk/sed) to remove duplicate records based on this criterion? The script then has to write its output to another file. I could have done this in the application itself, but that is ruled out by a performance problem. Thanks for the help.
Input file:
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|XYZ~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|RST~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Output should be:
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|XYZ~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|RST~|Field3~|Field4~|Field5~|Field6~|Field7~|
(The order of the records doesn't matter.)
Not sure if I understood the question correctly, but is this what you're looking for?
test.txt:
Field1~|Field2~|Field3~|Field4~|Field5~|Field6~|Field7~|
foo~|Field2~|bar~|Field4~|Field5~|Field6~|Field7~|
Field1~|foobar~|Field3~|Field4~|Field5~|Field6~|Field7~|
Calling sort:
sort --field-separator='~' --key=2,2 --unique test.txt
Results in:
Field1~|Field2~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|foobar~|Field3~|Field4~|Field5~|Field6~|Field7~|
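Two caveats: sort reorders the file (acceptable here, since record order doesn't matter), and sort only accepts a single-character field separator, so splitting on '~' alone makes the second key '|Field2', which still keys uniquely on the Field2 value. A minimal sketch with the equivalent short options, writing the result to another file as the question asks (input.txt and output.txt are placeholder names):

sort -t'~' -k2,2 -u input.txt > output.txt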
If you want to remove every record whose Field2 value is duplicated (keeping only keys that occur exactly once). Note that | is a metacharacter in awk's regex field separator, so it has to be bracketed or escaped:
nawk -F'~[|]' '{a[$2]++; b[$2]=$0} END {for (i in a) if (a[i] == 1) print b[i]}' file
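The same logic written out long-form with comments (a sketch; any awk that supports a regex field separator, such as nawk or gawk, should work):

awk -F'~[|]' '
    {
        count[$2]++      # times this Field2 value has been seen
        line[$2] = $0    # remember the line for this key
    }
    END {
        # emit only the keys that occurred exactly once
        for (key in count)
            if (count[key] == 1)
                print line[key]
    }
' file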
If you want to keep one copy of each duplicated record (the first occurrence wins):
nawk -F'~[|]' '!a[$2]++' file
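Applied to the sample input and redirected to another file, as the question requires (filenames are placeholders):

nawk -F'~[|]' '!a[$2]++' input.txt > output.txt

This works because a[$2]++ evaluates to 0 (false) the first time a Field2 value is seen and non-zero afterwards, so !a[$2]++ is true only on the first occurrence, and awk's default action prints the line. On the sample input this yields exactly the four expected records.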