Shell script-duplicate records

2023-03-01 17:25 Q&A author:

I am having trouble removing duplicate entries (I am not good at shell!). Here is the situation: an application creates a flat text file. Each line is one record, and each field is separated by the delimiter "~|" (quotes excluded). So a record looks like:

Field1~|Field2~|Field3~|Field4~|Field5~|Field6~|Field7~|

Some records are duplicates. A record counts as a duplicate when its Field2 value matches that of another record. How can I write a shell/awk/sed script to remove duplicate records based on this criterion? The script then has to write its output to another file. I could have done this in the application itself, but that is not possible because of performance problems. Thanks for the help.

Input file

Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|XYZ~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|RST~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|

Output should be:

Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|XYZ~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|RST~|Field3~|Field4~|Field5~|Field6~|Field7~|

(order of the records doesn't matter.)

Not sure if I understood the question correctly, but is this what you're looking for?

test.txt:

Field1~|Field2~|Field3~|Field4~|Field5~|Field6~|Field7~|
foo~|Field2~|bar~|Field4~|Field5~|Field6~|Field7~|
Field1~|foobar~|Field3~|Field4~|Field5~|Field6~|Field7~|

Calling sort:

sort --field-separator="~" --key 2,2 --unique test.txt

Results in:

Field1~|Field2~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|foobar~|Field3~|Field4~|Field5~|Field6~|Field7~|
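Applied to the sample input from the question, the same approach might look like the sketch below. The temporary file names are placeholders, the short options `-t`, `-k`, `-u` and `-o` are the POSIX spellings of the long options above, and `LC_ALL=C` is an assumption added only to make the ordering predictable. Because `sort` accepts only a single-character separator, the key it compares is `|ABA`, `|PQR` and so on, which is still enough to tell the records apart.

```shell
#!/bin/sh
# Placeholder files: "input" stands in for the application's flat file,
# "output" for the de-duplicated result.
input=$(mktemp)
output=$(mktemp)
cat > "$input" <<'EOF'
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|XYZ~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|RST~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
EOF

# -t '~'  : split on "~", so field 2 is "|ABA", "|PQR", ...
# -k 2,2  : compare on field 2 only
# -u      : keep one line per distinct key
# -o FILE : write the result to FILE
LC_ALL=C sort -t '~' -k 2,2 -u -o "$output" "$input"

cat "$output"
```

The records come out sorted by the key rather than in their original order, which the question says is acceptable.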

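If you want to remove every record whose Field2 value occurs more than once (keeping only the records that were never duplicated), use the command below. The separator is written as ~[|] because awk treats a multi-character -F value as a regular expression, in which a bare | means alternation.

nawk -F'~[|]' '{a[$2]++;b[$2]=$0}END{for(i in a) if (a[i]==1){print b[i]} }' file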
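To make the effect concrete, here is a sketch that runs this "drop every duplicated key" logic on the sample input from the question. It assumes plain `awk` in place of `nawk`, uses a placeholder temporary file, and pipes the result through `sort` only because the iteration order of `for (k in count)` is unspecified. The array names `count` and `last` are illustrative renamings of `a` and `b`.

```shell
#!/bin/sh
input=$(mktemp)   # placeholder file holding the sample records
cat > "$input" <<'EOF'
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|XYZ~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|RST~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
EOF

# count[key] = number of records sharing this Field2 value
# last[key]  = most recent record seen with that value
# Only keys seen exactly once are printed.
result=$(awk -F'~[|]' '
  { count[$2]++; last[$2] = $0 }
  END { for (k in count) if (count[k] == 1) print last[k] }
' "$input" | LC_ALL=C sort)

printf '%s\n' "$result"
rm -f "$input"
```

Only the RST and XYZ records survive here, since ABA and PQR each appear twice. That is not the output the question asks for, so this variant fits only when duplicated records should disappear entirely.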

If you want to keep only one copy of each duplicated record (the first one encountered):

nawk -F'~[|]' '!a[$2]++' file
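Since the question also asks for the result to be written to another file, a minimal end-to-end sketch of this "keep the first occurrence" variant could look as follows. It assumes plain `awk` in place of `nawk`; the temporary files and the array name `seen` are placeholders chosen for illustration.

```shell
#!/bin/sh
input=$(mktemp)    # stands in for the application's flat file
output=$(mktemp)   # stands in for the de-duplicated output file
cat > "$input" <<'EOF'
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|XYZ~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|ABA~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|RST~|Field3~|Field4~|Field5~|Field6~|Field7~|
Field1~|PQR~|Field3~|Field4~|Field5~|Field6~|Field7~|
EOF

# seen[$2]++ evaluates to 0 (false) the first time a Field2 value appears,
# so !seen[$2]++ is true only for the first record with that value,
# and awk's default action prints that record.
awk -F'~[|]' '!seen[$2]++' "$input" > "$output"

cat "$output"
```

This reads the file once and keeps the records in their original order (ABA, PQR, XYZ, RST), at the cost of holding one array entry per distinct Field2 value in memory.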
