
Remove lines with duplicate cells

I need to remove lines with a duplicate value. For example, I need to remove lines 2 and 4 in the block below because they contain "Value04". I cannot remove all lines containing "Value03" because some lines with that value are NOT duplicates and must be kept. I can use any tool: Excel, Vim, or any Linux command-line utility.

In the end there should be no duplicate "UserX" values: User1 should appear only once. If User1 appears twice, I need to remove the entire line containing "Value04" and keep the one with "Value03".

Value01,Value03,User1
Value02,Value04,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4

Your ideas and thoughts are greatly appreciated.

Edit: Reworded for clarity and restored words lost during editing.


The following Awk command removes all but the first occurrence of a value in the third column:

awk -F',' '{
  # Print a line only the first time its third field (the user) is seen
  if (!seen[$3]) {
    seen[$3] = 1
    print
  }
}' textfile.txt

Output:

Value01,Value03,User1
Value01,Value03,User2
Value01,Value03,User3
Value01,Value03,User4
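
For anyone who would rather script this outside the shell, here is a minimal Python sketch of the same "keep the first occurrence per third column" idea. The function name `keep_first_by_column` and the in-memory `sample` list are made up for illustration, and the sketch assumes every line has at least three comma-separated fields.

```python
def keep_first_by_column(lines, column=2, sep=","):
    """Keep only the first line seen for each value in the given column."""
    seen = set()   # column values already encountered
    kept = []      # lines to keep, in their original order
    for line in lines:
        key = line.split(sep)[column]  # column 2 is the third field (0-based)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Sample data copied from the question
sample = [
    "Value01,Value03,User1",
    "Value02,Value04,User1",
    "Value01,Value03,User2",
    "Value02,Value04,User2",
    "Value01,Value03,User3",
    "Value01,Value03,User4",
]

for line in keep_first_by_column(sample):
    print(line)
```

Like the Awk command, this keeps whichever line appears first for each user, so it only gives the wanted result when the "Value03" line comes before the "Value04" line.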


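The same thing in Perl:

perl -F, -nae 'print unless $c{$F[2]}++;' textfile.txt

This uses autosplit mode: "-F, -a" splits each line on commas and places the fields into the @F array. The hash %c counts how many times each third field has been seen, so a line is printed only the first time its user appears.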
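
Both one-liners keep whichever line comes first for each user, which works for the sample because the "Value03" line precedes the "Value04" line. If that ordering cannot be relied on, something like the following Python sketch might be closer to what the question asks for. It is only an illustration: the function name `keep_preferred_by_user` and the `preferred` parameter are assumptions, and it assumes three comma-separated fields per line with the user in the third one.

```python
def keep_preferred_by_user(lines, preferred="Value03", sep=","):
    """Keep one line per user, favouring a line whose second field is `preferred`."""
    chosen = {}  # user -> line kept so far (dicts preserve insertion order)
    for line in lines:
        fields = line.split(sep)
        second, user = fields[1], fields[2]
        if user not in chosen:
            # First line seen for this user: keep it for now
            chosen[user] = line
        elif second == preferred and chosen[user].split(sep)[1] != preferred:
            # A preferred line replaces a non-preferred one kept earlier
            chosen[user] = line
    return list(chosen.values())


# Same users as in the question, but with the "Value04" line first for User1
shuffled = [
    "Value02,Value04,User1",
    "Value01,Value03,User1",
    "Value01,Value03,User3",
]

for line in keep_preferred_by_user(shuffled):
    print(line)
```

A user that only ever appears with "Value04" is still kept once, since the question only asks to drop that line when a "Value03" line for the same user exists.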
