How to only print lines with unique fields?

2023-04-10 02:00 问答作者：

For example... if I had a file like this:

A   16  chr11   36595888
A   0   chr1    155517200
B   16  chr1    43227072
C   0   chr20   55648508
D   0   chr2    52375454
D   16  chr2    73574214
D   0   chr3    93549403
E   16  chr3    3315671

I need to print only the lines which have a unique first column:

B   16开发者_高级运维  chr1    43227072
C   0   chr20   55648508
E   16  chr3    3315671

It's similar to awk '!_[$1]++', but I want to remove all lines which have non-unique fist field.

Bash and python solutions preferably.

in bash, assuming first column has fixed with (3):

sort input-file.txt | uniq -u -w 3

'-u' option prints only the unique lines and '-w 3' compares no more than the first 3 characters.

How about this:

#!/usr/bin/env python
from collections import defaultdict
data = defaultdict(list)
with open('file', 'rb') as f:
    for line in sorted(f.readlines()):
        data[line[0]].append(line)
for key in sorted(data.iterkeys()):
    if len(data[key]) == 1:
        print data[key]

awk '
  {count[$1]++; line[$1]=$0}
  END {for (val in count) if (count[val]==1) print line[val]}
' filename

That may alter the order of lines. If that's a problem, try this 2-pass approach:

awk '
  NR==FNR {count[$1]++; next}
  count[$1] == 1 {print}
' filename filename

sed one liner solution:

sed ':a;$bb;N;/^\(.\).*\n\1[^\n]*$/ba;:b;s/^\(.\).*\n\1[^\n]*\n*//;ta;/./P;D' file

in python, much easier to read and tweak:

d = dict()
for line in open('input-file.txt', 'r'):
  key = line.split(' ', 1)[0]
  d.setdefault(key, list()).append(line.rstrip())

for k, v in sorted(d.items()):
  if len(v) == 1:
     print v[0]

import sys
from collections import OrderedDict
lines = OrderedDict()
for line in sys.stdin:
    field0 = line.strip().split('\t')[0]
    lines[field0] = None if field0 in lines else line
for line in lines.values():
    if line is not None:
        sys.stdout.write(line)

If you don't care aout preserving order you could use a plain old dict ({}) instead of OrderedDict.

This implementation doesn't care if the duplicate fields are adjacent.

继续阅读：bash python unique

How to only print lines with unique fields?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？