Find lines beginning with the same string and keep the last occurrence

2023-03-22 15:19

I have this data:

E 71484666NC 1201011060240260 387802-1227810  1022    25   0   5   2   313D 0 1G5
E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn

I need to find lines that share the same first 12 characters. If there are several, I need to delete the earlier occurrences and keep only the last one. The result should look like this:

E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn

Note: In most cases the characters after the first 12 do not match, so checking for fully duplicate lines is not an option.

Note: I need to preserve the order.

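from collections import OrderedDict

lines = OrderedDict()
for line in infile:  # infile: an already-open file object
    # Key on the first 12 characters; a later line replaces an earlier one.
    lines[line[:12]] = line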

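This keeps only the last line seen for each 12-character prefix. An OrderedDict orders its keys by first insertion, so each surviving line appears at the position where its prefix first occurred.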
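If the surviving line should instead sit where its last occurrence was (which is what deleting the earlier lines would leave behind), one possible variant removes the key before re-inserting it. This is a minimal sketch, not the answerer's code; the function name `keep_last_by_prefix`, the `width` parameter, and the shortened sample lines are made up for illustration.

```python
from collections import OrderedDict

def keep_last_by_prefix(lines, width=12):
    """Keep only the last line for each `width`-character prefix (hypothetical helper)."""
    kept = OrderedDict()
    for line in lines:
        key = line[:width]
        # Drop any earlier entry first, so the key is re-inserted at the end,
        # i.e. at the position of its most recent occurrence.
        kept.pop(key, None)
        kept[key] = line
    return list(kept.values())

# Shortened, made-up sample lines that reuse prefixes from the question.
sample = [
    "E 71484666NC first",
    "E 10115693AK only",
    "E 71484666NC second",
]
print(keep_last_by_prefix(sample))
```

For the data in the question both variants give the same result, because the repeated prefix occurs on adjacent lines.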

Edit: collections.OrderedDict is part of the standard library from Python 2.7 onward. This version of OrderedDict works on Python 2.4, 2.5, and 2.6.

from collections import OrderedDict

mydata = """E 71484666NC 1201011060240260 387802-1227810  1022    25   0   5   2   313D 0 1G5
E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn"""

datalines = mydata.split('\n')
# Key on the first 12 characters; a later line overwrites the value for a repeated key.
uniques = OrderedDict((x[:12], x[12:]) for x in datalines)
final = [x + y for x, y in uniques.items()]

for x in final:
    print(x)

This produces:

E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn
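A different way to get the same result without any ordered mapping is to scan the lines backwards and keep the first line seen for each prefix, then restore the original direction. This is only a sketch under the assumption that all lines fit in memory as a list; `keep_last_reverse_scan` and the sample lines are hypothetical names and data, not part of the answers here.

```python
def keep_last_reverse_scan(lines, width=12):
    """Keep the last line per prefix by scanning from the end (hypothetical helper)."""
    seen = set()
    result = []
    for line in reversed(lines):
        key = line[:width]
        if key not in seen:
            # First time this prefix shows up in the reversed scan,
            # so this is the last occurrence in the original order.
            seen.add(key)
            result.append(line)
    result.reverse()  # back to the original top-to-bottom order
    return result

# Shortened, made-up sample lines that reuse prefixes from the question.
rows = [
    "E 71484666NC old",
    "E 71484666NC new",
    "E 10310002PR only",
]
print(keep_last_reverse_scan(rows))
```

Each kept line stays at the position of its last occurrence, and the approach works on any Python version that has `set` and `reversed`.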

Use a dictionary, taking the first 12 characters as a key:

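mydict = {}
for line in infile:  # infile: an already-open file object
    key = line[:12]
    mydict[key] = line

This automatically overwrites any earlier entry with the same key. Note that a plain dict only guarantees insertion order from Python 3.7 onward; on older versions, use an OrderedDict if the order matters.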
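As a rough end-to-end illustration of the dictionary approach, the sketch below reads from a file-like object and returns the surviving lines. It assumes Python 3.7 or later, where a plain dict preserves insertion order; `last_line_per_prefix` is a hypothetical name, and `io.StringIO` stands in for a real file with made-up, shortened lines.

```python
import io

def last_line_per_prefix(handle, width=12):
    """Return the last line for each prefix, read from a file-like object."""
    latest = {}
    for line in handle:
        # Assigning to an existing key replaces its value but keeps the
        # key's original position (Python 3.7+ dict ordering).
        latest[line[:width]] = line.rstrip("\n")
    return list(latest.values())

# io.StringIO stands in for an opened data file (assumption for the demo).
fake_file = io.StringIO(
    "E 71484666NC old\n"
    "E 71484666NC new\n"
    "E 10115693AK only\n"
)
result = last_line_per_prefix(fake_file)
print(result)
```

With a real file, the same function could be called inside a `with open(...) as handle:` block.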
