Find lines beginning with the same string and keep the last occurrence
I have this data:
E 71484666NC 1201011060240260 387802-1227810 1022 25 0 5 2 313D 0 1G5
E 71484666NC 1201011060240263 387902-1227910 1300 10 0 2 1 300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007 021 10 0 896 71 4 131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726 5 5 935 50 46 21282D 5 0hn
I need to find lines that start with the same first 12 characters. If there are several, I need to delete the earlier occurrences and keep only the last one, so the result should look like this:
E 71484666NC 1201011060240263 387902-1227910 1300 10 0 2 1 300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007 021 10 0 896 71 4 131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726 5 5 935 50 46 21282D 5 0hn
Note: In most cases the characters after the first 12 do not match, so simply removing duplicate lines is not an option.
Note: The original order must be preserved.
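from collections import OrderedDict

lines = OrderedDict()
for line in file:  # `file` is an already-open file object
    # A later line with the same 12-character prefix overwrites the earlier one.
    lines[line[:12]] = line
result = list(lines.values())  # surviving lines, in order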
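This keeps the last line for each prefix while preserving the order of the lines. Note that reassigning an existing key does not move it, so each kept line appears at the position where its prefix first occurred.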
Edit: collections.OrderedDict is only available from Python 2.7 and 3.1; on Python 2.4, 2.5, and 2.6 you need a backported OrderedDict implementation.
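Because reassigning an OrderedDict key keeps its original position, a kept line shows up where its prefix first appeared. If you would rather have each line appear where its last occurrence was, a minimal sketch is below. It assumes Python 3.2+ for OrderedDict.move_to_end; the function name keep_last and the short sample lines are made up for illustration, with the duplicates deliberately not adjacent.

```python
from collections import OrderedDict


def keep_last(lines, width=12):
    """Keep only the last line for each width-character prefix.

    Each surviving line is placed where its *last* occurrence was,
    not where its first occurrence was.
    """
    latest = OrderedDict()
    for line in lines:
        key = line[:width]
        latest[key] = line
        # Reassigning an existing key keeps its old position, so move it
        # to the end explicitly (move_to_end needs Python 3.2+).
        latest.move_to_end(key)
    return list(latest.values())


data = [
    "E 71484666NC first",
    "E 10115693AK only",
    "E 71484666NC second",
]
for line in keep_last(data):
    print(line)
```

With this input the AK line is printed first and the "second" NC line last, since that is where the final NC occurrence sits. For the data in the question the duplicates are adjacent, so both variants give the same result.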
from collections import OrderedDict
mydata = """E 71484666NC 1201011060240260 387802-1227810 1022 25 0 5 2 313D 0 1G5
E 71484666NC 1201011060240263 387902-1227910 1300 10 0 2 1 300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007 021 10 0 896 71 4 131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726 5 5 935 50 46 21282D 5 0hn"""
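datalines = mydata.split('\n')
# Key on the first 12 characters; later lines overwrite earlier values.
uniques = OrderedDict((x[:12], x[12:]) for x in datalines)
# Glue each prefix back onto the remainder of its last line.
final = [x + y for x, y in uniques.items()]
for x in final:
    print(x)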
This produces:
E 71484666NC 1201011060240263 387902-1227910 1300 10 0 2 1 300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007 021 10 0 896 71 4 131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726 5 5 935 50 46 21282D 5 0hn
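Another way to keep the last line per prefix, without OrderedDict, is to scan the lines from the end and keep the first line seen for each prefix. This is a sketch using only built-ins (set, reversed); the function name keep_last_reverse_scan is hypothetical. Since it walks backwards, each kept line ends up at the position of its last occurrence, and because reversed() needs a sequence, read a file with f.readlines() (or list(f)) first.

```python
def keep_last_reverse_scan(lines, width=12):
    """Return one line per width-character prefix, keeping the last one."""
    seen = set()
    kept = []
    # Walking backwards, the first line seen for a prefix is its last occurrence.
    for line in reversed(lines):
        key = line[:width]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    kept.reverse()  # restore top-to-bottom order
    return kept


mydata = """E 71484666NC 1201011060240260 387802-1227810 1022 25 0 5 2 313D 0 1G5
E 71484666NC 1201011060240263 387902-1227910 1300 10 0 2 1 300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007 021 10 0 896 71 4 131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726 5 5 935 50 46 21282D 5 0hn"""

for line in keep_last_reverse_scan(mydata.split('\n')):
    print(line)
```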
Use a dictionary, taking the first 12 characters as a key:
mydict = {}
for line in file:  # `file` is an already-open file object
    key = line[:12]
    mydict[key] = line  # replaces any earlier line with the same key
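This automatically overwrites all previous entries with the same key. Note that a plain dict is only guaranteed to preserve insertion order from Python 3.7 onward; on older versions use OrderedDict.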
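Applied to a whole file on Python 3.7+ (where plain dicts keep insertion order), the dictionary approach might look like the minimal sketch below. The names dedupe_file, src_path and dst_path are hypothetical. Trailing newlines are stripped and re-added so that a final line without a newline is handled like the others; as with the answers above, a repeated prefix keeps the position of its first occurrence.

```python
def dedupe_file(src_path, dst_path, width=12):
    """Copy src_path to dst_path, keeping only the last line per prefix.

    Relies on plain dicts preserving insertion order (guaranteed since
    Python 3.7).
    """
    last_line = {}
    with open(src_path) as src:
        for line in src:
            # Overwrite any earlier line that has the same prefix.
            last_line[line[:width]] = line.rstrip("\n")
    with open(dst_path, "w") as dst:
        for line in last_line.values():
            dst.write(line + "\n")
```

For example, dedupe_file('input.txt', 'output.txt') would write the deduplicated lines of the (hypothetical) input.txt to output.txt.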