In Python, how do I parse a file into lists based on a specific value?

2023-03-20 06:32

I have a large tab-delimited text file; for example, call it john_file:

1 john1 23 54 54
2 john2 34 45 66
3 john3 35 43 54
4 john2 34 54 78
5 john1 12 34 65
6 john3 34 55 66

What's a quick way to parse this file into three lists based on the name (john1, john2 or john3)?

john1_list = []
with open('john_file.txt', 'r') as fh:
    for line in fh:
        if line.split('\t')[1] == "john1":
            john1_list.append(line)

Thanks in advance

from collections import defaultdict

d = defaultdict(list)

with open('john_file.txt') as f:
    for line in f:
        fields = line.split('\t')
        d[fields[1]].append(line)

The individual lists are then in d['john1'], d['john2'], etc.
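As a minimal sketch of how the grouped lists can be pulled out afterwards, the example below runs the same loop over an in-memory stand-in for the file. The `sample` object and the three `johnN_list` names are assumptions made for illustration; stripping the trailing newline before storing each line is also an addition, not part of the answer above.

```python
import io
from collections import defaultdict

# Hypothetical stand-in for john_file.txt: tab-delimited sample rows.
sample = io.StringIO(
    "1\tjohn1\t23\t54\t54\n"
    "2\tjohn2\t34\t45\t66\n"
    "3\tjohn3\t35\t43\t54\n"
    "4\tjohn2\t34\t54\t78\n"
    "5\tjohn1\t12\t34\t65\n"
    "6\tjohn3\t34\t55\t66\n"
)

d = defaultdict(list)
for line in sample:
    line = line.rstrip("\n")   # drop the trailing newline before storing
    if not line:               # skip blank lines instead of failing on fields[1]
        continue
    fields = line.split("\t")
    d[fields[1]].append(line)

# Pull the three lists the question asked for out of the dictionary.
john1_list, john2_list, john3_list = d["john1"], d["john2"], d["john3"]
print(john1_list)
```

Looking up a name that never occurred, such as d['john9'], returns an empty list and also adds that key to the dictionary, which is worth keeping in mind when iterating over d afterwards.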

>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> for line in '''1 john1 23 54 54
... 2 john2 34 45 66
... 3 john3 35 43 54
... 4 john2 34 54 78
... 5 john1 12 34 65
... 6 john3 34 55 66
... '''.split('\n'):
...  data = filter(None, line.split())
...  if data:
...   a[data[1]].append(data)
... 
>>> data
[]
>>> a
defaultdict(<type 'list'>, {'john1': [['1', 'john1', '23', '54', '54'], ['5', 'john1', '12', '34', '65']], 'john2': [['2', 'john2', '34', '45', '66'], ['4', 'john2', '34', '54', '78']], 'john3': [['3', 'john3', '35', '43', '54'], ['6', 'john3', '34', '55', '66']]})
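The session above was run under Python 2. In Python 3, `filter` returns a lazy iterator that is always truthy and cannot be indexed, so the `if data:` check and `data[1]` would no longer behave as shown. Since `line.split()` with no arguments already discards empty strings, a rough Python 3 equivalent can drop the `filter` call entirely. The variable name `text` is an assumption for illustration.

```python
from collections import defaultdict

text = """1 john1 23 54 54
2 john2 34 45 66
3 john3 35 43 54
4 john2 34 54 78
5 john1 12 34 65
6 john3 34 55 66
"""

a = defaultdict(list)
for line in text.split("\n"):
    data = line.split()   # a plain list; split() with no arguments drops empty strings
    if data:              # empty list for the trailing blank line, so it is skipped
        a[data[1]].append(data)

print(a["john1"])
```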

You could do something like:

john_lists = {}
with open('john_file.txt', 'r') as fh:
    for line in fh:
        name = line.split('\t')[1]
        # create the list the first time a name is seen
        if name not in john_lists:
            john_lists[name] = []
        john_lists[name].append(line)

This has the advantage of not depending on knowing in advance the possible values in the second column.

As others point out, you can also use a defaultdict, which removes the need for the membership check:

from collections import defaultdict

john_lists = defaultdict(list)
with open('john_file.txt', 'r') as fh:
    for line in fh:
        name = line.split('\t')[1]
        john_lists[name].append(line)
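A third spelling of the same idea, not used in the answers here, is `dict.setdefault`, which keeps a plain dict but folds the "create the list if missing" step into one call. The sketch below wraps it in a hypothetical helper, `group_by_column`, whose name and parameters are assumptions for illustration only.

```python
def group_by_column(lines, column=1, sep="\t"):
    """Group lines by the value in the given column (hypothetical helper)."""
    groups = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:          # ignore blank lines
            continue
        key = line.split(sep)[column]
        # setdefault returns the existing list for key, or stores and returns a new one
        groups.setdefault(key, []).append(line)
    return groups


rows = [
    "1\tjohn1\t23\t54\t54\n",
    "2\tjohn2\t34\t45\t66\n",
    "5\tjohn1\t12\t34\t65\n",
]
john_lists = group_by_column(rows)
print(sorted(john_lists))
```

Because the helper accepts any iterable of lines, an open file object can be passed in directly in place of the `rows` list.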

littletable makes this kind of simple slicing and dicing easy. It makes a list of objects accessible, queryable and pivotable by attribute, like a mini in-memory database, but with even less overhead than SQLite.

from collections import namedtuple
from littletable import Table

data = """\
 1 john1 23 54 54
 2 john2 34 45 66
 3 john3 35 43 54
 4 john2 34 54 78
 5 john1 12 34 65
 6 john3 34 55 66"""

Record = namedtuple("Record", "id name length width height")
def makeRecord(s):
    s = s.strip().split()
    # convert all but name to ints, and build a Record instance
    return Record(*(ss if i == 1 else int(ss) for i,ss in enumerate(s)))

# create a table and load it up 
# (if this were CSV data, would be even simpler)
t = Table("data")
t.create_index("id", unique=True)
t.create_index("name")
t.insert_many(map(makeRecord, data.splitlines()))

# get a record by unique key
# (unique indexes return just the single record)
print(t.id[4])
print()

# get all records matching an indexed value
# (non-unique index retrievals return a new Table)
for d in t.name['john1']:
    print(d)
print()

# dump summary pivot tables
t.pivot('name').dump_counts()
print()

t.create_index('length')
t.pivot('name length').dump_counts()

Prints:

Record(id=4, name='john2', length=34, width=54, height=78)

Record(id=1, name='john1', length=23, width=54, height=54)
Record(id=5, name='john1', length=12, width=34, height=65)

Pivot: name
john1       2
john2       2
john3       2

Pivot: name,length
           12      23      34      35   Total
john1       1       1       0       0       2
john2       0       0       2       0       2
john3       0       0       1       1       2
Total       1       1       3       1       6
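For comparison, the counts in the two pivot tables above can be approximated with only the standard library. This is a rough sketch using `collections.Counter`, not a description of how littletable computes them; the names `records`, `name_counts` and `name_length_counts` are assumptions for illustration.

```python
from collections import Counter

data = """\
 1 john1 23 54 54
 2 john2 34 45 66
 3 john3 35 43 54
 4 john2 34 54 78
 5 john1 12 34 65
 6 john3 34 55 66"""

# Split each row into fields: id, name, length, width, height.
records = [line.split() for line in data.splitlines()]

# Count rows per name, and per (name, length) pair.
name_counts = Counter(rec[1] for rec in records)
name_length_counts = Counter((rec[1], int(rec[2])) for rec in records)

for name in sorted(name_counts):
    print(name, name_counts[name])
```

A Counter returns 0 for combinations that never occur, which corresponds to the zero cells in the name/length pivot.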
