
In Python, how do I parse a file into lists based on a specific value?

I have a large tab-delimited text file, for example, call it john_file:

1 john1 23 54 54

2 john2 34 45 66

3 john3 35 43 54

4 john2 34 54 78

5 john1 12 34 65

6 john3 34 55 66

What's a quick way to parse this file into 3 lists based on the name (john1, john2, or john3)? This is what I have so far:

fh=open('john_file.txt','r').readlines()
john1_list=[]
for i in fh:
 if i.split('\t')[1] == "john1":
  john1_list.append(i)

Thanks in advance


from collections import defaultdict

d = defaultdict(list)

with open('john_file.txt') as f:
    for line in f:
        fields = line.split('\t')
        d[fields[1]].append(line)

The individual lists are then available as d['john1'], d['john2'], and so on.
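
To try the grouping without a file on disk, here is a minimal sketch of the same idea wrapped in a function. The helper name group_by_column and the in-memory sample are assumptions made for illustration, not part of the answer above; the sample mirrors the rows from the question, joined with tabs.

```python
from collections import defaultdict
import io


def group_by_column(lines, column=1, sep="\t"):
    """Group lines by the value found in the given column (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines instead of raising IndexError
        fields = line.split(sep)
        groups[fields[column]].append(line)
    return groups


# In-memory stand-in for john_file.txt (assumed to be tab-delimited).
sample = io.StringIO(
    "1\tjohn1\t23\t54\t54\n"
    "2\tjohn2\t34\t45\t66\n"
    "3\tjohn3\t35\t43\t54\n"
    "4\tjohn2\t34\t54\t78\n"
    "5\tjohn1\t12\t34\t65\n"
    "6\tjohn3\t34\t55\t66\n"
)

groups = group_by_column(sample)
john1_list = groups["john1"]
```

Passing an open file object instead of the StringIO should behave the same way, since both yield one line per iteration.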


>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> for line in '''1 john1 23 54 54
... 2 john2 34 45 66
... 3 john3 35 43 54
... 4 john2 34 54 78
... 5 john1 12 34 65
... 6 john3 34 55 66
... '''.split('\n'):
...  data = filter(None, line.split())
...  if data:
...   a[data[1]].append(data)
... 
>>> data
[]
>>> a
defaultdict(<type 'list'>, {'john1': [['1', 'john1', '23', '54', '54'], ['5', 'john1', '12', '34', '65']], 'john2': [['2', 'john2', '34', '45', '66'], ['4', 'john2', '34', '54', '78']], 'john3': [['3', 'john3', '35', '43', '54'], ['6', 'john3', '34', '55', '66']]})
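
The session above is Python 2, where filter returns a list. In Python 3 filter returns a lazy iterator that is always truthy and cannot be indexed, so the same lines would fail there. A rough Python 3 equivalent, assuming whitespace-separated fields as in the session, could look like this (str.split() with no arguments already drops empty strings, so the filter call is not needed):

```python
from collections import defaultdict

text = """1 john1 23 54 54
2 john2 34 45 66
3 john3 35 43 54
4 john2 34 54 78
5 john1 12 34 65
6 john3 34 55 66
"""

a = defaultdict(list)
for line in text.splitlines():
    data = line.split()  # no-argument split discards empty fields
    if data:             # skip blank lines
        a[data[1]].append(data)
```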


You could do something like:

fh=open('john_file.txt','r').readlines()
john_lists={}
for i in fh:
    j=i.split('\t')[1]
    # create the list the first time a name is seen
    if j not in john_lists:
        john_lists[j]=[]
    john_lists[j].append(i)

This has the advantage of not depending on knowing in advance the possible values in the second column.
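
The explicit membership check can also be folded into dict.setdefault, which returns the existing list for a key or stores and returns a new one. This is only a sketch of that variation; the lines list is a made-up stand-in for the contents of john_file.txt.

```python
# Stand-in for the lines read from john_file.txt (assumed tab-delimited).
lines = [
    "1\tjohn1\t23\t54\t54\n",
    "2\tjohn2\t34\t45\t66\n",
    "3\tjohn3\t35\t43\t54\n",
    "4\tjohn2\t34\t54\t78\n",
]

john_lists = {}
for line in lines:
    name = line.split("\t")[1]
    # setdefault inserts an empty list only if the name is not present yet
    john_lists.setdefault(name, []).append(line)
```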

As others point out, you can also use a defaultdict, which removes the need for the membership check:

from collections import defaultdict
fh=open('john_file.txt','r').readlines()
john_lists=defaultdict(list)
for i in fh:
    j=i.split('\t')[1]
    john_lists[j].append(i)
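
If the parsed fields are wanted rather than the raw lines, the standard library csv module can do the tab splitting. The following is a sketch under the assumption that the file is strictly tab-delimited; the StringIO object stands in for an open file.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('john_file.txt', newline='').
f = io.StringIO(
    "1\tjohn1\t23\t54\t54\n"
    "2\tjohn2\t34\t45\t66\n"
    "5\tjohn1\t12\t34\t65\n"
)

rows_by_name = defaultdict(list)
for row in csv.reader(f, delimiter="\t"):
    if row:  # csv.reader yields an empty list for blank lines
        rows_by_name[row[1]].append(row)
```

Each stored row is a list of strings, so numeric columns still need an explicit int() conversion if arithmetic is required.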


littletable makes this kind of simple slicing and dicing easy, making a list of objects accessible/queryable/pivotable by attribute, like a mini-in-memory database, but with even less overhead than SQLite.

from collections import namedtuple
from littletable import Table

data = """\
 1 john1 23 54 54
 2 john2 34 45 66
 3 john3 35 43 54
 4 john2 34 54 78
 5 john1 12 34 65
 6 john3 34 55 66"""

Record = namedtuple("Record", "id name length width height")
def makeRecord(s):
    s = s.strip().split()
    # convert all but name to ints, and build a Record instance
    return Record(*(ss if i == 1 else int(ss) for i,ss in enumerate(s)))

# create a table and load it up 
# (if this were CSV data, would be even simpler)
t = Table("data")
t.create_index("id", unique=True)
t.create_index("name")
t.insert_many(map(makeRecord, data.splitlines()))

# get a record by unique key 
# (unique indexes return just the single record)
print t.id[4]
print

# get all records matching an indexed value 
# (non-unique index retrievals return a new Table)
for d in t.name['john1']:
    print d
print

# dump summary pivot tables
t.pivot('name').dump_counts()
print

t.create_index('length')
t.pivot('name length').dump_counts()

Prints:

Record(id=4, name='john2', length=34, width=54, height=78)

Record(id=1, name='john1', length=23, width=54, height=54)
Record(id=5, name='john1', length=12, width=34, height=65)

Pivot: name
john1       2
john2       2
john3       2

Pivot: name,length
           12      23      34      35   Total
john1       1       1       0       0       2
john2       0       0       2       0       2
john3       0       0       1       1       2
Total       1       1       3       1       6