Most elegant way to break CSV columns into separate data structures using Python?

I'm trying to pick up Python. As part of the learning process I'm porting a project I wrote in Java to Python. I'm at a section now where I have a list of CSV headers of the form:

headers = [a, b, c, d, e, .....]

and separate lists of groups that these headers should be broken up into, e.g.:

headers_for_list_a = [b, c, e, ...]
headers_for_list_b = [a, d, k, ...]
. . .

I want to take the CSV data and turn it into dicts based on these groups, e.g.:

list_a = [
          {b:val_1b, c:val_1c, e:val_1e, ... },
          {b:val_2b, c:val_2c, e:val_2e, ... },
          {b:val_3b, c:val_3c, e:val_3e, ... },
          . . . 
         ]

where for example, val_1b is the first row of the 'b' column, val_3c is the third row of the 'c' column, etc.

My first "Java instinct" is to do something like:

for row in data:
    dict_a, dict_b = {}, {}  # fresh dicts for each row
    for col_num, val in enumerate(row):
        col_name = headers[col_num]
        if col_name in group_a:
            dict_a[col_name] = val
        elif col_name in group_b:
            dict_b[col_name] = val
        ...
    list_a.append(dict_a)
    list_b.append(dict_b)
    ...     

However, this method seems inefficient and unwieldy, and it doesn't possess the elegance that Python programmers are constantly talking about. Is there a more "Zen-like" way I should try, in keeping with the philosophy of Python?


Try the CSV module of Python, in particular the DictReader class.
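For reference, here is a minimal sketch of what DictReader yields. The sample data, the three-column header list, and the use of io.StringIO in place of a real file are all assumptions made for illustration:

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file (no header row).
csvfile = io.StringIO("1,2,3\n4,5,6\n")
headers = ["a", "b", "c"]

# With fieldnames given explicitly, every line is treated as data
# and each row becomes a mapping keyed by header name.
rows = list(csv.DictReader(csvfile, fieldnames=headers))
print(rows[0]["b"])  # value of column 'b' in the first row
```

Note that the values come back as strings; if the file's first line already holds the headers, fieldnames can be omitted and DictReader will read them from that line.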


csv.DictReader

import csv

groups = dict(a=headers_for_list_a, b=headers_for_list_b)
lists = dict((name, []) for name in groups)

for row in csv.DictReader(csvfile, fieldnames=headers):
    for name, grp_headers in groups.items():
        lists[name].append(dict((header, row[header]) for header in grp_headers))
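To try the idea above end to end, something like the following sketch should work. The function name split_into_groups, the sample data, and the group contents are hypothetical; dict comprehensions are used in place of dict() with a generator, which is equivalent on any recent Python:

```python
import csv
import io

def split_into_groups(csvfile, headers, groups):
    """Return {group_name: [one dict per row]} with only that group's columns."""
    lists = {name: [] for name in groups}
    for row in csv.DictReader(csvfile, fieldnames=headers):
        for name, grp_headers in groups.items():
            lists[name].append({h: row[h] for h in grp_headers})
    return lists

# Hypothetical sample input: two rows, five columns, no header line.
sample = io.StringIO("1,2,3,4,5\n6,7,8,9,10\n")
headers = ["a", "b", "c", "d", "e"]
groups = {"a": ["b", "c", "e"], "b": ["a", "d"]}

lists = split_into_groups(sample, headers, groups)
print(lists["a"])
```

Keeping the groups in a dict means adding another group is a one-line change rather than another pair of variables.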


Not necessarily the most Pythonic way to achieve the same thing as your code, but this version is somewhat more concise thanks to generator expressions:

for row in data:
    # The built-in zip is lazy in Python 3; itertools.izip exists only in Python 2.
    dict_a = dict((col_name, val) for col_name, val in zip(headers, row)
                  if col_name in group_a)
    dict_b = dict((col_name, val) for col_name, val in zip(headers, row)
                  if col_name in group_b)
    list_a.append(dict_a)
    list_b.append(dict_b)

Also, use sets for group_a and group_b instead of lists: membership tests with the in operator are faster on sets than on lists. But Jason Humber is right that DictReader is far more elegant; see the following version:

from csv import DictReader

for row in DictReader(your_file, headers):
    dict_a = dict((k, row[k]) for k in group_a)
    dict_b = dict((k, row[k]) for k in group_b)
    list_a.append(dict_a)
    list_b.append(dict_b)
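If the rows are already parsed into lists, the set suggestion can be sketched as follows. The sample rows and group contents here are made up for illustration:

```python
# Hypothetical, already-parsed data: one list of values per CSV row.
headers = ["a", "b", "c", "d", "e"]
data = [["1", "2", "3", "4", "5"], ["6", "7", "8", "9", "10"]]

# Sets give fast membership tests for "col_name in group".
group_a = {"b", "c", "e"}
group_b = {"a", "d"}

list_a, list_b = [], []
for row in data:
    pairs = list(zip(headers, row))  # pair each value with its header once
    list_a.append({k: v for k, v in pairs if k in group_a})
    list_b.append({k: v for k, v in pairs if k in group_b})
```

Because each dict is built by filtering the header/value pairs, the keys follow the column order of headers rather than the order in which the group was written.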