Most elegant way to break CSV columns into separate data structures using Python?
I'm trying to pick up Python. As part of the learning process I'm porting a project I wrote in Java to Python. I'm at a section now where I have a list of CSV headers of the form:
    headers = ['a', 'b', 'c', 'd', 'e', ...]
and separate lists of groups that these headers should be broken up into, e.g.:
    headers_for_list_a = ['b', 'c', 'e', ...]
    headers_for_list_b = ['a', 'd', 'k', ...]
    ...
I want to take the CSV data and turn it into dicts based on these groups, e.g.:
    list_a = [
        {'b': val_1b, 'c': val_1c, 'e': val_1e, ...},
        {'b': val_2b, 'c': val_2c, 'e': val_2e, ...},
        {'b': val_3b, 'c': val_3c, 'e': val_3e, ...},
        ...
    ]
where, for example, val_1b is the value in the first row of column 'b', val_3c is the value in the third row of column 'c', and so on.
My first "Java instinct" is to do something like:
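    for row in data:
        # Fresh dicts for every row; reusing one dict would make every
        # appended entry the same object.
        dict_a = {}
        dict_b = {}
        for col_num, val in enumerate(row):
            col_name = headers[col_num]
            if col_name in group_a:
                dict_a[col_name] = val
            elif col_name in group_b:
                dict_b[col_name] = val
            # ... one elif branch per remaining group
        list_a.append(dict_a)
        list_b.append(dict_b)
        # ... one append per remaining group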
However, this approach seems inefficient and unwieldy, and it lacks the elegance that Python programmers are constantly talking about. Is there a more "Zen-like" way I should try, in keeping with the philosophy of Python?
Try Python's csv module, in particular the csv.DictReader class:
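    import csv

    # csvfile is an open file object. Because fieldnames is given explicitly,
    # DictReader treats the first line as data, so skip any header line first.
    groups = dict(a=headers_for_list_a, b=headers_for_list_b)
    lists = {name: [] for name in groups}
    for row in csv.DictReader(csvfile, fieldnames=headers):
        # row maps each header name to that row's value
        for name, grp_headers in groups.items():
            lists[name].append({header: row[header] for header in grp_headers})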
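To see the DictReader answer run end to end, here is a minimal sketch that assumes a header-less CSV held in an io.StringIO; the sample values and the two groups are made up, and a real program would pass an open file object instead.

```python
import csv
import io

# Hypothetical sample data: no header line, because fieldnames is passed explicitly.
csvfile = io.StringIO("1a,1b,1c,1d,1e\n2a,2b,2c,2d,2e\n")

headers = ['a', 'b', 'c', 'd', 'e']
headers_for_list_a = ['b', 'c', 'e']
headers_for_list_b = ['a', 'd']

groups = {'a': headers_for_list_a, 'b': headers_for_list_b}
lists = {name: [] for name in groups}

for row in csv.DictReader(csvfile, fieldnames=headers):
    # row is a dict mapping each header to this row's value.
    for name, grp_headers in groups.items():
        lists[name].append({h: row[h] for h in grp_headers})

print(lists['a'])
# [{'b': '1b', 'c': '1c', 'e': '1e'}, {'b': '2b', 'c': '2c', 'e': '2e'}]
print(lists['b'])
# [{'a': '1a', 'd': '1d'}, {'a': '2a', 'd': '2d'}]
```

Keeping the groups in a dict means adding a third group is one more entry in `groups`, not another variable and another branch.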
Not necessarily the most Pythonic way to do it, but this version of your code is more concise thanks to generator expressions:
    # Python 2 needed itertools.izip here; in Python 3 the built-in zip is already lazy.
    for row in data:
        dict_a = dict((col_name, val) for col_name, val in zip(headers, row)
                      if col_name in group_a)
        dict_b = dict((col_name, val) for col_name, val in zip(headers, row)
                      if col_name in group_b)
        list_a.append(dict_a)
        list_b.append(dict_b)
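The same zip-based filtering can be sketched for any number of groups by keeping them in a dict rather than in separate dict_a/dict_b variables. The sample rows and the group names 'a' and 'b' are hypothetical, and the groups are stored as sets so the `in` filter is a hash lookup.

```python
# Hypothetical in-memory rows standing in for the parsed CSV data.
headers = ['a', 'b', 'c', 'd', 'e']
data = [
    ['1a', '1b', '1c', '1d', '1e'],
    ['2a', '2b', '2c', '2d', '2e'],
]

# Hypothetical groups, stored as sets for fast membership tests.
groups = {
    'a': {'b', 'c', 'e'},
    'b': {'a', 'd'},
}
lists = {name: [] for name in groups}

for row in data:
    # zip pairs each header with the value in the same position;
    # materialize it once so every group can reuse the pairs.
    pairs = list(zip(headers, row))
    for name, grp in groups.items():
        lists[name].append({col: val for col, val in pairs if col in grp})

print(lists['a'])
print(lists['b'])
```

Because the comprehension walks `pairs` in header order, the keys of each dict follow the column order of the file even though the groups themselves are unordered sets.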
Also, use sets for group_a and group_b instead of lists: a membership test with the in operator takes constant time on average for a set, while for a list it scans the elements one by one. But Jason Humber is right, DictReader is way more elegant; see the following version:
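The set-versus-list claim can be checked with a quick timeit sketch. The group of 1000 column names is made up purely to make the difference measurable, and the exact timings will vary by machine and Python version.

```python
import timeit

# Hypothetical group with many column names, so the difference is visible.
group_list = [f'col{i}' for i in range(1000)]
group_set = set(group_list)

# Both containers give the same answer to a membership question...
assert ('col999' in group_list) == ('col999' in group_set)

# ...but the list is scanned element by element (O(n)),
# while the set uses hashing (O(1) on average).
list_time = timeit.timeit(lambda: 'col999' in group_list, number=10_000)
set_time = timeit.timeit(lambda: 'col999' in group_set, number=10_000)
print(f'list: {list_time:.4f}s  set: {set_time:.4f}s')
```

One caveat: a set does not remember the column order of the original list, so if you also iterate over a group (as `for k in group_a` does in the DictReader version), the dict keys come out in set order rather than header order. Keeping the list for iteration and a separate set for membership tests avoids that.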
    from csv import DictReader

    # The second positional argument of DictReader is fieldnames.
    for row in DictReader(your_file, headers):
        dict_a = dict((k, row[k]) for k in group_a)
        dict_b = dict((k, row[k]) for k in group_b)
        list_a.append(dict_a)
        list_b.append(dict_b)
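If the CSV file's first line already holds the column names, DictReader can take them from the file when fieldnames is omitted. Here is a minimal sketch with made-up data in an io.StringIO; in a real script you would open the file with open(path, newline=''), as the csv module documentation recommends.

```python
import csv
import io

# Hypothetical file contents whose first line is the header row.
your_file = io.StringIO("a,b,c,d,e\n1a,1b,1c,1d,1e\n2a,2b,2c,2d,2e\n")

group_a = ['b', 'c', 'e']
group_b = ['a', 'd']
list_a, list_b = [], []

# No fieldnames argument: DictReader reads the keys from the first line.
reader = csv.DictReader(your_file)
print(reader.fieldnames)  # ['a', 'b', 'c', 'd', 'e']

for row in reader:
    list_a.append({k: row[k] for k in group_a})
    list_b.append({k: row[k] for k in group_b})

print(list_a)
print(list_b)
```

This way the headers list never has to be maintained by hand, and it cannot drift out of sync with the file.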