开发者

Finding repeats in multiple lists read from CSV File (Python)

Title seems confusing, but let's say I'm working with the following CSV file ('names.csv').

    name1,name2,name3
    Bob,Jane,Joe
    Megan,Tom,Jane
    Jane,Joe,Rob

My question is, how would I go about making code that returns the string that occurs at least 3 times. So the output should be 'Jane', because that occurs at least 3 times. Really confused here.. perhaps some sample code would help me better understand?

So far I have:

    import csv
    reader = csv.DictReader(open("names.csv"))

    for row in reader:
        names = [row['name1'], row['name2'], row['name3']]
        print names

This returns:

    ['Bob', 'Jane', 'Joe']
    ['Megan', 'Tom', 'Jane']
    ['Jane', 'Joe', 'Rob']

Where do I go from here? Or am I going about this wrong? I'm really new to Python (well,开发者_JAVA技巧 programming altogether), so I have close to no clue what I'm doing..

Cheers


Putting it altogether (and showing proper csv.reader usage):

import csv
import collections
d = collections.defaultdict(int)
with open("names.csv", "rb") as f: # Python 3.x: use newline="" instead of "rb"
    reader = csv.reader(f):
    reader.next() # ignore useless heading row
    for row in reader:
        for name in row:
            name = name.strip()
            if name:
                d[name] += 1
 morethan3 = [(name, count) for name, count in d.iteritems() if count >= 3]
 morethan3.sort(key=lambda x: x[1], reverse=True)
 for name, count in morethan3:
    print name, count

Update in response to comment:

You need to read through the whole CSV file whether you use the DictReader approach or not. If you want to e.g. ignore the 'name2' column (not row), then ignore it. You don't need to save all the data as your use of the variable name "rows" suggests. Here is code for a more general approach that doesn't rely on the column headings being in a particular order and allows selection/rejection of particular columns.

    reader = csv.DictReader(f):
    required_columns = ['name1', 'name3'] #### adjust this line as needed ####
    for row in reader:
        for col in required_columns:
            name = row[col].strip()
            if name:
                d[name] += 1


I'd do it like this:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> rows = [['Bob', 'Jane', 'Joe'],
... ['Megan', 'Tom', 'Jane'],
... ['Jane', 'Joe', 'Rob']]
...
>>> for row in rows:
...     for name in row:
...         d[name] += 1
... 
>>> filter(lambda x: x[1] >= 3, d.iteritems())
[('Jane', 3)]

It uses dict with default value of 0 to count how many times each name happens in the file, and then it filters the dict with according condition (count >= 3).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜