Finding repeats in multiple lists read from CSV File (Python)

2023-03-04 03:14 Q&A author:

Title seems confusing, but let's say I'm working with the following CSV file ('names.csv').

    name1,name2,name3
    Bob,Jane,Joe
    Megan,Tom,Jane
    Jane,Joe,Rob

My question is: how would I go about writing code that returns the string that occurs at least 3 times? The output should be 'Jane', because it occurs at least 3 times. I'm really confused here; perhaps some sample code would help me understand better?

So far I have:

    import csv

    with open("names.csv", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            names = [row['name1'], row['name2'], row['name3']]
            print(names)

This returns:

    ['Bob', 'Jane', 'Joe']
    ['Megan', 'Tom', 'Jane']
    ['Jane', 'Joe', 'Rob']

Where do I go from here? Or am I going about this wrong? I'm really new to Python (well, programming altogether), so I have close to no clue what I'm doing.

Cheers

Putting it all together (and showing proper csv.reader usage):

    import csv
    import collections

    d = collections.defaultdict(int)  # name -> count; missing names start at 0
    with open("names.csv", newline="") as f:  # Python 2: use open("names.csv", "rb")
        reader = csv.reader(f)
        next(reader)  # skip the useless heading row
        for row in reader:
            for name in row:
                name = name.strip()
                if name:  # ignore empty cells
                    d[name] += 1

    # keep names that occur at least 3 times, most frequent first
    atleast3 = [(name, count) for name, count in d.items() if count >= 3]
    atleast3.sort(key=lambda x: x[1], reverse=True)
    for name, count in atleast3:
        print(name, count)
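The same counting can also be written with `collections.Counter`, a swapped-in alternative to the `defaultdict` above rather than what this answer used. The sketch below is a minimal version under a couple of assumptions: the helper name `names_at_least` and its `threshold` parameter are hypothetical, and the question's sample data is embedded via `io.StringIO` so it runs without `names.csv` on disk.

```python
import csv
import io
from collections import Counter


def names_at_least(csv_file, threshold=3):
    """Return (name, count) pairs for names occurring at least `threshold` times.

    Hypothetical helper for illustration; `csv_file` is any file-like object
    whose first line is a heading row.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the heading row
    # Counter tallies every non-empty, stripped cell across all rows
    counts = Counter(name.strip() for row in reader for name in row if name.strip())
    # most_common() yields (name, count) pairs sorted by count, highest first
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


# Sample data from the question; with the real file, pass
# open("names.csv", newline="") instead.
sample = io.StringIO("name1,name2,name3\nBob,Jane,Joe\nMegan,Tom,Jane\nJane,Joe,Rob\n")
print(names_at_least(sample))  # [('Jane', 3)]
```

`most_common()` already returns the pairs sorted by count, so no separate `sort` call is needed.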

Update in response to comment:

You need to read through the whole CSV file whether or not you use the DictReader approach. If you want to, e.g., ignore the 'name2' column (not row), just skip it. Your use of the variable name "rows" suggests you are saving all the data, but you don't need to. Here is code for a more general approach that doesn't rely on the column headings being in a particular order and allows selection/rejection of particular columns:

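    import csv
    import collections

    d = collections.defaultdict(int)
    required_columns = ['name1', 'name3']  #### adjust this line as needed ####
    with open("names.csv", newline="") as f:
        reader = csv.DictReader(f)  # column names come from the heading row
        for row in reader:
            for col in required_columns:
                name = row[col].strip()
                if name:
                    d[name] += 1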
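The snippet above selects columns by listing the ones to keep. Rejecting columns instead can be sketched by iterating over each row's items and skipping a blacklist. This is a minimal illustration, not the answerer's code: `ignored_columns` is a hypothetical name, and the question's sample data is embedded with `io.StringIO` so the sketch is self-contained.

```python
import csv
import io
from collections import defaultdict

# Hypothetical set of columns to skip; adjust as needed
ignored_columns = {"name2"}

# Sample data from the question; with the real file use open("names.csv", newline="")
sample = io.StringIO("name1,name2,name3\nBob,Jane,Joe\nMegan,Tom,Jane\nJane,Joe,Rob\n")

d = defaultdict(int)
for row in csv.DictReader(sample):
    # Each row maps heading -> cell, so the column order in the file doesn't matter
    for col, value in row.items():
        if col in ignored_columns:
            continue
        name = value.strip()
        if name:
            d[name] += 1

print(dict(d))  # {'Bob': 1, 'Joe': 1, 'Megan': 1, 'Jane': 2, 'Rob': 1}
```

With 'name2' ignored, 'Jane' only occurs twice, which shows why the choice of columns matters for the "at least 3 times" result.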

I'd do it like this:

    >>> from collections import defaultdict
    >>> d = defaultdict(int)
    >>> rows = [['Bob', 'Jane', 'Joe'],
    ...         ['Megan', 'Tom', 'Jane'],
    ...         ['Jane', 'Joe', 'Rob']]
    >>> for row in rows:
    ...     for name in row:
    ...         d[name] += 1
    ...
    >>> list(filter(lambda x: x[1] >= 3, d.items()))
    [('Jane', 3)]

It uses a defaultdict whose default value is 0 to count how many times each name occurs in the file, and then filters the dict's items by the desired condition (count >= 3).
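To make the "default value of 0" part concrete, here is a sketch of the same count with a plain `dict`, where `dict.get(name, 0)` supplies the 0 that `defaultdict(int)` provides automatically. The names `counts` and `frequent` are illustrative, and the filtering step uses a list comprehension in place of `filter`; both forms produce the same pairs.

```python
rows = [['Bob', 'Jane', 'Joe'],
        ['Megan', 'Tom', 'Jane'],
        ['Jane', 'Joe', 'Rob']]

counts = {}
for row in rows:
    for name in row:
        # get() returns 0 for a name not seen yet, mimicking defaultdict(int)
        counts[name] = counts.get(name, 0) + 1

# Keep only the (name, count) pairs meeting the condition count >= 3
frequent = [(name, count) for name, count in counts.items() if count >= 3]
print(frequent)  # [('Jane', 3)]
```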
