Finding repeats in multiple lists read from CSV File (Python)
Title seems confusing, but let's say I'm working with the following CSV file ('names.csv').
name1,name2,name3
Bob,Jane,Joe
Megan,Tom,Jane
Jane,Joe,Rob
My question is: how would I write code that returns every string that occurs at least 3 times? Here the output should be 'Jane', because it occurs 3 times. I'm really confused, so perhaps some sample code would help me understand.
So far I have:
import csv
reader = csv.DictReader(open("names.csv"))
for row in reader:
    names = [row['name1'], row['name2'], row['name3']]
    print names
This returns:
['Bob', 'Jane', 'Joe']
['Megan', 'Tom', 'Jane']
['Jane', 'Joe', 'Rob']
Where do I go from here? Or am I going about this wrong? I'm really new to Python (well, programming altogether), so I have close to no clue what I'm doing.
Cheers
Putting it all together (and showing proper csv.reader usage):
import csv
import collections
d = collections.defaultdict(int)
with open("names.csv", "rb") as f:  # Python 3.x: use newline="" instead of "rb"
    reader = csv.reader(f)
    reader.next()  # ignore useless heading row (Python 3.x: next(reader))
    for row in reader:
        for name in row:
            name = name.strip()
            if name:
                d[name] += 1
# keep only the names seen at least 3 times (Python 3.x: d.items())
morethan3 = [(name, count) for name, count in d.iteritems() if count >= 3]
morethan3.sort(key=lambda x: x[1], reverse=True)
for name, count in morethan3:
    print name, count  # Python 3.x: print(name, count)
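For readers on Python 3, the same counting loop could be sketched roughly as below. This is a port under assumptions rather than part of the original answer: the function name `names_at_least`, the `threshold` parameter, and the in-memory `SAMPLE` string (standing in for names.csv) are made up for illustration.

```python
import collections
import csv
import io

# Assumption: same layout as the names.csv shown in the question.
SAMPLE = "name1,name2,name3\nBob,Jane,Joe\nMegan,Tom,Jane\nJane,Joe,Rob\n"


def names_at_least(f, threshold=3):
    """Return (name, count) pairs seen at least `threshold` times, most frequent first."""
    counts = collections.defaultdict(int)
    reader = csv.reader(f)
    next(reader, None)  # skip the heading row; replaces Python 2's reader.next()
    for row in reader:
        for name in row:
            name = name.strip()
            if name:
                counts[name] += 1
    frequent = [(name, count) for name, count in counts.items() if count >= threshold]
    frequent.sort(key=lambda pair: pair[1], reverse=True)
    return frequent


# With a real file, pass open("names.csv", newline="") instead of the StringIO.
result = names_at_least(io.StringIO(SAMPLE))
print(result)  # [('Jane', 3)]
```

Taking a file object instead of a path keeps the function easy to try without a file on disk.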
Update in response to comment:
You need to read through the whole CSV file whether you use the DictReader approach or not. If you want to e.g. ignore the 'name2' column (not row), then ignore it. You don't need to save all the data as your use of the variable name "rows" suggests. Here is code for a more general approach that doesn't rely on the column headings being in a particular order and allows selection/rejection of particular columns.
reader = csv.DictReader(f)
required_columns = ['name1', 'name3'] #### adjust this line as needed ####
for row in reader:
    for col in required_columns:
        name = row[col].strip()
        if name:
            d[name] += 1
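The fragment above relies on `f` and `d` from the earlier code, so here is a self-contained Python 3 sketch of the column-selection idea. The helper name `count_columns` and the in-memory `SAMPLE` data are assumptions for illustration, not something from the original answer.

```python
import collections
import csv
import io

# Assumption: same layout as the names.csv shown in the question.
SAMPLE = "name1,name2,name3\nBob,Jane,Joe\nMegan,Tom,Jane\nJane,Joe,Rob\n"


def count_columns(f, required_columns):
    """Count names found in the selected columns only."""
    counts = collections.defaultdict(int)
    # DictReader consumes the heading row itself and uses it as the keys.
    for row in csv.DictReader(f):
        for col in required_columns:
            # Short rows yield None for missing fields, so guard before strip().
            name = (row[col] or "").strip()
            if name:
                counts[name] += 1
    return dict(counts)


selected = count_columns(io.StringIO(SAMPLE), ["name1", "name3"])
print(selected)
```

With 'name2' ignored, 'Jane' is counted only twice in the sample data, so no name would reach a threshold of 3.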
I'd do it like this:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> rows = [['Bob', 'Jane', 'Joe'],
...         ['Megan', 'Tom', 'Jane'],
...         ['Jane', 'Joe', 'Rob']]
>>> for row in rows:
...     for name in row:
...         d[name] += 1
...
>>> filter(lambda x: x[1] >= 3, d.iteritems())
[('Jane', 3)]
It uses a defaultdict, whose missing keys default to 0, to count how many times each name occurs in the file, and then filters the dict by the required condition (count >= 3).
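As a possible alternative for Python 3, where `iteritems()` no longer exists and `filter()` returns a lazy iterator instead of a list, the same count-then-filter idea can be sketched with `collections.Counter`. Swapping in `Counter` is my substitution, not what the session above uses, and the variable names `counts` and `frequent` are illustrative.

```python
from collections import Counter

rows = [['Bob', 'Jane', 'Joe'],
        ['Megan', 'Tom', 'Jane'],
        ['Jane', 'Joe', 'Rob']]

# Flatten the rows and count every name in one pass.
counts = Counter(name for row in rows for name in row)

# most_common() yields (name, count) pairs, highest count first.
frequent = [(name, count) for name, count in counts.most_common() if count >= 3]
print(frequent)  # [('Jane', 3)]
```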