How can I remove the duplicates?

My code is below. Basically, I've got a CSV file and a text file, "input.txt". I'm trying to create a Python application that takes each word from "input.txt", searches the CSV file for a match, and, if a match is found, prints the first column of the matching CSV row.

import csv

csv_file = csv.reader(open('some_csv_file.csv', 'r'), delimiter=",")
header = csv_file.next()   # skip the header row
data = list(csv_file)      # remaining rows, as lists of strings

input_file = open("input.txt", "r")
lines = input_file.readlines()
for row in lines:
    inputs = row.strip().split(" ")
    for input in inputs:       # note: shadows the built-in input()
        input = input.lower()
        for row in data:       # note: reuses the name `row` from the outer loop
            if any(input in terms.lower() for terms in row):
                print row[0]

Say my CSV file looks like this:

 book title, author 
 The Rock, Herry Putter
 Business Economics, Herry Putter    
 Yogurt, Daniel Putter
 Short Story, Rick Pan

And say my input.txt looks like this:

 Herry
 Putter

Therefore when I run my program, it prints:

 The Rock
 Business Economics
 The Rock
 Business Economics
 Yogurt

This is because it searches for all titles with "Herry" first, and then searches all over again for "Putter". So in the end, I have duplicates of the book titles. I'm trying to figure out a way to remove them...so if anyone can help, that would be greatly appreciated.


If the original order does not matter, stick the results into a set first, then print them out at the end. That said, your example is small enough that speed does not matter much.
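
For example, a minimal sketch of the set approach, reusing the `data` and `lines` variables loaded the same way as in the question:

matches = set()
for line in lines:
    for term in line.strip().split():
        for record in data:
            if any(term.lower() in field.lower() for field in record):
                matches.add(record[0])  # a set silently drops duplicates

for title in matches:
    print title  # note: iteration order of a set is arbitrary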


Stick the results in a set (which is like a list but only contains unique elements), and print at the end.

Something like:

my_set = set()  # create the set once, before the search loops

if any(input in terms.lower() for terms in row):
    my_set.add(row[0])  # adding a duplicate to a set is a no-op

for title in my_set:
    print title


During the search, stick results into a list, adding a result only after first checking that it is not already there. Then, after the search is done, print the list. A sketch of this approach follows.
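
A minimal sketch, again assuming `data` and `lines` are loaded as in the question; unlike a set, this preserves the order in which matches were first found:

results = []
for line in lines:
    for term in line.strip().split():
        for record in data:
            if any(term.lower() in field.lower() for field in record):
                if record[0] not in results:  # linear scan; fine for small inputs
                    results.append(record[0])

for title in results:
    print title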


First, collect the search terms you want to look for into a single set. Using set(...) here eliminates duplicate search terms:

search_terms = set(open("input.txt", "r").read().lower().split())

Next, iterate over the rows in the data table, selecting each one that matches the search terms. Here I'm preserving the behavior of the original code: we search for each case-normalized search term in every column of each row. If you only wanted to search, say, the author column, this would need to be tweaked (a sketch of that tweak follows the code below):

results = [row for row in data
              if any(search_term in item.lower()
                     for item in row
                     for search_term in search_terms)]
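
For instance, a sketch of that tweak, assuming the author sits in the second column (index 1; adjust for your file):

results = [row for row in data
              if any(search_term in row[1].lower()
                     for search_term in search_terms)]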

Finally, print the results:

for row in results:
    print row[0]

If you wanted, you could also list the authors or any other info in the table. E.g.:

for row in results:
    print '%30s (by %s)' % (row[0], row[1])