A solution to remove the duplicates?
My code is below. Basically, I've got a CSV file and a text file "input.txt". I'm trying to create a Python application which will take the input from "input.txt" and search through the CSV file for a match and if a match is found, then it should return the first column of the CS开发者_开发技巧V file.
import csv
csv_file = csv.reader(open('some_csv_file.csv', 'r'), delimiter = ",")
header = csv_file.next()
data = list(csv_file)
input_file = open("input.txt", "r")
lines = input_file.readlines()
for row in lines:
inputs = row.strip().split(" ")
for input in inputs:
input = input.lower()
for row in data:
if any(input in terms.lower() for terms in row):
print row[0]
Say my CSV file looks like this:
book title, author
The Rock, Herry Putter
Business Economics, Herry Putter
Yogurt, Daniel Putter
Short Story, Rick Pan
And say my input.txt looks like this:
Herry
Putter
Therefore when I run my program, it prints:
The Rock
Business Economics
The Rock
Business Economics
Yogurt
This is because it searches for all titles with "Herry" first, and then searches all over again for "Putter". So in the end, I have duplicates of the book titles. I'm trying to figure out a way to remove them...so if anyone can help, that would be greatly appreciated.
If original order does not matter, then stick the results into a set first, and then print them out at the end. But, your example is small enough where speed does not matter that much.
Stick the results in a set (which is like a list but only contains unique elements), and print at the end.
Something like;
if any(input in terms.lower() for terms in row):
if not row[0] in my_set:
my_set.add(row[0])
During the search stick results into a list, and only add new results to the list after first searching the list to see if the result is already there. Then after the search is done print the list.
First, get the set of search terms you want to look for in a single list. We use set(...)
here to eliminate duplicate search terms:
search_terms = set(open("input.txt", "r").read().lower().split())
Next, iterate over the rows in the data table, selecting each one that matches the search terms. Here, I'm preserving the behavior of the original code, in that we search for the case-normalized search term in any column for each row. If you just wanted to search e.g. the author column, then this would need to be tweaked:
results = [row for row in data
if any(search_term in item.lower()
for item in row
for search_term in search_terms)]
Finally, print the results.
for row in results:
print row[0]
If you wanted, you could also list the authors or any other info in the table. E.g.:
for row in results:
print '%30s (by %s)' % (row[0], row[1])
精彩评论