How can I remove the duplicates?

My code is below. Basically, I've got a CSV file and a text file, "input.txt". I'm trying to create a Python application that takes each word from "input.txt", searches the CSV file for a match, and, if a match is found, prints the first column of the matching CSV row.

import csv

csv_file = csv.reader(open('some_csv_file.csv', 'r'), delimiter=",")
header = csv_file.next()   # skip the header row
data = list(csv_file)      # remaining rows, as lists of strings

input_file = open("input.txt", "r")
lines = input_file.readlines()
for row in lines:
    inputs = row.strip().split(" ")
    for input in inputs:       # note: shadows the built-in input()
        input = input.lower()
        for row in data:       # note: reuses the name `row` from the outer loop
            if any(input in terms.lower() for terms in row):
                print row[0]

Say my CSV file looks like this:

 book title, author 
 The Rock, Herry Putter
 Business Economics, Herry Putter    
 Yogurt, Daniel Putter
 Short Story, Rick Pan

And say my input.txt looks like this:

 Herry
 Putter

Therefore when I run my program, it prints:

 The Rock
 Business Economics
 The Rock
 Business Economics
 Yogurt

This is because it searches for all titles with "Herry" first, and then searches all over again for "Putter". So in the end, I have duplicates of the book titles. I'm trying to figure out a way to remove them...so if anyone can help, that would be greatly appreciated.


If the original order does not matter, stick the results into a set first, then print them out at the end. That said, your example is small enough that speed does not matter much.
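
For example, a minimal sketch of the set approach, reusing the `data` and `lines` variables loaded the same way as in the question:

matches = set()
for line in lines:
    for term in line.strip().split():
        for record in data:
            if any(term.lower() in field.lower() for field in record):
                matches.add(record[0])  # a set silently drops duplicates

for title in matches:
    print title  # note: iteration order of a set is arbitrary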


Stick the results in a set (which is like a list but only contains unique elements), and print at the end.

Something like:

my_set = set()  # create the set once, before the search loops

if any(input in terms.lower() for terms in row):
    my_set.add(row[0])  # adding a duplicate to a set is a no-op

for title in my_set:
    print title


During the search, stick results into a list, adding a result only after first checking that it is not already there. Then, after the search is done, print the list. A sketch of this approach follows.
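
A minimal sketch, again assuming `data` and `lines` are loaded as in the question; unlike a set, this preserves the order in which matches were first found:

results = []
for line in lines:
    for term in line.strip().split():
        for record in data:
            if any(term.lower() in field.lower() for field in record):
                if record[0] not in results:  # linear scan; fine for small inputs
                    results.append(record[0])

for title in results:
    print title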


First, collect the search terms you want to look for into a single set. Using set(...) here eliminates duplicate search terms:

search_terms = set(open("input.txt", "r").read().lower().split())

Next, iterate over the rows in the data table, selecting each one that matches the search terms. Here I'm preserving the behavior of the original code: we search for each case-normalized search term in every column of each row. If you only wanted to search, say, the author column, this would need to be tweaked (a sketch of that tweak follows the code below):

results = [row for row in data
              if any(search_term in item.lower()
                     for item in row
                     for search_term in search_terms)]
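
For instance, a sketch of that tweak, assuming the author sits in the second column (index 1; adjust for your file):

results = [row for row in data
              if any(search_term in row[1].lower()
                     for search_term in search_terms)]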

Finally, print the results:

for row in results:
    print row[0]

If you wanted, you could also list the authors or any other info in the table. E.g.:

for row in results:
    print '%30s (by %s)' % (row[0], row[1])