Deleting words that appear multiple times in a file

How can I delete words that appear multiple times in a file, keeping only the first occurrence of each and deleting the duplicates?


A simple algorithm is to iterate over all the words in the input, adding each one to a set of words you have already seen. If a word is already in the set, skip it instead of writing it out.

Here's an example:

seen_words = set()
for word in words:  # words: any iterable of strings, e.g. text.split()
    if word not in seen_words:
        print(word)  # first occurrence: keep it
        seen_words.add(word)


You can also use a dictionary like this:

mydict = {}
mylist = [1, 2, 2, 3, 4, 5, 5]
for item in mylist:
    mydict[item] = ""  # the value is unused; only the keys matter
for item in mydict:
    print(item)

Output:

1
2
3
4
5

But of course you would need to integrate that into file reading/writing.
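One way the file reading and writing could be wired in is sketched below. The function name `dedupe_words` and the two path parameters are hypothetical, and the sketch assumes that words are separated by whitespace and that line breaks should be preserved; runs of spaces inside a line are collapsed to a single space.

```python
def dedupe_words(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first occurrence of each word."""
    seen = set()  # words written so far, across the whole file
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            kept = []
            for word in line.split():
                if word not in seen:
                    seen.add(word)
                    kept.append(word)
            # A line whose words were all duplicates becomes an empty line.
            dst.write(" ".join(kept) + "\n")
```

Reading line by line keeps memory use proportional to the number of distinct words rather than to the size of the file.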


You can use a set:

set('these are all the words the words all are these'.split())

Output (a set is unordered, so the order may differ): {'these', 'the', 'all', 'are', 'words'}
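Because a set does not record the order in which words were added, it cannot tell you which occurrence came first. If the original word order matters, a dictionary can stand in for the set, assuming Python 3.7 or later, where dictionaries keep insertion order. A minimal sketch:

```python
text = 'these are all the words the words all are these'

# dict.fromkeys keeps one key per distinct word, in first-seen order.
unique_words = list(dict.fromkeys(text.split()))

print(" ".join(unique_words))
```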


fileText = "some words with duplicate words"
fileWords = fileText.split(" ")
output = fileWords[0]  # the first word is always kept
words = [output]       # words seen so far
for word in fileWords:
    if word not in words:
        output += " " + word
        words.append(word)
# output is now "some words with duplicate"


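If your file is not extremely big, you can read it into memory and remove every occurrence of a given word after the first:

word = 'word'
with open("file") as f:
    data = f.read()
ind = data.find(word)
if ind == -1:
    print(data)  # find() returns -1 when the word is absent: nothing to remove
else:
    print(data[:ind+len(word)] + data[ind+len(word):].replace(word, ""))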
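Note that `find` and `replace` match substrings, so removing "word" would also cut it out of "words". A hedged variant that matches whole words only is sketched below. The function name `keep_first_occurrence` is hypothetical, and the sketch assumes the word begins and ends with a letter, digit, or underscore so that the `\b` word boundaries behave as expected.

```python
import re

def keep_first_occurrence(data, word):
    """Remove every whole-word occurrence of `word` after the first one."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    match = pattern.search(data)
    if match is None:
        return data  # word not present: nothing to remove
    head = data[:match.end()]                    # text up to and including the first hit
    tail = pattern.sub("", data[match.end():])   # later hits removed
    return head + tail
```

The surrounding whitespace is left untouched, so a removed word leaves a double space behind; collapse it separately if that matters.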