Deleting words that appear multiple times in a file
How can I delete words that appear multiple times in a file, keeping only the first occurrence of each and deleting the duplicates?
A simple algorithm is to iterate over all the words in the input, adding each one to a set of words you've already seen. If a word is already in the set, skip it instead of writing it out.
Here's an example:
words = "some words with duplicate words".split()  # sample input
seen_words = set()
for word in words:
    if word not in seen_words:
        print(word)  # first occurrence: keep it
        seen_words.add(word)
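You can also use a dictionary like this (dictionaries keep insertion order in Python 3.7 and later):
mydict = {}
mylist = [1, 2, 2, 3, 4, 5, 5]
for item in mylist:
    mydict[item] = ""  # a duplicate key overwrites the same entry
for item in mydict:
    print(item)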
Output:
1
2
3
4
5
But of course you would need to integrate that into file reading/writing.
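As a rough sketch of that integration, the seen-set approach can be wrapped around file reading and writing. The names dedupe_words and dedupe_file and the two path parameters are hypothetical, chosen only for illustration. The sketch assumes words are separated by whitespace, that the file fits in memory, and that collapsing line breaks into single spaces is acceptable.

```python
def dedupe_words(text):
    """Return text with repeated words dropped, keeping each first occurrence."""
    seen = set()
    kept = []
    for word in text.split():  # split on any run of whitespace
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


def dedupe_file(src_path, dst_path):
    """Read src_path, remove repeated words, and write the result to dst_path."""
    # The whole file is read at once, so this suits small to medium files.
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(dedupe_words(text))
```

Writing to a separate output path leaves the original file untouched, which makes it easy to compare the two before replacing anything.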
You can use a set, although a set does not preserve the original order of the words:
set('these are all the words the words all are these'.split())
Output (the order may vary): {'these', 'the', 'all', 'are', 'words'}
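If the order of first appearance matters, one possible alternative is dict.fromkeys, which keeps one key per distinct word. This is a minimal sketch that assumes Python 3.7 or later, where dictionaries preserve insertion order; the variable names are illustrative.

```python
text = 'these are all the words the words all are these'

# dict.fromkeys keeps one key per distinct word, in order of first appearance
# (relies on insertion-ordered dicts, guaranteed from Python 3.7).
unique_words = list(dict.fromkeys(text.split()))
print(unique_words)  # ['these', 'are', 'all', 'the', 'words']
```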
fileText = "some words with duplicate words"
fileWords = fileText.split(" ")
output = fileWords[0]
words = [output]  # words already added to the output
for word in fileWords:
    if word not in words:
        output += " " + word
        words.append(word)
print(output)
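If your file is not extremely big, you can read it all into memory and remove every occurrence of one given word after the first. Note that find() and replace() match substrings, so 'word' is also removed from inside 'words':
word = 'word'
with open("file") as f:
    data = f.read()
ind = data.find(word)
if ind != -1:  # find() returns -1 when the word is absent
    end = ind + len(word)
    data = data[:end] + data[end:].replace(word, "")
print(data)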
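Since find() and replace() work on substrings, a whole-word variant may be safer when the target word can appear inside longer words. The following is a sketch using the standard re module; the function name keep_first_occurrence is hypothetical, and "word" here means whatever the regex \b word boundary considers a word.

```python
import re


def keep_first_occurrence(data, word):
    """Remove every whole-word occurrence of `word` after the first one."""
    # re.escape guards against regex metacharacters in `word`;
    # \b restricts matches to whole words.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    first = pattern.search(data)
    if first is None:
        return data  # word not present: nothing to remove
    end = first.end()
    return data[:end] + pattern.sub("", data[end:])


print(keep_first_occurrence("a word and words and word again", "word"))
```

The surrounding whitespace is left as-is, so a removed word leaves a double space behind; collapse it afterwards if that matters for your file.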