Replacing synonyms in a corpus using WordNet and NLTK - python
I am trying to write a simple Python script that uses NLTK to find and replace synonyms in a text file.
The following code gives me this error:
Traceback (most recent call last):
File "C:\Users\Nedim\Documents\sinon2.py", line 21, in <module>
change(word)
File "C:\Users\Nedim\Documents\sinon2.py", line 4, in change
synonym = wn.synset(word + ".n.01").lemma_names
TypeError: can only concatenate list (not "str") to list
Here is the code:
from nltk.corpus import wordnet as wn
def change(word):
    synonym = wn.synset(word + ".n.01").lemma_names
    if word in synonym:
        filename = open("C:/Users/tester/Desktop/test.txt").read()
        writeSynonym = filename.replace(str(word), str(synonym[0]))
        f = open("C:/Users/tester/Desktop/test.txt", 'w')
        f.write(writeSynonym)
        f.close()
f = open("C:/Users/tester/Desktop/test.txt")
lines = f.readlines()
for i in range(len(lines)):
    word = lines[i].split()
    change(word)
This isn't terribly efficient, and it does not replace anything with a single synonym, because each word can have multiple synonyms. It collects them so that you can choose from them:
from nltk.corpus import wordnet as wn
from nltk.corpus.reader.plaintext import PlaintextCorpusReader
corpus_root = 'C://Users//tester//Desktop//'
wordlists = PlaintextCorpusReader(corpus_root, '.*')
for word in wordlists.words('test.txt'):
    # Gather the lemma names of every synset the word belongs to.
    synonymList = set()
    wordNetSynset = wn.synsets(word)
    for synSet in wordNetSynset:
        # lemma_names is a method in NLTK 3+; older releases exposed it as an attribute.
        for synWords in synSet.lemma_names():
            synonymList.add(synWords)
    print(synonymList)
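Since each word can have several candidates, some selection rule is needed before anything can be replaced. Here is a minimal sketch of one possibility. The helper names `wordnet_lemmas` and `pick_synonym` are made up for illustration, and the rule (take the alphabetically first candidate that differs from the word) is only an assumption, not something WordNet prescribes. The lookup is passed in as a function so the selection logic can be tried without the WordNet data installed.

```python
def wordnet_lemmas(word):
    """Collect lemma names from every WordNet synset of `word` (NLTK 3+)."""
    # Imported lazily; calling this requires NLTK and the WordNet corpus.
    from nltk.corpus import wordnet as wn
    names = set()
    for synset in wn.synsets(word):
        names.update(synset.lemma_names())
    return names


def pick_synonym(word, lookup=wordnet_lemmas):
    """Return one synonym for `word`, or `word` itself if none is found.

    Hypothetical selection rule: the alphabetically first candidate that
    differs from the original word (compared case-insensitively).
    """
    candidates = sorted(
        name for name in lookup(word) if name.lower() != word.lower()
    )
    # WordNet joins multi-word lemmas with underscores, e.g. "domestic_dog".
    return candidates[0].replace("_", " ") if candidates else word
```

A real script would probably want a smarter rule (part of speech, word frequency, context), but the shape stays the same: look up candidates, pick one, fall back to the original word.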
Two things. First, you can change the file reading portion to:
for line in open("C:/Users/tester/Desktop/test.txt"):
    word = line.split()
And second, .split() returns a list of strings, whereas your change function appears to only operate on a single word at a time. This is what's causing the exception. Your word is actually a list.
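You can reproduce the exception without NLTK at all. A small sketch with a made-up sample line (Python 3 syntax):

```python
line = "the quick dog\n"
words = line.split()  # a list of strings, one per whitespace-separated token

try:
    words + ".n.01"  # what change(word) effectively does when handed a list
except TypeError as exc:
    error_message = str(exc)

# Appending the suffix to a single string from the list works as intended.
synset_name = words[2] + ".n.01"
```

Here `error_message` holds the same "can only concatenate list" text that appears in the traceback.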
If you want to process every word on that line, make it look like:
for line in open("C:/Users/tester/Desktop/test.txt"):
    words = line.split()
    for word in words:
        change(word)
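Note that the question's change function re-reads and rewrites the whole file for every word, and it writes to the same file the loop is reading from. One way around that, as a rough sketch: read everything once, substitute word by word in memory, then write once. The function name `replace_words_in_file` and the `substitute` parameter are hypothetical; `substitute` is assumed to be any function that maps a word to its replacement (for example, one backed by a WordNet lookup).

```python
def replace_words_in_file(path, substitute):
    """Rewrite `path`, passing every whitespace-separated word through `substitute`."""
    # Read the whole file first so it is closed before being overwritten.
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

    new_lines = []
    for line in lines:
        # split() discards the original spacing, so words are rejoined
        # with single spaces.
        new_lines.append(" ".join(substitute(word) for word in line.split()))

    # Write the result back in a single pass.
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(new_line + "\n" for new_line in new_lines))
```

Because split() is used, punctuation stays attached to words ("dog." is not the same token as "dog") and runs of spaces collapse to one, so a proper tokenizer may be a better fit for real text.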