I have a dataset from which I would like to remove stop words. I used NLTK to get a list of stop words:
I have some code that removes stop words from my data set. As the stop list doesn't seem to remove a majority of the words I would like it to, I'm looking to add words to this stop list so that it
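A minimal sketch of one way to extend a stop list, assuming the list can be treated as a plain Python collection of lowercase words. `BASE_STOP_WORDS`, `extend_stop_words` and `remove_stop_words` are hypothetical names used for illustration, and the small base set merely stands in for whatever list the asker actually loads.

```python
# Stand-in for the real stop list (an assumption, not the asker's data).
BASE_STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def extend_stop_words(base, extra):
    """Return a new set containing the base stop words plus the extra ones."""
    return set(base) | {word.lower() for word in extra}


def remove_stop_words(tokens, stop_words):
    """Keep only tokens whose lowercase form is not in the stop set."""
    return [token for token in tokens if token.lower() not in stop_words]


custom_stop_words = extend_stop_words(BASE_STOP_WORDS, ["said", "also"])
filtered = remove_stop_words("The cat also said hello".split(), custom_stop_words)
```

Using a set rather than a list keeps each membership check cheap, which matters once the stop list grows.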
I have some code that gives me a list of words with the frequencies at which they occur in the text. I'm looking to make the code automatically convert the top 10 words into an ARFF with
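As a rough illustration, the sketch below counts words and writes the most frequent ones as numeric attributes of a single-row ARFF document. The layout (one attribute per word, one data row of counts) is an assumption, since the excerpt does not say what the ARFF should contain, and `top_words_to_arff` and the relation name are hypothetical.

```python
from collections import Counter


def top_words_to_arff(words, n=10, relation="word_frequencies"):
    """Build an ARFF string with one numeric attribute per top-n word."""
    top = Counter(words).most_common(n)
    lines = [f"@relation {relation}", ""]
    for word, _count in top:
        # Words containing spaces or special characters would need quoting.
        lines.append(f"@attribute {word} numeric")
    lines.append("")
    lines.append("@data")
    lines.append(",".join(str(count) for _word, count in top))
    return "\n".join(lines)


arff_text = top_words_to_arff("a a a b b c".split(), n=2)
```

The returned string can then be written to a `.arff` file with an ordinary `open(..., "w")` call.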
I have some code that processes a dataset for later use. The code I'm using for the stop words seems to be OK; however, I think the problem lies within the rest of my code, as it seems to only
OK, I edited my question since I now have a host that does support SSH. How can I install the NLTK module for Python using SSH?
I am building a spam filter using NLTK in Python. I now check for the occurrences of words and use the NaiveBayesClassifier, resulting in an accuracy of .98 and F measure for spam of .92 and for non
Can someone recommend an open source POS tagger (e.g. the stanford-postagger) for Korean, Indonesian, Thai and Vietnamese that I can use to tag the corpus data that I currently have?
OK, I am using different taggers to tag a text: default, unigram, bigram and trigram. I have to check which combination of three of those four taggers is the most accurate.
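One way to approach this is to enumerate every choice of three taggers and score each one. The sketch below does only the enumeration and selection; `score` is a hypothetical callable that would build the tagger chain for a combination and return its accuracy on held-out data, and the tagger names are plain labels rather than NLTK objects.

```python
from itertools import combinations

TAGGER_NAMES = ["default", "unigram", "bigram", "trigram"]


def candidate_combinations(names, size=3):
    """Return every unordered choice of `size` taggers from `names`."""
    return list(combinations(names, size))


def best_combination(names, score, size=3):
    """Return the combination for which `score` gives the highest value."""
    return max(candidate_combinations(names, size), key=score)


# Four taggers taken three at a time gives four candidates to evaluate.
combos = candidate_combinations(TAGGER_NAMES)
```

If the order in which the taggers back off to one another matters, `itertools.permutations` could replace `combinations`, at the cost of 24 candidates instead of 4.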
Trying to write a simple Python script which will use NLTK to find and replace synonyms in a txt file. The following code gives me an error:
I have written the following code to count the number of sentences, words and characters in the input file sample.txt, which contains a paragraph of text. It works fine in giving the number of sentenc
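For comparison, here is a minimal sketch of such a counter using only the standard library. It is not the asker's code: `count_text` is a hypothetical helper, sentences are assumed to end in `.`, `!` or `?`, and words are assumed to be whitespace-separated.

```python
import re


def count_text(text):
    """Count sentences, words and characters in a string."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    words = text.split()
    return {
        "sentences": len(sentences),
        "words": len(words),
        "characters": len(text),  # includes spaces and punctuation
    }


counts = count_text("Hello world. How are you?")
```

To run it on a file such as sample.txt, read the contents with `open("sample.txt").read()` and pass the resulting string to `count_text`.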