How do I find text features and print them?

I have just started using the Natural Language Toolkit (NLTK) as part of my engineering college project. Can anybody please tell me how to read an input paragraph of text and

1) break it down into its textual components, i.e. the number of sentences, words, characters, and polysyllabic (complex) words in the paragraph

and

2) print the values determined above?


Where's the input paragraph coming from? A file? The console? That's more of a Python issue than an NLTK one.

For the rest, look at the nltk.tokenize module and nltk.probability.FreqDist.
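To make the counting part concrete, here is a rough sketch that uses only the standard library. It is not NLTK's tokenizer: the regular expressions are simplistic assumptions that will miscount around abbreviations such as "Dr." or "e.g.", and the name text_stats is made up for this example. With NLTK installed, nltk.tokenize.sent_tokenize and nltk.tokenize.word_tokenize would be the more robust replacements for the two regex steps.

```python
import re

def text_stats(paragraph):
    """Return rough sentence, word and character counts for a paragraph."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    # Assumption: a word is a run of letters, optionally with inner apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", paragraph)
    return {
        "sentences": len(sentences),
        "words": len(words),
        # Counts every character, including spaces and punctuation.
        "characters": len(paragraph),
    }

sample = "Hello world. How are you? Fine!"
for name, value in text_stats(sample).items():
    print(f"{name}: {value}")
```

If the paragraph comes from the console or a file, reading it with input() or open(...).read() and passing the string to text_stats should be enough.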


From a discussion on the NLTK Google group:

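import nltk
from nltk.corpus import cmudict

# Needs the CMU Pronouncing Dictionary data: nltk.download('cmudict')
d = cmudict.dict()

def nsyl(word):
    # Vowel phonemes end in a stress digit (0, 1 or 2), so counting them
    # gives the syllable count; one count is returned per pronunciation.
    # str.isdigit replaces curses.ascii.isdigit, which is unavailable on Windows.
    return [len([ph for ph in pron if ph[-1].isdigit()]) for pron in d[word.lower()]]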

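This should give you the syllable count for a word, as a list with one count per pronunciation listed in the CMU dictionary. Words that are not in the dictionary raise a KeyError. Hope this helps.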
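For the polysyllabic count asked about in the question, one possible sketch is below. It assumes "polysyllabic" means three or more syllables, which the question does not state, and it uses a crude vowel-group heuristic instead of cmudict so that it runs without any downloads. The names heuristic_syllables and count_polysyllabic are made up for this example.

```python
import re

def heuristic_syllables(word):
    """Estimate syllables by counting runs of vowels (a rough approximation)."""
    lowered = word.lower()
    # Assumption: each run of consecutive vowels (including 'y') is one syllable.
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Assumption: a trailing 'e' is silent unless the word ends in 'le'.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def count_polysyllabic(words, syllables=heuristic_syllables, threshold=3):
    """Count words whose syllable estimate is at least `threshold`."""
    return sum(1 for w in words if syllables(w) >= threshold)

words = "The university administration made a complicated decision".split()
print("polysyllabic words:", count_polysyllabic(words))
```

The syllables parameter is there so that a dictionary-based counter built on nsyl can be swapped in, as long as it returns a single integer and handles words missing from the dictionary.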
