Can I create a corpus from a collection of strings in NLTK? [duplicate]
Is there a way to create a corpus without having to store the items in files? For instance, I want to manipulate tweets or paragraphs that I am grabbing from the web. Can I do something like:
myCorpus = MyCorpus([
('id', 'item', 'category'),
('id', 'item', 'category'),
('id', 'item', 'category'),
... ])
Or:
myCorpus.add('id', 'item', 'category')
The purpose is to manipulate the corpus with existing NLTK capabilities. I checked TextCollection, but it does not seem to handle categories.
Why not just write the strings out to a file or files and then process them as a corpus?
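As a rough sketch of that suggestion: the helper below writes each (id, item, category) tuple to `<root>/<category>/<id>.txt`, and the resulting directory can then be read with NLTK's `CategorizedPlaintextCorpusReader`, taking the category from the directory name. The names `write_corpus` and `load_corpus` and the directory layout are my own assumptions, not part of NLTK, and the ids and categories are assumed to be safe to use as file and directory names.

```python
import os
import tempfile


def write_corpus(items, root=None):
    """Write (id, text, category) tuples to <root>/<category>/<id>.txt.

    Hypothetical helper: creates a temporary directory when no root is
    given, and returns the root so it can be handed to a corpus reader.
    """
    if root is None:
        root = tempfile.mkdtemp(prefix="corpus_")
    for item_id, text, category in items:
        cat_dir = os.path.join(root, category)
        os.makedirs(cat_dir, exist_ok=True)
        path = os.path.join(cat_dir, f"{item_id}.txt")
        with open(path, "w", encoding="utf8") as fh:
            fh.write(text)
    return root


def load_corpus(root):
    """Open the directory written by write_corpus as a categorized corpus."""
    # Imported here so that write_corpus stays usable without NLTK installed.
    from nltk.corpus.reader import CategorizedPlaintextCorpusReader

    return CategorizedPlaintextCorpusReader(
        root,
        r".+/.+\.txt",                # file ids look like "<category>/<id>.txt"
        cat_pattern=r"(.+)/.+\.txt",  # the directory name is the category
    )


# Example (needs NLTK):
#   root = write_corpus([
#       ("t1", "The match went to extra time.", "sports"),
#       ("t2", "The vote was postponed again.", "politics"),
#   ])
#   corpus = load_corpus(root)
#   corpus.categories()
#   corpus.words(categories="sports")
```

The files only need to live as long as you are working with the corpus, so a temporary directory is usually enough for tweets or scraped paragraphs.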