Extracting nouns, noun phrases, adjectives and verbs from a text file corpus using Visual C#
I am doing a project in which I have to extract nouns, adjectives, noun phrases and verbs from text files (.doc format). I have a corpus of around 75 such files. I searched the net and came across POS tagging in Python using NLTK. As my project is in C# (using Visual Studio 2008), I need code that does this in C#. I have tried the WordNet API and even SharpNLP, but as I am a newbie I found them tough to integrate with my project. Can anybody suggest simpler code to do this, using something like a vocabulary? Please help. Thanks.
I worked in NLP (Natural Language Processing) for an industry leader for a while, and what you want to do is no trivial task. I know one of the creators of NLTK and have used it myself; it's a high-quality open source tool and I'd recommend you use it (do you have a particularly compelling reason to use C#?).
POS tagging is typically implemented by training a language model on hand-annotated data, then applying that model to new text to predict each word's part of speech along with a confidence score. NLTK has tools that do this, and it also includes some models (if I'm not mistaken).
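To make the train-then-apply idea concrete, here is a minimal sketch of the simplest such model, a most-frequent-tag (unigram) tagger, in Python using only the standard library. It is not NLTK's API: the function names and the tiny hand-annotated training set are hypothetical, the tags follow the Penn Treebank convention, and the "confidence" here is simply the fraction of training occurrences that carried the chosen tag. Real taggers also look at the surrounding words and are trained on far larger corpora.

```python
from collections import Counter, defaultdict

# Hypothetical hand-annotated training data (Penn Treebank tags).
TRAINING_DATA = [
    [("The", "DT"), ("quick", "JJ"), ("dog", "NN"), ("runs", "VBZ"), ("fast", "RB")],
    [("A", "DT"), ("fast", "JJ"), ("car", "NN"), ("stops", "VBZ")],
    [("The", "DT"), ("fast", "JJ"), ("dog", "NN"), ("barks", "VBZ")],
]


def train_unigram_tagger(tagged_sentences):
    """Map each lowercased word to (most frequent tag, fraction of times it had that tag)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    model = {}
    for word, tag_counts in counts.items():
        best_tag, best_count = tag_counts.most_common(1)[0]
        model[word] = (best_tag, best_count / sum(tag_counts.values()))
    return model


def tag(model, tokens, default_tag="NN"):
    """Return (token, tag, confidence) triples; unseen words get default_tag with confidence 0.0."""
    result = []
    for token in tokens:
        predicted, confidence = model.get(token.lower(), (default_tag, 0.0))
        result.append((token, predicted, confidence))
    return result


if __name__ == "__main__":
    model = train_unigram_tagger(TRAINING_DATA)
    for token, pos, conf in tag(model, ["The", "fast", "dog", "sleeps"]):
        print(f"{token}\t{pos}\t{conf:.2f}")
```

Unknown words fall back to NN with confidence 0.0, a common baseline choice because nouns are the most frequent open word class; a real tagger would use suffixes and context to do better.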
You'll find that most tools are written in C++, Java, and Python. If you don't know any of these languages, look at this as an excellent opportunity to learn something!
See Wikipedia, especially the links at the bottom, for more information and other software available to use for such tagging.
Christopher is correct in his statement that NLP implementations are no picnic. However, I've recently looked into a viable solution using OpenNLP in a .NET project with a rudimentary PoS parser. In my example I am looking for noun phrases, but it shouldn't be too difficult a task to find other fragments as well. I found the OpenNLP Tools models for version 1.5 sufficient for my purposes.
I realize this answer is woefully late for the questioner, but hopefully it will give others some inspiration in this difficult field.
Extracting noun phrases with contextual relevance in .NET using OpenNLP
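Once text has been tagged, pulling out nouns, adjectives and verbs, and grouping noun phrases, can be done directly over the (word, tag) pairs. The sketch below is not the OpenNLP parser approach from that article; it swaps in a simpler pattern-based chunker (DT? JJ* NN+, i.e. an optional determiner, any adjectives, then one or more nouns) and assumes Penn Treebank tags. The function names and the hand-tagged example sentence are hypothetical; in practice the tags would come from a tagger.

```python
def extract_by_pos(tagged):
    """Group words by coarse part of speech using Penn Treebank tag prefixes."""
    groups = {"nouns": [], "adjectives": [], "verbs": []}
    for word, tag in tagged:
        if tag.startswith("NN"):        # NN, NNS, NNP, NNPS
            groups["nouns"].append(word)
        elif tag.startswith("JJ"):      # JJ, JJR, JJS
            groups["adjectives"].append(word)
        elif tag.startswith("VB"):      # VB, VBD, VBG, VBN, VBP, VBZ
            groups["verbs"].append(word)
    return groups


def noun_phrases(tagged):
    """Chunk noun phrases with the pattern DT? JJ* NN+, scanning greedily left to right."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":        # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):   # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):   # one or more nouns
            k += 1
        if k > j:                       # found at least one noun: emit the phrase
            phrases.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:                           # no noun phrase starts here; move on
            i += 1
    return phrases


if __name__ == "__main__":
    # Hypothetical hand-tagged sentence.
    sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
                ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
    print(noun_phrases(sentence))
    print(extract_by_pos(sentence))
```

A pattern like this misses phrases with prepositional attachments ("the owner of the dog") and other structures a full parser would catch, which is the trade-off for its simplicity.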
Kindly read through this article.
Easy Integration of SharpNLP with C# Visual Studio Project
In this article, I have given a step-by-step guide to integrating SharpNLP with a C# project, along with sample code snippets that specifically address your issue, such as sentence splitting, tokenizing and POS tagging.
Try it out, and I'll be happy to help with any problems you encounter.
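For a rough idea of what the sentence splitting and tokenizing steps do before POS tagging, here is a regex-based sketch in Python. SharpNLP/OpenNLP use trained models for these steps; this simplified version is an assumption for illustration only (the patterns and function names are hypothetical), and it will, for example, wrongly split after abbreviations such as "Dr.".

```python
import re

# Split at whitespace that follows ., ! or ? and precedes an uppercase letter or a quote.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")
# A token is a word (allowing internal apostrophes or hyphens) or a single punctuation mark.
_TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def split_sentences(text):
    """Naively split text into sentences."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    text = "I have 75 files. Each one is a Word document! Can I tag them?"
    for sentence in split_sentences(text):
        print(tokenize(sentence))
```

The output of `tokenize` is the list of tokens a POS tagger would then label, one sentence at a time.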