
PHP query analysis

I'm making a (self-dubbed) knowledge engine: the user types in a question, online encyclopedias are searched, and a simple answer is returned. How can a query be broken into parts of speech using PHP so that the subject of the question can be identified? Say, for instance, the example query is "Who is the British Prime Minister?" Here "Who" is a pronoun, "is" is an auxiliary verb, "the" is an article (so it could probably be ignored and the sentence would still make sense), and "British Prime Minister" would, I suppose, be the main query. Thanks for helping!


You should be looking at part-of-speech (POS) taggers; a web search will turn up several. One such tagger is the Stanford NLP tagger, from the Stanford Natural Language Processing Group: http://nlp.stanford.edu/software/tagger.shtml


This isn't really that hard to do from scratch, as you are handling information queries, not issuing commands. The key will be in breaking up the phrase properly.

Identify whether there is an interrogative pronoun ('who' in your example), which will come at the beginning of the sentence. Don't confuse this with a relative pronoun, which would come later. Take the interrogative out of the query and use it as a second-order refiner.

The subject is "British Prime Minister", which is what you'd do your core search on, using the interrogative (who, what, where, etc.) as a subselector.

If there is a relative pronoun, that can be used as either a second-order selector or grouped in the main selector.

You can just dump stopwords like articles.
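
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a full parser: the function name analyzeQuery and both word lists are made up for the example, and treating auxiliary verbs such as "is" as stopwords is an assumption on top of the advice about articles.

```php
<?php
// Hypothetical helper: splits a question into its leading interrogative
// and the remaining subject. Both word lists are illustrative assumptions
// and far from exhaustive.
function analyzeQuery(string $query): array
{
    $interrogatives = ['who', 'what', 'where', 'when', 'why', 'which', 'how'];
    // Articles plus a few auxiliary verbs (the verbs are an assumption).
    $stopwords = ['a', 'an', 'the', 'is', 'are', 'was', 'were', 'do', 'does', 'did'];

    // Lowercase, strip punctuation, and split on whitespace.
    $clean = preg_replace('/[^a-z0-9\s]/', '', strtolower($query));
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    // Only the first word counts as an interrogative pronoun; the same
    // word later in the sentence would be a relative pronoun and is kept.
    $interrogative = null;
    if ($words && in_array($words[0], $interrogatives, true)) {
        $interrogative = array_shift($words);
    }

    // Drop stopwords; whatever remains is treated as the subject.
    $subject = array_values(array_diff($words, $stopwords));

    return [
        'interrogative' => $interrogative,
        'subject'       => implode(' ', $subject),
    ];
}

$result = analyzeQuery('Who is the British Prime Minister?');
// $result['interrogative'] is 'who'
// $result['subject'] is 'british prime minister'
```

The subject string would drive the core search and the interrogative would narrow the kind of answer expected (a person for 'who', a place for 'where', and so on). For anything beyond simple questions like this, a real POS tagger is likely the safer route.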
