Lucene / Solr: remove common phrases (stop phrases)

I would like to strip from the search query any words or phrases that add no meaning to it (we could call them stop phrases). Example:

"How to .."

"Where can I find .."

"What is the meaning of .."

etc.

  1. Where can I find, or how can I compute, a list of 'common phrases' for English and for French?

  2. How can I implement this in Solr? (Is there anything more advanced than the stopwords feature?)


I don't think you should remove these phrases completely, because they reveal the searcher's intent. You could instead take advantage of them with a natural-language question answering system such as Ephyra. There is even a project that aims to integrate it with Lucene. I haven't used it myself, but it may be worth evaluating.

If you are determined to remove them, I think you need to write a custom QueryParser that filters the query and then delegates further processing to a parser of your choice.
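To make the filter-then-delegate idea concrete, here is a minimal sketch in plain Java. It is not a Lucene or Solr API: the class name `StopPhraseFilter`, the phrase list, and the `Function` standing in for the downstream parser are all hypothetical, and it only strips a phrase when it appears at the start of the query. In a real setup the delegate would be whichever parser you already use.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Lucene or Solr.
final class StopPhraseFilter {

    // Assumed example list; longer phrases come first so the longest match wins.
    private static final List<String> STOP_PHRASES = List.of(
            "what is the meaning of",
            "where can i find",
            "how to");

    // Removes one leading stop phrase (case-insensitive) from the query.
    // A query that consists only of a stop phrase is returned unchanged,
    // so the downstream parser never receives an empty string.
    static String strip(String query) {
        String normalized = query.trim().replaceAll("\\s+", " ");
        for (String phrase : STOP_PHRASES) {
            String prefix = phrase + " ";
            if (normalized.regionMatches(true, 0, prefix, 0, prefix.length())) {
                return normalized.substring(prefix.length());
            }
        }
        return normalized;
    }

    // Filters the query, then hands it to the parser of your choice.
    // The delegate is a stand-in for a real query parser.
    static <T> T parse(String query, Function<String, T> delegate) {
        return delegate.apply(strip(query));
    }

    public static void main(String[] args) {
        // Stand-in delegate that just labels the filtered query.
        Function<String, String> delegate = q -> "parsed[" + q + "]";
        System.out.println(parse("How to install Solr", delegate));
        System.out.println(parse("Where can I find the Lucene javadoc", delegate));
    }
}
```

Matching only at the start of the query keeps phrases that occur in the middle of a query, where they are more likely to carry meaning. If you would rather keep the intent signal, as suggested above, the same method could return the matched phrase alongside the remaining text instead of discarding it.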
