SOLR and Natural Language Parsing - Can I use it?

2023-01-01 16:04 问答作者：

Requir开发者_JAVA技巧ements

Word frequency algorithm for natural language processing

Using Solr

While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP.

I thought of SOLR because:

It's got a bunch of tokenizers and performs a lot of NLP.
It's pretty use to use out of the box.
It's restful distributed app, so it's easy to hook up
I've spent some time with it, so using could save me time.

Can I use Solr?

Although the above reasons are good, I don't know SOLR THAT well, so I need to know if it would be appropriate for my requirements.

Ideal Usage

Ideally, I'd like to configure SOLR, and then be able to send SOLR some text, and retrieve the indexed tonkenized content.

Context

I'm working on a small component of a bigger recommendation engine.

I guess you can use Solr and combine it with other tools. Tokenization, stop word removal, stemming, and even synonyms come out of the box with Solr. If you need named entity recognition or base noun-phrase extraction, you need to use OpenNLP or an equivalent tool as a pre-processing stage. You will probably need term vectors for your retrieval purposes. Integrating Apache Mahout with Apache Lucene and Solr may be useful as it discusses Lucene and Solr integration with a machine learning (including recommendation) engine. Other then that, feel free to ask further more specific questions.

You can actually configure Solr to use NLP algorithms both when indexing documents and at search time. The first phase (indexing time) can be done using/writing Solr UpdateRequestProcessor plugins for analyzing fields texts while the second phase can be implemented writing a custom QParserPlugin which analyzes the query hit by the user. I've presented an approach for implementing natural language search in Solr at Lucene Eurocon 2011 which takes advantage of Apache UIMA for running (open source) NLP algorithms. You can take a look at the slides and at the video of the talk. Hope this helps. Tommaso

There is a special request handler designed to apply parsing to filter our less relevant search results. It is based on machine learning of constituency parse trees obtained by OpenNLP.

Please see the blog http://search-engineering.blogspot.com

and the paper http://dx.doi.org/10.1016/j.datak.2012.07.003

This SOLR search request handler will be available as a part of OpenNLP Similarity component

In this Google code project

http://code.google.com/p/relevance-based-on-parse-trees

you can use the linguistic-based request handler in the package opennlp.tools.similarity.apps.solr public class SyntGenRequestHandler extends SearchHandler

where the search results obtained by SearchHandler are re-ranked based on similarity of parse trees.

继续阅读：lucene recommendation-engine solr

SOLR and Natural Language Parsing - Can I use it?

Requir开发者_JAVA技巧ements

Using Solr

Can I use Solr?

Ideal Usage

Context

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Requir开发者_JAVA技巧ements

Using Solr

Can I use Solr?

Ideal Usage

Context

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？