
Lucene synonym expansion, stemming, spell check and more

I am using Lucene to index my database and then perform a phrase search on a specific field (field name: keyword). I am currently using the following code:

        String userQuery = request.getParameter("query");
        //create standard analyzer object
        analyzer = new StandardAnalyzer(Version.LUCENE_30);
        Analyzer analyze = AnalyzerUtil.getPorterStemmerAnalyzer(analyzer);
        //create File object of our index directory
        File file = new File(LUCENE_INDEX_DIRECTORY);
        //create index reader object
        reader = IndexReader.open(FSDirectory.open(file),true);
        //create index searcher object
        searcher = new IndexSearcher(reader);
        //create topscore document collector
        collector = TopScoreDocCollector.create(1000, false);
        //create query parser object
        parser = new QueryParser(Version.LUCENE_30,"keyword", analyze);
        parser.setAllowLeadingWildcard(true);
        //parse the query and get reference to Query object
        query = parser.parse(userQuery);
        //********Line 1***********************
        //search the query
        searcher.search(query, collector);
        hits = collector.topDocs().scoreDocs;
        //check whether the search returns any result
        if (hits.length > 0) {
            // Code to retrieve hits
        }

This code works fine for stemming, but now I also want to expand my query to do a synonym search: if I enter "Man" and my Lucene index has an entry "male", it should still return that entry as a hit. I tried adding the following at Line 1 in the above code:

        query = SynExpand.expand(userQuery, searcher, analyze, "keyword", serialVersionUID);

But it doesn't give me any results. I also want to introduce spell checking, so that if I enter "ubelievable" instead of "unbelievable" it would still give me a result.

I have no idea why synonym expansion isn't working for me or how to do spell checking. If someone could guide me, I would be really grateful.

Thanks!


Fuzzy search can be done with a query term modifier, namely by appending a tilde to the term:

keyword:ubelievable~

See Lucene Parser Syntax for more details and other types of queries that may be interesting to you.
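
To see why the tilde helps with the misspelling from the question: Lucene's fuzzy matching is based on Levenshtein edit distance, and "ubelievable" is a single insertion away from "unbelievable". Below is a minimal plain-Java sketch of that distance. It is not Lucene's own implementation, and the names FuzzySketch and editDistance are made up for illustration.

```java
// Illustrative sketch only: FuzzySketch and editDistance are hypothetical names,
// not part of the Lucene API.
class FuzzySketch {

    // Classic two-row dynamic-programming Levenshtein distance:
    // the minimum number of single-character inserts, deletes and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("ubelievable", "unbelievable")); // prints 1
    }
}
```

A term that is this close to an indexed term is the kind of match a fuzzy query is meant to pick up.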

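There are two ways of dealing with synonyms. The query expansion you are trying to use relies on WordNet. As SynExpand's documentation says, you must first run Syns2Index to build a synonym index before you can use expansion, and the Searcher you pass to expand should be opened on that synonym index rather than on your main index. This is the easy way, but it works only with English words.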
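
Conceptually, query-time expansion rewrites each query term into "the term OR its synonyms", with the synonyms optionally weighted by a boost. Here is a rough plain-Java sketch of that idea. It uses a hand-written map in place of the WordNet synonym index, and QueryExpansionSketch and its expand method are hypothetical names, not the SynExpand API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: this class is hypothetical and does not use Lucene.
class QueryExpansionSketch {

    // Builds a query-parser style string such as keyword:(man male^0.9).
    // Assumes the parser's default operator is OR, so space-separated terms are alternatives.
    // Terms are not escaped, so query syntax characters in the input are not handled.
    static String expand(String field, String userQuery,
                         Map<String, List<String>> synonyms, float boost) {
        List<String> parts = new ArrayList<String>();
        for (String term : userQuery.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            parts.add(term);
            List<String> syns = synonyms.get(term);
            if (syns != null) {
                for (String s : syns) {
                    parts.add(s + "^" + boost); // boosted synonym
                }
            }
        }
        if (parts.isEmpty()) {
            return ""; // nothing to search for
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(parts.get(i));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = new HashMap<String, List<String>>();
        dict.put("man", Arrays.asList("male"));
        System.out.println(expand("keyword", "Man", dict, 0.9f)); // prints keyword:(man male^0.9)
    }
}
```

The resulting string could then be handed to the same QueryParser used in the question, so the synonyms go through the same stemming analyzer as the original term.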

If you need to support multiple languages or add your own synonyms, you can use synonym injection during indexing. The idea is to write your own analyzer that injects synonyms from your own dictionary into the indexed documents. This may sound hard to implement, but fortunately there is an excellent example in the book Lucene in Action (the source code is available for free; see the lia.analysis.synonym package), though I highly recommend getting your own copy of this nice book.
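
The core of synonym injection can be sketched without any Lucene classes: for every token, emit the token itself and then each of its synonyms at the same position (position increment 0), so that phrase and term queries match either word. The code below is an assumption-laden illustration of that idea only. SynonymInjectionSketch, Tok and inject are hypothetical names, and a real implementation would be a Lucene TokenFilter such as the one in the book's example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical classes, not the Lucene API.
class SynonymInjectionSketch {

    // A term plus its position increment:
    // 1 = next position, 0 = same position as the previous token.
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    // Emits every original token, followed by its synonyms stacked on the same position.
    static List<Tok> inject(List<String> tokens, Map<String, List<String>> synonyms) {
        List<Tok> out = new ArrayList<Tok>();
        for (String t : tokens) {
            out.add(new Tok(t, 1));
            List<String> syns = synonyms.get(t);
            if (syns != null) {
                for (String s : syns) {
                    out.add(new Tok(s, 0));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = new HashMap<String, List<String>>();
        dict.put("male", Arrays.asList("man"));
        System.out.println(inject(Arrays.asList("adult", "male"), dict));
        // prints [adult(+1), male(+1), man(+0)]
    }
}
```

With the synonyms stored in the index this way, a search for "man" can hit a document that only contained "male", without any change to the query side.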
