开发者

How Grouping can be achieved in Solrnet/Solr(Lucene)?

I have Lucene files indexed according to pageIds (UniqueKey). and one document can have multiple pages. Now once user perform some search it gives us pages that matches search criteria.

I am using Lucene.Net 2.9.2

We have 2 problems...

1- The file size 开发者_如何学Cis around 800GB and it has 130 million rows (pages) so the search time was really slow (all queries taking more than a min (we only have to return limited rows at a time)

To overcome the performance issue I shifted to SOLR which resolved the performance issue (which is quite strange as I am not using any extra functionality provided by SOLR like sharding etc - so could it be that Lucene.NET 2.9.2 is not really equivalent to performance compared to same version of JAVA??) but now I am having another issue...

2- The individual 'lucene document' is one page but i want to show results 'grouped by' 'real documents'. How many results I should be returned should be configurable based on 'real documents' not 'pages' (coz thats how I want to show to the user).

So lets say I want 20 'real documents' and ALL pages in them that matches the search criteria (doesnt matter if one document has 100 pages and another just 1).

From what I could get from SOLR forums was that it can be achieved by SOLR-236 patch (field collapsing) but I have not been able to apply the patch correctly with trunk (gives lots of errors).

This is really imp for me and I dont have much time, so can someone please either send me the SOLR 1.4.1 binary with this patch applied or guide me if there is any other way.

I would really appreciate it. Thanks!!


If you have issues with the collapse patch, then the Solr issue tracker is the channel to report them. I can see that other people are currently having some issues with it, so I suggest getting involved in its development.

That said: I recommend that if your application needs to search for 'real documents', then build your index around these 'real documents', not their individual pages.


If your only requirement is to show page numbers, I would suggest to play with the highlighter or made some custom development. You can store the word number of the start and end of each page in a custom structure, and knowing the matched word position in the whole document you can know in what page it appears. If the documents are very large you will get a good performance improvement.


You could also have a look at SOLR-1682 : Implement CollapseComponent, I havent tested it yet, but as far as I know it solves the collapsing too.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜