Lucene.Net and Nhibernate.Search: how to manage complex searches?
We have a SQL Server database with a million-ish records that are indexed by Lucene.Net through NHibernate.Search. When we built the index for our classes, we tried to be extensive since the cost of indexing and retrieval was really small. The goal was to offer full-text searching to users on a webpage with pagination.
Since SQL Server complains when too many parameters are sent to it (2100 parameters by default), and since we didn't want to change that parameter every time we hit the limit (which can happen easily: some terms in our documents are very common but must be searchable), we decided to handle everything from sorting to paging in Lucene. It worked like a charm.
However, feature creep has recently been causing us problems: new queries need to access not only fields that aren't indexed but also fields that shouldn't or can't be accessed, such as computed fields, recommendation lists, etc.
Since we have put all our paging and sorting in Lucene.Net and since SQL Server is picky regarding its parameters, how can we manage to have our cake and eat it too?
I'm looking into running the SQL query first, reducing the results to their document IDs, and then feeding Lucene a gigantic OR query with all possible IDs so it can pick the matching documents, but I worry about the query size.
Pseudo code:
listIds = NHibernate.Criteria.ReduceToIds.List(of MyObject)
queryIds = "ID:" + String.Join(" ID:", listIds)
return NHibernate.Search(queryIds)
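To make the worry about query size concrete, here is a minimal sketch in Python of the string-building step only. It is not NHibernate.Search or Lucene.Net code: `build_id_queries` and `max_clauses` are hypothetical names, and the `ID` field name is taken from the pseudo code above. The 1024 default reflects Lucene's default limit on the number of clauses in a boolean query, as far as I know; treat it as an assumption to check against your Lucene.Net version.

```python
def build_id_queries(ids, max_clauses=1024):
    """Build Lucene-style query strings 'ID:1 OR ID:2 ...'.

    Each returned string holds at most max_clauses ids, so that no
    single query exceeds the (assumed) boolean clause limit.
    """
    if max_clauses <= 0:
        raise ValueError("max_clauses must be positive")
    ids = list(ids)
    queries = []
    for start in range(0, len(ids), max_clauses):
        chunk = ids[start:start + max_clauses]
        queries.append(" OR ".join(f"ID:{doc_id}" for doc_id in chunk))
    return queries
```

Note that splitting the ids over several queries means the results have to be merged afterwards, which defeats having Lucene do all of the sorting and paging in one pass. That is part of why a filter looks more attractive than one gigantic OR query.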
Apparently, Lucene filters can restrict a query to certain document IDs, so it should be possible, but I don't really see a way to do it in NHibernate.Search.
Do you have any idea how I should handle the problem? Is it possible to filter the query by asking SQL for the list of IDs? Is it overkill? Any other solution out there?
Usually you have problems when Lucene.Net returns more than 2100 results, as NHibernate.Search will build a big SELECT * FROM T WHERE ID IN (@p0,@p1 ...) statement with one parameter per result.
So, if your Lucene query doesn't return more than 2100 results, you should be fine.
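As an illustration of why paging keeps you under that limit, here is a rough sketch in Python. It only models the logic and is not how NHibernate.Search is implemented: `page_of_ids`, `select_for_ids` and `SQL_SERVER_MAX_PARAMS` are hypothetical names, and the table `T` and the `@p0, @p1` placeholders are copied from the statement above.

```python
SQL_SERVER_MAX_PARAMS = 2100  # limit quoted in the question


def page_of_ids(hit_ids, page, page_size):
    """Return the ids for one zero-based page of Lucene hits."""
    if not 0 < page_size <= SQL_SERVER_MAX_PARAMS:
        raise ValueError("page_size must be between 1 and the parameter limit")
    start = page * page_size
    return hit_ids[start:start + page_size]


def select_for_ids(ids):
    """Build the IN (...) statement with one parameter per id."""
    if not ids:
        raise ValueError("ids must not be empty")
    placeholders = ", ".join(f"@p{i}" for i in range(len(ids)))
    return f"SELECT * FROM T WHERE ID IN ({placeholders})"
```

With a page size of, say, 50, each statement carries 50 parameters no matter how many documents the Lucene query matched in total, so the limit only bites if a whole result set is loaded at once.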