Solr searching while indexing

I have a problem optimizing the following pseudocode; any help is appreciated.

for every term
  open new index searcher
  do search
  if found
    skip and search for next term
  else
    add it to index
    commit
  close searcher

In the code above, every time I add a new doc/term to the index I have to commit just for that one document (which I feel is costly), so that the change is visible when a new index searcher is opened on the next iteration.

Is there any way I can improve the performance? FYI: I have 36 million terms to index.


You can create a HashSet to de-duplicate your list of terms in memory, then index just those terms. The pseudocode is like so:

set := new HashSet
for each term
  if set contains term
    skip to next iteration
  else
    add term to set
end
open index
for each term in set
  add term to index
end
close index
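
The de-duplication step above can be sketched in plain Java. This is only a minimal illustration, assuming all terms fit in memory and that the index does not already contain any of them; the class name `TermDeduper` and the method `uniqueTerms` are hypothetical names, and the actual indexing call is left as a comment rather than guessed at.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TermDeduper {

    // Returns the distinct terms from the input. Set.add ignores a term
    // that is already present, so no explicit "contains" check is needed.
    static Set<String> uniqueTerms(Iterable<String> terms) {
        Set<String> seen = new HashSet<>();
        for (String term : terms) {
            seen.add(term);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("solr", "lucene", "solr", "index", "lucene");
        Set<String> unique = uniqueTerms(terms);
        // In a real run, each unique term would be added to the index here,
        // followed by a single commit once the loop finishes.
        System.out.println(unique.size() + " unique terms out of " + terms.size());
    }
}
```

Because duplicates are filtered before anything touches the index, no search is needed per term and one commit at the end is enough.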


I suggest you simply create a second index (either in a RAMDirectory or in an FSDirectory at a temporary location). Add all the terms/documents that were not found to the second (temporary) index, and merge the two indices at the end.

open index searcher on primary index (once)
for every term
  do search
  if found 
    skip and search for next term
  else
    add it to the second index
end
close searcher
commit temp index
merge temp index into primary index 
commit primary index