Solr searching while indexing
I have a problem optimizing the following pseudocode; any help is appreciated:
for every term
    open new index searcher
    do search
    if found
        skip and search for next term
    else
        add it to index
        commit
    close searcher
In the code above, every time I add a new document/term to the index I have to commit, just so that the single new document is visible to the index searcher I open in the next iteration. I suspect that per-document commit is costly.
Is there any way to improve the performance? FYI, I have 36 million terms to index.
You can create a HashSet to de-duplicate your list of terms in memory, then index just those unique terms. The pseudocode looks like this:
set := new HashSet
for each term
    if set contains term
        skip to next iteration
    else
        add term to set
end
open index
for each term in set
    add term to index
end
close index
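The two-pass approach above can be sketched in plain Java. This is a minimal sketch, not Lucene code: the hypothetical `TermSink` interface stands in for the real index (in practice, presumably a Lucene `IndexWriter` adding one document per term), so the de-duplication logic runs on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TermDedup {

    // Hypothetical stand-in for the real index; in Lucene this would
    // likely wrap an IndexWriter (assumption, not shown here).
    interface TermSink {
        void add(String term);
    }

    // Pass 1: de-duplicate in memory. LinkedHashSet keeps first-seen order,
    // and Set.add simply ignores a term that is already present.
    static Set<String> dedupe(Iterable<String> terms) {
        Set<String> unique = new LinkedHashSet<>();
        for (String term : terms) {
            unique.add(term); // duplicates are skipped, no search needed
        }
        return unique;
    }

    // Pass 2: "open index" once, add every unique term, "close index" once.
    // No per-term commit and no searcher are needed.
    static void indexAll(Set<String> unique, TermSink sink) {
        for (String term : unique) {
            sink.add(term);
        }
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "banana", "apple", "cherry", "banana");
        List<String> indexed = new ArrayList<>();
        indexAll(dedupe(terms), indexed::add);
        System.out.println(indexed); // [apple, banana, cherry]
    }
}
```

`Set.add` folds the "if set contains term, else add it" check into one call. With 36 million terms the set can need several gigabytes of heap, depending on term length, so the JVM heap size is worth checking before relying on this.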
I suggest you simply create a second index (either in a RAMDirectory or in an FSDirectory at a temporary location). Add all the terms/documents that were not found to this second (temporary) index, and merge the two indices at the end. Because the primary index never changes during the loop, you don't need to commit or reopen a searcher per term.
open primary index searcher (once)
for every term
    search primary index
    if found
        skip and search for next term
    else
        add it to the second index
    end
end
close searcher
commit temp index
merge temp index into primary index
commit primary index
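A minimal sketch of the same control flow in plain Java follows. The names are hypothetical: a `HashSet` stands in for the primary index and an `ArrayList` for the temporary one, so it runs without Lucene. It shows that the primary index is only read inside the loop, so a single searcher opened up front stays valid and no per-term commit is needed. In real Lucene code the merge step would typically be `IndexWriter.addIndexes(Directory...)` on the primary index's writer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TempIndexMerge {

    // Loop phase: look each term up in the (unchanging) primary index and
    // collect the misses in a temporary index. Both are in-memory stand-ins.
    static List<String> collectNew(Set<String> primary, List<String> terms) {
        List<String> temp = new ArrayList<>(); // stand-in for the temporary index
        for (String term : terms) {
            if (primary.contains(term)) {
                continue; // found: skip to the next term
            }
            temp.add(term); // not found: add to temp index, no commit here
        }
        return temp;
    }

    // Merge phase: stand-in for merging the temp index into the primary
    // index and committing once at the end.
    static void merge(Set<String> primary, List<String> temp) {
        primary.addAll(temp);
    }

    public static void main(String[] args) {
        Set<String> primary = new HashSet<>(List.of("apple", "banana"));
        List<String> temp = collectNew(primary, List.of("apple", "cherry", "date", "banana"));
        merge(primary, temp);
        System.out.println(temp);           // [cherry, date]
        System.out.println(primary.size()); // 4
    }
}
```

Like the pseudocode, this only checks the primary index, so a term that appears twice in the input but not in the primary index would be added to the temporary index twice; also tracking added terms in an in-memory set, as in the previous answer, would avoid that.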