Multiple IndexReader/Writers in one process (Lucene)
We are maintaining a Lucene index which contains around 20 million documents. The nature of the search queries is such that indexing and querying can easily be split between different indexes.
To achieve that, we need to keep many (potentially thousands) of IndexWriters or IndexReaders/Searchers in memory to deal with the indexing and querying of each one of these indexes (the queries do not span multiple indexes).
I need to know about the memory pressure this is going to cause, and potential solutions anyone can suggest.
You might want to take a look at Solr, which supports the creation and management of multiple indices (called cores) out of the box. It will also handle all the work of distribution over multiple nodes if that becomes necessary.
That being said, the memory overhead per index is very low (by design). I think it's something like one byte per document and then the number of unique terms divided by 256.
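To put rough numbers on that rule of thumb, here is a minimal back-of-the-envelope sketch. It simply takes the two figures quoted above at face value (about one byte per document, plus the number of unique terms divided by 256); the class name, method names, and the unique-term count are hypothetical and only there for illustration, so treat the result as an order-of-magnitude guess rather than a measurement.

```java
// Back-of-the-envelope estimate based on the rule of thumb quoted above.
// Nothing here calls Lucene; all names and the term count are hypothetical.
final class IndexOverheadEstimate {

    // "One byte per document": the per-document part of the overhead, in bytes.
    static long perDocumentBytes(long docCount) {
        return docCount;
    }

    // "Unique terms divided by 256": number of term entries held in memory.
    static long inMemoryTermEntries(long uniqueTerms) {
        return uniqueTerms / 256;
    }

    public static void main(String[] args) {
        long docs = 20_000_000L;       // total documents, from the question
        long uniqueTerms = 5_000_000L; // assumed value, replace with your own

        System.out.println("per-document bytes:     " + perDocumentBytes(docs));
        System.out.println("in-memory term entries: " + inMemoryTermEntries(uniqueTerms));
    }
}
```

If the 20 million documents are spread over thousands of small indexes, the per-document part adds up to the same total; what grows with the number of open indexes is whatever fixed cost each reader or writer carries, which this sketch does not try to model.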
I would like to know how often you update the index. Is there a real-time requirement? If you are using the Java Lucene project, then you can probably look into this open source project that LinkedIn spun off from some internal work: http://sna-projects.com/zoie/
As for searching, the memory pressure depends on whether you are sorting the results by the value of indexed fields. In that case the field cache, which is an internal Lucene facility, will generate memory pressure in some situations.
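To get a feel for the field cache cost, a rough sketch follows. It assumes that sorting on a numeric field makes Lucene cache one primitive value per document for each open reader (for example 4 bytes for an int field, 8 for a long field); string sort fields cost more and are not modeled here. The class and method names are hypothetical and the code does not call Lucene.

```java
// Rough field cache estimate for sorting on a numeric field.
// Assumption: one cached primitive value per document per sort field.
final class FieldCacheEstimate {

    // Approximate bytes cached for one sort field across docCount documents.
    static long sortCacheBytes(long docCount, int bytesPerValue) {
        return docCount * bytesPerValue;
    }

    public static void main(String[] args) {
        long docs = 20_000_000L; // total documents, from the question

        System.out.println("int sort field bytes:  " + sortCacheBytes(docs, 4));
        System.out.println("long sort field bytes: " + sortCacheBytes(docs, 8));
    }
}
```

Under that assumption, splitting the documents over many indexes does not shrink the total; it only determines how much is loaded at once, since the cache is populated for the readers that actually serve sorted queries.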
I hope this helps.