Lucene Indexing
I would like to use Lucene for indexing a table in an existing database. I have been thinking the process is like:
- Create a 'Field' for every column in the table
- Store all the Fields
- 'ANALYZE' all the Fields except for the Field with the primary key
- Store each row in the table as a Lucene Document.
While most of the columns in this table are small in size, one is huge. This column is also the one containing the bulk of the data on which searches will be performed.
I know Lucene provides an option to not store a Field. I was thinking of two solutions:
- Store the field regardless of the size and if a hit is found for a search, fetch the appropriate Field from Document
- Don't store the Field and if a hit is found for a search, query the database to get the relevant information out
I realize there may not be a one size fits all answer ...
Your system will certainly be more responsive if you store everything in Lucene. Stored fields do not affect query time; they only make the index bigger. And probably not much bigger, if only a small portion of the rows holds a lot of data. So if index size is not an issue for your system, I would go with that.
I strongly disagree with Pascal's answer. Index size can have a major impact on search performance. The main reasons are:
- Stored fields increase index size, which can be a problem on a relatively slow I/O system.
- Stored fields are all loaded when you load a Document into memory, which can put considerable pressure on the GC.
- Stored fields are likely to increase reader reopen time.
The final answer, of course, is that it depends. If the original data is already stored somewhere else, it is good practice to retrieve it from the original data store.
When adding a row from the database to Lucene, you can decide for each column whether it actually needs to be written to the inverted index. If not, use Field.Index.NO so that you don't write too much data to the inverted index. Likewise, decide whether a column will be retrieved by a key-value lookup. If not, you needn't use Store.YES to store the data.
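To make the "index but don't store" option concrete, here is a minimal sketch in plain Java. It is a toy model and does not use Lucene's API: the class `IndexWithoutStoring` and the `database` map are hypothetical stand-ins for the index and the original table. The large column's text goes into the inverted index, only the primary key is kept, and the full text is fetched from the original data store once a hit is found.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (not Lucene): the text is indexed, but never stored.
public class IndexWithoutStoring {
    // Inverted index: term -> primary keys of the rows containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenize the large column and record the primary key under each term.
    // The text itself is discarded after indexing.
    public void add(int primaryKey, String largeText) {
        for (String term : largeText.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(primaryKey);
            }
        }
    }

    // A search returns primary keys only; the caller fetches the row contents.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the existing database table.
        Map<Integer, String> database = new HashMap<>();
        database.put(1, "Lucene builds an inverted index");
        database.put(2, "A stored field is kept verbatim");

        IndexWithoutStoring index = new IndexWithoutStoring();
        database.forEach(index::add);

        // On a hit, go back to the original data store for the large column.
        for (int id : index.search("inverted")) {
            System.out.println(id + ": " + database.get(id));
        }
    }
}
```

The trade-off is the one discussed above: the index stays small because it holds only terms and keys, at the cost of one extra database lookup per hit that you actually display.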