Index strategy for tagged documents where tags can change often
In addition to text content, my documents have tags, which can be searched too. The problem is that the tags change quite often, and every time a tag is added or removed I have to call UpdateDocument, which is quite slow when done for hundreds of documents.
Are there any well-performing strategies for storing tags that change often and need to be searched with Lucene? I have been thinking about keeping the tags in separate documents to keep them smaller, but I can't figure out how to quickly search for tags AND content.
Store [tag, UID] pairs in a relational database. Whenever a tag is added, changed, or removed, update only this table; the Lucene document itself does not need to be reindexed.
When performing a Lucene search that includes both tag data (stored in the database) and content (indexed in Lucene), you will need to merge the results together. One way you can do this is to:
1. Make a database query to pull up all the UIDs for the tag in question.
2. Translate all the UIDs to Lucene doc IDs and set a bit in a BitSet for every matching Lucene doc ID.
3. Create a Filter that wraps the BitSet, and pass that filter in to your search.
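The steps above can be sketched without any Lucene dependency, just to show the data flow. This is a rough illustration, not the code from our system: `TagFilterSketch`, `tagBits`, `applyFilter`, and the `uidToDocId` map are hypothetical names, the UID list stands in for the database query result, and `applyFilter` only imitates what a Filter wrapping the BitSet would do inside the search.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

class TagFilterSketch {

    /**
     * Sets one bit per Lucene doc ID whose UID carries the tag.
     * uidsForTag: UIDs returned by the database query (assumed input).
     * uidToDocId: hypothetical UID -> Lucene doc ID lookup.
     * maxDoc:     upper bound on doc IDs (IndexReader.maxDoc() in Lucene).
     */
    static BitSet tagBits(List<String> uidsForTag,
                          Map<String, Integer> uidToDocId,
                          int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String uid : uidsForTag) {
            Integer docId = uidToDocId.get(uid);
            if (docId != null) {   // skip UIDs that are tagged but not in the index
                bits.set(docId);
            }
        }
        return bits;
    }

    /** Stand-in for the Filter: keeps only content hits whose bit is set. */
    static List<Integer> applyFilter(List<Integer> contentHits, BitSet allowed) {
        List<Integer> result = new ArrayList<>();
        for (int docId : contentHits) {
            if (allowed.get(docId)) {
                result.add(docId);
            }
        }
        return result;
    }
}
```

Keep in mind that Lucene doc IDs are not stable across segment merges, so a UID-to-doc-ID mapping like the one assumed here has to be rebuilt whenever the index reader is reopened.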
We implemented this approach in our system, and it works well. You might need to put a cache in front of the database for performance reasons, though. The particulars of step (3), wrapping the BitSet in a Filter, will vary depending on which version of Lucene you're using.
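As for the cache in front of the database, one possible shape is a small LRU map from tag to UID list that is invalidated whenever that tag's rows change. This is only a sketch under assumptions: `TagCacheSketch`, `uidsFor`, and `invalidate` are made-up names, and the `loader` function stands in for whatever database query you actually run.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TagCacheSketch {
    private final Function<String, List<String>> loader; // hypothetical DB query: tag -> UIDs
    private final Map<String, List<String>> lru;

    TagCacheSketch(final int maxEntries, Function<String, List<String>> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.lru = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the UIDs for a tag, hitting the loader only on a cache miss. */
    synchronized List<String> uidsFor(String tag) {
        List<String> cached = lru.get(tag);
        if (cached == null) {
            cached = loader.apply(tag);
            lru.put(tag, cached);
        }
        return cached;
    }

    /** Call this whenever a [tag, UID] row is added or removed for the tag. */
    synchronized void invalidate(String tag) {
        lru.remove(tag);
    }
}
```

Invalidating per tag keeps frequent tag edits cheap: only the edited tag's entry is reloaded on the next search, and everything else stays cached.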