Lucene Queries -- joining documents and maintaining relevancy
I am trying to create a Lucene search using school name and player name to return videos. I am trying to decide between two methods.
Method A is to index the school name and player name on the video document and use a boolean query to search on these fields.
Method B is to create separate document types and make 3 unique queries.
Documents:
- school document - stores a school_id and indexes the school name
- player document - stores a school_id and sport_id, and indexes the player name
The 3 queries:
- Search for all school documents with school name
- Search for all player documents with player name
- Search the videos for all content with school_id and sport_id from the first two queries.
What are the pros/cons of both methods?
You almost certainly want to go with method A. In order to combine relevance scores from two indexes you essentially have to reinvent Lucene.
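To make the cost of method B concrete, here is a minimal Python sketch of the three-query flow. It does not use Lucene at all: `toy_score`, the sample data, and the rule of multiplying the two scores are assumptions made up for illustration. The point is only that the third lookup is a filter on ids, so any ranking has to be stitched together by hand.

```python
# Toy stand-ins for the three indexes in method B. All data is hypothetical.
SCHOOLS = {1: "Lincoln High School", 2: "Lincoln Park Academy"}
PLAYERS = [
    {"name": "John Smith", "school_id": 1, "sport_id": 7},
    {"name": "John Smythe", "school_id": 2, "sport_id": 7},
]
VIDEOS = [
    {"video_id": 100, "school_id": 1, "sport_id": 7},
    {"video_id": 101, "school_id": 2, "sport_id": 7},
]


def toy_score(query, text):
    """Fraction of query terms present in text (a crude placeholder, not Lucene scoring)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def search_method_b(school_query, player_query):
    # Query 1: score school documents by school name.
    school_scores = {sid: toy_score(school_query, name) for sid, name in SCHOOLS.items()}
    # Query 2: score player documents by player name.
    player_hits = [(player, toy_score(player_query, player["name"])) for player in PLAYERS]

    # Query 3: look up videos by the ids found above. This step is a plain
    # filter, so the final ranking must be rebuilt from the earlier scores.
    results = []
    for video in VIDEOS:
        school_score = school_scores.get(video["school_id"], 0.0)
        player_score = max(
            (
                score
                for player, score in player_hits
                if player["school_id"] == video["school_id"]
                and player["sport_id"] == video["sport_id"]
            ),
            default=0.0,
        )
        if school_score > 0 and player_score > 0:
            # Multiplying the scores is an arbitrary choice made for this sketch.
            results.append((video["video_id"], school_score * player_score))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `search_method_b("Lincoln High", "John Smith")` on this data ranks video 100 ahead of video 101, but only because of the hand-picked combining rule. With method A, a single boolean query over the video documents produces that ranking for you.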
The downside of method A is that if a school or player changes their name, you have to reindex the affected videos. That should be pretty unusual, though.
Keep in mind that Lucene really pays off when you have a large amount of free text to search. If it's just a few words (like the name of a school), the full-text capabilities of MySQL or your other favorite RDBMS will probably be just as fast and a lot easier to implement. You also won't have the reindex-on-rename issue, since each name can be stored in one place and joined at query time.
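As a rough illustration of the database route, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL and a simple `LIKE` match instead of a real full-text index. The table layout, column names, and sample rows are assumptions, not something taken from the question.

```python
import sqlite3


def build_db():
    """Create an in-memory database with hypothetical schools, players, and videos."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE schools (school_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT,
                              school_id INTEGER, sport_id INTEGER);
        CREATE TABLE videos (video_id INTEGER PRIMARY KEY,
                             school_id INTEGER, sport_id INTEGER);
        INSERT INTO schools VALUES (1, 'Lincoln High School'), (2, 'Roosevelt High School');
        INSERT INTO players VALUES (10, 'John Smith', 1, 7), (11, 'Maria Lopez', 2, 7);
        INSERT INTO videos VALUES (100, 1, 7), (101, 2, 7), (102, 1, 8);
        """
    )
    return conn


def find_videos(conn, school_term, player_term):
    """Return ids of videos whose school and player names contain the given terms."""
    sql = """
        SELECT DISTINCT v.video_id
        FROM videos v
        JOIN schools s ON s.school_id = v.school_id
        JOIN players p ON p.school_id = v.school_id AND p.sport_id = v.sport_id
        WHERE s.name LIKE ? AND p.name LIKE ?
        ORDER BY v.video_id
    """
    rows = conn.execute(sql, (f"%{school_term}%", f"%{player_term}%")).fetchall()
    return [row[0] for row in rows]


def rename_school(conn, school_id, new_name):
    """A rename is a single UPDATE; the videos table is not touched."""
    conn.execute("UPDATE schools SET name = ? WHERE school_id = ?", (new_name, school_id))
```

With this data, `find_videos(conn, "Lincoln", "Smith")` returns `[100]`. After `rename_school(conn, 1, "Lincoln Academy")`, searching for `"Academy"` finds the same video without any reindexing step. The trade-off is that a plain join like this gives no relevance ranking.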