Using Lucene.Net, what indexing strategy should I use here?
I'm trying to build a search for our internal support database - each support ticket consists of many emails and I'm trying to work out how best to index it:
- Should I create a document for each of the emails individually, or
- Should I concatenate all the emails for a ticket and create a document for each ticket?
When searching I want to return a list of tickets (rather than a list of emails grouped by ticket or anything like that).
Which is best?
If you want a list of tickets in the results, then concatenate the emails. Otherwise you need to maintain the relation between emails and tickets yourself. You can only do this with textual fields inside the documents, and this may be slow, but such a relation is possible.
If you use search together with a relational database, indexing emails one by one will be fine. You retrieve the matching emails, read the ticketId field from each Lucene document, and then load the ticket with that Id from the database.
Obviously, indexing emails separately is the more flexible solution. If you later need to retrieve per-email information, you can. With the all-emails-in-one solution you would have to reindex the entire database.
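To make the per-email flow concrete, here is a minimal sketch in plain Python rather than Lucene.Net. The toy inverted index, the sample data, and the names `build_email_index` and `search_tickets` are all hypothetical; they only illustrate the idea of storing a ticket id with each email document and collapsing email hits into a list of distinct tickets.

```python
from collections import defaultdict

# Hypothetical sample data: (email_id, ticket_id, body)
EMAILS = [
    (1, "T-100", "Printer driver crashes on startup"),
    (2, "T-100", "Reinstalling the driver fixed the crash"),
    (3, "T-200", "Cannot log in after password reset"),
]

def build_email_index(emails):
    """Index one 'document' per email: term -> set of email ids."""
    index = defaultdict(set)
    ticket_of = {}  # plays the role of a stored ticketId field per email
    for email_id, ticket_id, body in emails:
        ticket_of[email_id] = ticket_id
        for term in body.lower().split():
            index[term].add(email_id)
    return index, ticket_of

def search_tickets(index, ticket_of, term):
    """Find matching emails, then collapse them to distinct ticket ids."""
    tickets = []
    for email_id in sorted(index.get(term.lower(), ())):
        ticket_id = ticket_of[email_id]
        if ticket_id not in tickets:  # several emails may share one ticket
            tickets.append(ticket_id)
    return tickets

index, ticket_of = build_email_index(EMAILS)
print(search_tickets(index, ticket_of, "driver"))
```

In a real setup the returned ticket ids would then be used to load the tickets from the database, as described above.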
If you want to search at the ticket level, it makes the most sense to combine all of a ticket's emails into one document.
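As a rough illustration of the one-document-per-ticket approach, the sketch below uses a toy inverted index in plain Python, not the Lucene.Net API. The sample data and the names `build_ticket_index` and `search_all_terms` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data: (ticket_id, email_body)
TICKET_EMAILS = [
    ("T-100", "Printer driver crashes on startup"),
    ("T-100", "Reinstalling the driver fixed the crash"),
    ("T-200", "Cannot log in after password reset"),
]

def build_ticket_index(ticket_emails):
    """Concatenate each ticket's emails, then index one 'document' per ticket."""
    bodies = defaultdict(list)
    for ticket_id, body in ticket_emails:
        bodies[ticket_id].append(body)

    index = defaultdict(set)  # term -> set of ticket ids
    for ticket_id, parts in bodies.items():
        combined = " ".join(parts)
        for term in combined.lower().split():
            index[term].add(ticket_id)
    return index

def search_all_terms(index, *terms):
    """Return the sorted ticket ids whose combined text contains every term."""
    if not terms:
        return []
    hits = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*hits))

ticket_index = build_ticket_index(TICKET_EMAILS)
print(search_all_terms(ticket_index, "printer", "reinstalling"))
```

One consequence of this design is that a query whose terms are spread across different emails of the same ticket still matches that ticket directly, because the hits are already tickets and no grouping step is needed.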
I don't think there's a definitive "best" answer here. Personally, I'd probably include the text of the emails in the index for the support ticket, since that would let a single index access find support tickets based on both the text of the emails and other properties of the support ticket. This is the sort of thing that is fairly subjective, though, so you might try prototyping different strategies and doing some user-testing.