Fulltext search on many tables
I have three tables, all of which have a column with a fulltext index. The user will enter search terms into a single text box, and then all three tables will be searched.
This is better explained with an example:
documents: doc_id, name (FULLTEXT)
table2: id, doc_id, a_field (FULLTEXT)
table3: id, doc_id, another_field (FULLTEXT)
(I realise this looks stupid but that's because I've removed all the other fields and tables to simplify it).
So basically I want to do a fulltext search on name, a_field and another_field, and then show the results as a list of documents, preferably with what caused each document to be found; e.g. if another_field matched, I would display the value of another_field.
I began working on a system whereby three fulltext search queries are performed and the results inserted into a table with a structure like:
search_results: table_name, row_id, score
(This could later be made to cache results for a few days with e.g. a hash of the search terms).
This idea has two problems. The first is that the same document can be in the search results up to three times with different scores. Instead of that, if the search term is matched in two tables, it should have one result, but a higher score.
The second is that parsing the results is difficult. I want to display a list of documents, but I don't immediately know the doc_id without a join of some kind; however, the table to join to is dependent on the table_name column, and I'm not sure how to accomplish that.
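One possible way around both problems, assuming (not stated in the question) that each of the three per-table fulltext queries also selects doc_id, which every table in the example has: the conditional join on table_name is then unnecessary, and the per-table hits can be merged by doc_id in application code. The `Hit`, `DocResult` and `merge_hits` names below are hypothetical; this is a sketch, not a definitive implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One row returned by one of the three per-table fulltext queries.

    Hypothetical shape: assumes each query selects doc_id, the matched
    column's text, and its relevance score.
    """
    table_name: str
    doc_id: int
    matched_text: str
    score: float


@dataclass
class DocResult:
    doc_id: int
    score: float = 0.0
    # (table_name, matched_text) pairs explaining why the document was found
    reasons: list = field(default_factory=list)


def merge_hits(hits):
    """Collapse hits from all tables into one result per doc_id.

    Scores are summed, so a document matched in two tables ranks higher
    than one matched in a single table. Relevance values from different
    columns are not strictly comparable, so summing is only a heuristic.
    """
    by_doc = {}
    for hit in hits:
        result = by_doc.setdefault(hit.doc_id, DocResult(hit.doc_id))
        result.score += hit.score
        result.reasons.append((hit.table_name, hit.matched_text))
    # Highest combined score first; doc_id breaks ties deterministically
    return sorted(by_doc.values(), key=lambda r: (-r.score, r.doc_id))


if __name__ == "__main__":
    hits = [
        Hit("documents", 1, "Annual report", 1.5),
        Hit("table3", 1, "report appendix", 0.75),
        Hit("table2", 2, "quarterly report", 1.0),
    ]
    for r in merge_hits(hits):
        print(r.doc_id, r.score, r.reasons)
```

Here document 1 appears once with a combined score, and its `reasons` list carries the matched text from each table, which covers the "what caused this document to be found" display.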
Wanting to search multiple related tables like this must be a common thing, so I guess what I'm asking is: am I approaching this in the right way? Can someone tell me the best way of doing it, please?
I would create a single denormalized index, i.e. put all three document types into a single table with fields for doc_id, doc_type and a single fulltext-indexed text column. Then you can search all three document types at once.
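The idea of a single combined index can be sketched in a few lines, under stated assumptions: one index row per source record, shaped (doc_id, doc_type, body), where doc_type records which table and column the text came from. The input tuple shapes and the function names are hypothetical, and the naive term-overlap score is only a stand-in for a real fulltext engine; in MySQL the body column would carry the FULLTEXT index and be queried with MATCH ... AGAINST.

```python
import re


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a real fulltext parser."""
    return re.findall(r"\w+", text.lower())


def build_index(documents, table2, table3):
    """Flatten the three source tables into one list of index rows.

    Hypothetical input shapes: documents -> (doc_id, name),
    table2 / table3 -> (id, doc_id, text).
    Each output row is (doc_id, doc_type, body).
    """
    index = [(doc_id, "documents.name", name) for doc_id, name in documents]
    index += [(doc_id, "table2.a_field", text) for _id, doc_id, text in table2]
    index += [(doc_id, "table3.another_field", text) for _id, doc_id, text in table3]
    return index


def search(index, query):
    """Return matching rows as (score, doc_id, doc_type, body), best first.

    Score = number of distinct query terms present in the body. This only
    illustrates searching one combined index instead of three tables.
    """
    terms = set(tokenize(query))
    results = []
    for doc_id, doc_type, body in index:
        score = len(terms & set(tokenize(body)))
        if score:
            results.append((score, doc_id, doc_type, body))
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return results


if __name__ == "__main__":
    index = build_index(
        documents=[(1, "Annual report"), (2, "Budget")],
        table2=[(10, 2, "budget report draft")],
        table3=[(20, 1, "appendix")],
    )
    for row in search(index, "budget report"):
        print(row)
```

Because each returned row carries doc_type and body, a single query both finds the document and says which field matched; rows can still be grouped by doc_id if one entry per document is wanted.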
You might also find that Lucene would make sense in this situation. It gives you faster searching, as well as much more functionality around how the searching and scoring works.
The downside is that you're keeping a separate denormalized copy of the text for each record. The upside is that searching is much faster.