开发者

Efficient query to lookup stuff in a word index

I have two tables defined like this:

Page(id), Index(page_id, word)

page_id in Index is a foreign key to Page so that each Page is connected to a group of Index entries. The Index table is a index for the Page table so that you can do fast text searching. E.g:

SELECT page_id FROM Index where word = 'hello'

Would select all page_id's for all pages containing the word 'hello'. But now I want to select all page_id's for pages that contain all of the words 'word1', 'word2' and 'word3'. The best query I can come up with for this is:

SELECT page_id
FROM Index
WHERE word IN ('word1', 'word2', 'word3')
GROUP BY page_id
HAVING COUNT(1) = 3;

It works, but I wonder if someone can think of an alternative more efficient query?

[edit] The above example is开发者_开发问答 slightly simplified. In the real Index table word is replaced with a word_id column that references a Word table. But the basic approach is the same. The RDBMS is PostgreSQL and there is about 2m rows in the Index table and 20k rows in Page.


As a small thing, I wouldn't call a table Index, that's bound to be confusing :)

Your query would find pages with 3 index entries for word1. That might not be a problem, but you could change the having to:

HAVING COUNT(DISTINCT word) = 3

to avoid this problem.

How this query performs compared to other depends on a lot of factors, like keyword density, the DBMS you use, and the total amount of rows. I would not worry about it until you encountered an actual performance problem.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜