SQL for cases with 2 or more hits
I have this table that has a lot of book related fields including keywords. Book_ids are primary keys. Let's say I have a book with six keywords describing it, now how do I search all the other books that have two or more same keywords?
What I have now is roughly this:
select book_id
from book_fields
where keyword in ( select keyword from book_fields where book_id='123' )
The purpose is to get rough recommendations of similar books based on keywords. I'd like to do this with SQL if possible.
To clarify: The idea is to match a book to other b开发者_JAVA百科ooks using more than one keyword. All the keywords of all books were in book_fields table. No chance to touch the schema and RDMS was Oracle.
BEGIN;
CREATE SCHEMA books;
SET search_path TO books;
CREATE TABLE book_fields (
book_id INT NOT NULL
, keyword VARCHAR(30) NOT NULL
, PRIMARY KEY (book_id, keyword)
);
INSERT INTO book_fields (book_id, keyword)
VALUES
(10, 'foo')
, (10, 'bar')
, (10, 'baz')
, (20, 'foo')
, (20, 'xxx')
, (20, 'baz')
, (30, 'yyy')
, (30, 'zzz')
;
SELECT
lhs.book_id AS thisbook
, rhs.book_id AS otherbook
, COUNT(rhs.keyword)
FROM book_fields lhs, book_fields rhs
WHERE lhs.book_id <> rhs.book_id
AND lhs.keyword = rhs.keyword
GROUP BY lhs.book_id, rhs.book_id
;
ROLLBACK;
Assuming you have a one-to-many table, KeywordsToBooks, having the following schema:
BookID int
Keyword varchar
with both columns as PK, BookID as a FK to your Books table, and passing @BookToMatchID as the ID of the Book you want to match on as a parameter, here's what I'd do:
SELECT BookID, COUNT(BookID) AS KeywordMatches
FROM KeywordsToBooks
WHERE BookID <> @BookToMatchID AND Keyword IN (
SELECT Keyword
FROM KeywordsToBooks
WHERE BookID = @BookToMatchID)
GROUP BY BookID
HAVING COUNT(BookID) >= 2
As other's suggested, if this doesn't help, can you post the relevant bits of your schema please?
Assuming that books are stored in the table named books
:
SELECT *
FROM books bo
WHERE (
SELECT 1
FROM book_fields bf
JOIN book_fields br
ON bf.keyword = br.keyword
WHERE br.book_id = 123
AND bf.book_id = bo.id
LIMIT 1, 1
) = 1
For better performance, do roughly what ristonj suggests, but avoid using the IN clause:
SELECT book_id, COUNT(*)
FROM book_fields b
WHERE EXISTS (SELECT 1
FROM book_fields a
WHERE a.keyword = b.keyword
AND a.book_id = '123')
GROUP BY book_id
HAVING COUNT(*) >= 2
This problem can be better addressed by creating a table which holds all keywords.
Is it feasible?
精彩评论