MySQL: Suggesting objects (optimizing a multi-join query)
Goal: Su开发者_开发知识库ggest objects based on user's choices
Data: Table containing info on how users would order a subset of objects from the worst to the best; Example:
1 2 3 4 5 6
John: A B G J S O
Mary: A C G L
Joan: B C L J K
Stan: G J C L
There's about 20 times as many users as objects, every user's lineup contains 50-200 objects.
The table:
CREATE TABLE IF NOT EXISTS `pref` (
`usr` int(10) unsigned NOT NULL,
`obj` int(10) unsigned NOT NULL,
`ord` int(10) unsigned NOT NULL,
UNIQUE KEY `u_o` (`usr`,`obj`),
KEY `u` (`usr`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Basic idea: Iterate within user's objects starting from the second worst, building pairs (A > B); look for them in other users' lineups and list items better than A according to those users.
Query:
SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e
WHERE a.usr = '222' # step 1: select a pair of objects A, B, where A is better than B according to user X
AND a.obj = '111'
AND b.usr = a.usr
AND b.ord < a.ord
AND c.obj = a.obj # step 2: find users thinking that object A is better than B
AND d.obj = b.obj
AND d.ord < c.ord
AND d.usr = c.usr
AND e.ord > c.ord # step 3: find objects better than A according to these users
AND e.usr = c.usr
GROUP BY e.obj
ORDER BY rate DESC;
Aliases:
a
object A ('111'), current user ('222')
b
object B, worse than A according to current user (has lower value of 'ord' than A)
c
object A in other user's lineup
d
object B in other user's lineup
e
object better than A in other user's lineup
The execution plan (ouo and uo being the indexes as suggested by Quassnoi):
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| 1 | SIMPLE | a | ref | ouo,uo | ouo | 8 | const,const | 1 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | b | ref | ouo,uo | uo | 4 | const | 86 | Using where |
| 1 | SIMPLE | d | ref | ouo,uo | ouo | 4 | db.b.obj | 587 | Using index |
| 1 | SIMPLE | c | ref | ouo,uo | ouo | 8 | const,db.d.usr | 1 | Using where; Using index |
| 1 | SIMPLE | e | ref | uo | uo | 4 | db.d.usr | 80 | Using where |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
The query seems to work fine as long as the dataset is not too big; any ideas on how to streamline it to support larger datasets?
The query is fine, just create the following indexes:
pref (obj, usr, ord)
pref (usr, ord)
Update:
Try this syntax.
The rating system is simpler but quite similar: it gives almost same rating on the test random results I created.
SELECT oa.obj, SUM(weight) AS rate
FROM (
SELECT usr, ord,
(
SELECT COUNT(*)
FROM pref a
JOIN pref ob
ON ob.obj = a.obj
WHERE ob.usr = o.usr
AND a.usr = 50
AND a.ord <
(
SELECT ord
FROM pref ai
WHERE ai.usr = 50
AND ai.obj = 75
)
AND ob.ord < o.ord
) AS weight
FROM pref o
WHERE o.obj = 75
HAVING weight >= 0
) ow
JOIN pref oa
ON oa.usr = ow.usr
AND oa.ord > ow.ord
GROUP BY
oa.obj
ORDER BY
rate DESC
This query gives the weight to each item rated higher than A
by all users who rated A
.
The weight is equal to the number of items rated below A
by both users.
精彩评论