开发者

Finding shared list IDs in a MySQL table using bitwise operands

I want to find items in common from the "following_list" column in a table of users:

+----+--------------------+-------------------------------------+
| id | name               | following_list                      |
+----+--------------------+-------------------------------------+
|  9 | User 1             | 26,6,12,10,21,24,19,16              | 
| 10 | User 2             | 21,24                               | 
| 12 | User 3             | 9,20,21,26,30                       | 
| 16 | User 4             | 6,52,9,10                           | 
| 19 | User 5             | 9,10,6,24                           | 
| 21 | User 6             | 9,10,6,12                           | 
| 24 | User 7             | 9,10,6                              | 
| 46 | User 8             | 45                                  | 
| 52 | User 9             | 10,12,16,21,19,20,18,17,23,25,24,22 | 
+----+----------------开发者_C百科----+-------------------------------------+

I was hoping to be able to sort by the number of matches for a given user id. For example, I want to match all users except #9 against #9 to see which of the IDs in the "following_list" column they have in common.

I found a way of doing this through the "SET" datatype and some bit trickery:

http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html#bits

However, I need to do this on an arbitrary list of IDs. I was hoping this could be done entirely through the database, but this is a little out of my league.


EDIT: Thanks for the help everybody. I'm still curious as to whether a bit-based approach could work, but the 3-table join works nicely.

SELECT a.following_id, COUNT( c.following_id ) AS matches
FROM following a
LEFT JOIN following b ON b.user_id = a.following_id
LEFT JOIN following c ON c.user_id = a.user_id
  AND c.following_id = b.following_id
WHERE a.user_id = ?
GROUP BY a.following_id

Now I have to keep convincing myself not to prematurely optimize.


If you normalised your following_list column into a separate table with user_id and follower_id, then you'd find that COUNT() was extremely easy to use. You'd also find the logic for selecting a list of followers, or a list of user's being followed much easier


Your problem would be simplified if you could split your following_list column off into a child table, e.g.

TABLE id_following_list:

id | following
--------------
10 | 21
10 | 24
46 | 45
...| ...

You can read more here.


Normalize the table, drop the column following_list, create a table following:

user_id
following_id

Which leads to the easy-peasy query (untested, you get the point):

SELECT b.user_id, COUNT(c.following)
FROM following a
JOIN following b -- get followings of <id> 
ON b.following_id = a.following_id
AND b.user_id = a.following_id
JOIN following c -- get all (other) followings of <id> again, match with followings of b
ON b.following_id = c.following_id
AND c.user_id = a.user_id
WHERE a.user_id = <id>
GROUP BY b.user_id
ORDER BY COUNT(b.following) DESC

Performance may very well very based on indexes & size of dataset, maybe add a 'similarity' column which is updated at regular intervals or changes just for fast data retrieval.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜