Finding shared list IDs in a MySQL table using bitwise operands
I want to find items in common from the "following_list" column in a table of users:
+----+--------------------+-------------------------------------+
| id | name | following_list |
+----+--------------------+-------------------------------------+
| 9 | User 1 | 26,6,12,10,21,24,19,16 |
| 10 | User 2 | 21,24 |
| 12 | User 3 | 9,20,21,26,30 |
| 16 | User 4 | 6,52,9,10 |
| 19 | User 5 | 9,10,6,24 |
| 21 | User 6 | 9,10,6,12 |
| 24 | User 7 | 9,10,6 |
| 46 | User 8 | 45 |
| 52 | User 9 | 10,12,16,21,19,20,18,17,23,25,24,22 |
+----+----------------开发者_C百科----+-------------------------------------+
I was hoping to be able to sort by the number of matches for a given user id. For example, I want to match all users except #9 against #9 to see which of the IDs in the "following_list" column they have in common.
I found a way of doing this through the "SET" datatype and some bit trickery:
http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html#bitsHowever, I need to do this on an arbitrary list of IDs. I was hoping this could be done entirely through the database, but this is a little out of my league.
EDIT: Thanks for the help everybody. I'm still curious as to whether a bit-based approach could work, but the 3-table join works nicely.
SELECT a.following_id, COUNT( c.following_id ) AS matches
FROM following a
LEFT JOIN following b ON b.user_id = a.following_id
LEFT JOIN following c ON c.user_id = a.user_id
AND c.following_id = b.following_id
WHERE a.user_id = ?
GROUP BY a.following_id
Now I have to keep convincing myself not to prematurely optimize.
If you normalised your following_list column into a separate table with user_id and follower_id, then you'd find that COUNT() was extremely easy to use. You'd also find the logic for selecting a list of followers, or a list of user's being followed much easier
Your problem would be simplified if you could split your following_list
column off into a child table, e.g.
TABLE id_following_list:
id | following
--------------
10 | 21
10 | 24
46 | 45
...| ...
You can read more here.
Normalize the table, drop the column following_list
, create a table following
:
user_id
following_id
Which leads to the easy-peasy query (untested, you get the point):
SELECT b.user_id, COUNT(c.following)
FROM following a
JOIN following b -- get followings of <id>
ON b.following_id = a.following_id
AND b.user_id = a.following_id
JOIN following c -- get all (other) followings of <id> again, match with followings of b
ON b.following_id = c.following_id
AND c.user_id = a.user_id
WHERE a.user_id = <id>
GROUP BY b.user_id
ORDER BY COUNT(b.following) DESC
Performance may very well very based on indexes & size of dataset, maybe add a 'similarity' column which is updated at regular intervals or changes just for fast data retrieval.
精彩评论