Finding shared list IDs in a MySQL table using bitwise operands

2023-01-02 19:22 问答作者：

I want to find items in common from the "following_list" column in a table of users:

+----+--------------------+-------------------------------------+
| id | name               | following_list                      |
+----+--------------------+-------------------------------------+
|  9 | User 1             | 26,6,12,10,21,24,19,16              | 
| 10 | User 2             | 21,24                               | 
| 12 | User 3             | 9,20,21,26,30                       | 
| 16 | User 4             | 6,52,9,10                           | 
| 19 | User 5             | 9,10,6,24                           | 
| 21 | User 6             | 9,10,6,12                           | 
| 24 | User 7             | 9,10,6                              | 
| 46 | User 8             | 45                                  | 
| 52 | User 9             | 10,12,16,21,19,20,18,17,23,25,24,22 | 
+----+----------------开发者_C百科----+-------------------------------------+

I was hoping to be able to sort by the number of matches for a given user id. For example, I want to match all users except #9 against #9 to see which of the IDs in the "following_list" column they have in common.

I found a way of doing this through the "SET" datatype and some bit trickery:

http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html#bits

However, I need to do this on an arbitrary list of IDs. I was hoping this could be done entirely through the database, but this is a little out of my league.

EDIT: Thanks for the help everybody. I'm still curious as to whether a bit-based approach could work, but the 3-table join works nicely.

SELECT a.following_id, COUNT( c.following_id ) AS matches
FROM following a
LEFT JOIN following b ON b.user_id = a.following_id
LEFT JOIN following c ON c.user_id = a.user_id
  AND c.following_id = b.following_id
WHERE a.user_id = ?
GROUP BY a.following_id

Now I have to keep convincing myself not to prematurely optimize.

If you normalised your following_list column into a separate table with user_id and follower_id, then you'd find that COUNT() was extremely easy to use. You'd also find the logic for selecting a list of followers, or a list of user's being followed much easier

Your problem would be simplified if you could split your following_list column off into a child table, e.g.

TABLE id_following_list:

id | following
--------------
10 | 21
10 | 24
46 | 45
...| ...

You can read more here.

Normalize the table, drop the column following_list, create a table following:

user_id
following_id

Which leads to the easy-peasy query (untested, you get the point):

SELECT b.user_id, COUNT(c.following)
FROM following a
JOIN following b -- get followings of <id> 
ON b.following_id = a.following_id
AND b.user_id = a.following_id
JOIN following c -- get all (other) followings of <id> again, match with followings of b
ON b.following_id = c.following_id
AND c.user_id = a.user_id
WHERE a.user_id = <id>
GROUP BY b.user_id
ORDER BY COUNT(b.following) DESC

Performance may very well very based on indexes & size of dataset, maybe add a 'similarity' column which is updated at regular intervals or changes just for fast data retrieval.

继续阅读：bit-manipulation

Finding shared list IDs in a MySQL table using bitwise operands

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？