开发者

Wondering how Facebook does the "Mutual friends" feature

I'm currently developing an application to allow students to manage their courses, and I don't really know how to design the database for a specific feature. The client wants, a lot like Facebook, that when a student displays the list of people currently in a specific course, the people with the most mutual courses with the logged in user are displayed first. Almost the same as Facebook feature "Friend suggestions" with an additional filter.

As an additional feature, I would like to add a search feature to allow students to search for another one, and displaying first in the search results the people with most mutual courses with the logged in user.

I currently use MySQL开发者_JAVA百科, I plan to use Cassandra for some other features, and I also use Memcached for result caching and Sphinx for the search.

Thanks.

--

The app is developed in Python, BTW

And I forgot to mention that the standard approach (using a nice MySQL query to calculate all this with an ORDER BY clause) is wayyyys too slow. So as reads are a lot more frequent than reads, I would like most of the logic to take place once, when the relation people <-> course is added.

I thought about updating a "mutual courses" counter specific to one tuple (user, course) that will be increased for all users of a course when the logged in user joins a new course (or decreased when he leaves it).


Say you have a table that is named Users and the Primary Key is UserID. Then you have a table called Friends with 2 columns called UserID (PK) and FriendUserID.

Say you have 2 users, 20 and 50.

When 20 adds 50 as friend, the application adds a new row:

INSERT INTO `Friends` (`UserID`, `FriendUserID`) VALUES (20, 50)

and when 50 confirms friendship, you add another row with values switched:

INSERT INTO `Friends` (`UserID`, `FriendUserID`) VALUES (50, 20)

When you want to find mutual friends between 20 and 50, simply:

SELECT `UserID` FROM `Friends` AS `A`, `Friends` AS B WHERE `A`.`FriendUserID` = 20 AND `A`.`UserID` = `B`.`UserID` AND `B`.`FriendUserID` = 50


If you already have your solution, but the problem is just the speed of that query, try doing it sooner. When a user's friendships change, rerun a job that calculates these things and store all the results away. Don't runt his as a result of a request, when you need the result so quickly. Do such expensive things only once and do them before a request is ever made.


I would break this up as (2) queries and find the intersection in Python:

#Query 1 - Get the user's friends
SELECT friend_id FROM friends WHERE user_id = 'my user id'

#Query 2 - Get the users enrolled in the course
SELECT student_id FROM course_enrollment WHERE course_id = 'course id'

Then find the intersection in Python. Then you can let the database do caching, etc ... without any joins to slow things down.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜