开发者

What's the best way to find percentage match among users' answers to questions?

I have a dataset of users' answers to a predefined list of true/false questions. The data looks like this:

+---------+-------------+--------+----+
| user_id | question_id | answer | id |
+-------------------------------------+
|    4    |     110     |    0   | 1  |
|    4    |     111     |    1   | 2  |
|    4    |     112     |    1   | 3  |
|    4    |     113     |    0   | 4  |
|---------+-------------+--------+----|
|    6    |     110     |    0   | 5  |
|    6    |     111     |    1   | 6  |
|    6    |     112     |    0   | 7  |
|    6    |     113     |    0   | 8  |
+---------+-------------+--------+----|

What I need to find are the top 10 best m开发者_运维问答atches for each user (run once for every user in the system). So to be able to find 10 other users in descending order of best match based on answers (i.e. from the example above, user 4 and user 6 are 75% compatible based on their answers).

A couple of constraints on this to hopefully make it easier are:

  1. Each user will have a minimum of 10 answers to be considered
  2. Everyone has answered the same first 10 questions

Ideally this should be able to handle people who have answered many various questions that might not be the same for everyone (i.e. they skip questions they don't want to answer.

Thanks for any help on this! I'm really at a loss for what to do.


My first though is to use an IF. Something like:

SELECT SUM(IF(a.answer=b.answer,1,0)) AS match, b.user_id 
FROM data_table AS a
JOIN data_table AS b ON a.question_id = b.question_id
WHERE a.user_id = n
AND b.user_id <> n
GROUP BY b.user_id
ORDER BY match DESC
LIMIT 10

Where n is the user_id you wish to test

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜