开发者

Mysql : Get results from DB knowing that we don't want to get specific ids

I'm currently working on a quiz website.

I have a database of approx. 1000 questions - but the DB will grow week after week.

Every day, each user who will play the quiz will get 5 questions randomly.

The problem is I don't want a single user to get the same question 2 times in two different dates.

I'm storing an history of all the answered questions by users, so I can know that user id:1 has answered answer id:4 to question id:6 on date YYYY-MM-DD

basically :

entry_id | user_id | question_id | answer_id | good | date

So when I get 5 questions randomly for the user, I have two options :

SELECT question, question_id FROM questions WHERE question_id != 'X' AND question_id != 'Y' AND question_id != 'Z' ORDER BY RAND() LIMIT 0,5

OR (easier)

SELECT question, question_id FROM questions WHERE question_id NOT IN(X,Y,Z)

My problem :

Let's say my user have already answered 500 questions so far (100 days of activity). My query to get him new questions would be extremely long

... NOT IT({huge list of ids for which the user has already answered})

or

... question_id != 'A' AND question_id != 'B' and so on and so on.

My concern is that my queries could get extremely slow with time. Imagine a user for whom i must get 5 questions, knowing t开发者_Go百科hat he already answered 5000, the query would kill my server, no?

Is there any way for me to get 5 questions randomly for my user by knowing all the ids of the questions is already answered and by being sure that the query won't be too har to handle for my server ?

Thanks in advance!


NOT EXISTS will probably serve you better in this case.

SELECT q.question, q.question_id
    FROM questions q
    WHERE NOT EXISTS(SELECT NULL
                         FROM answers a
                         WHERE a.question_id = q.question_id
                             AND a.user_id = 'YourUser')
    ORDER BY RAND() LIMIT 0,5


Yes, you can have an equivalent NOT IN query with NOT EXISTS.

MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. This is done by pushing down an appropriate equality into the subquery's WHERE clause. That is, the comparison is converted to this:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery:

SELECT q.* FROM questions q WHERE 
NOT EXISTS(SELECT 1 FROM answers a
                         WHERE a.question_id = q.question_id
                         AND a.user_id = 'UserId')
ORDER BY RAND() LIMIT 0,5


A 'NOT IN' in the WHERE clause should do the trick:

SELECT
  XYZ
FROM 
  QUESTIONS
WHERE
  ID NOT IN (SELECT ID FROM QUESTION_HISTORY WHERE USER_ID = @USERID)


I suggest you use multiple queries for this as "ORDER BY RAND()" is quite slow for big tables.

First select all possible IDs

SELECT q.question_id
    FROM questions q
    WHERE q.question_id NOT IN
        (    SELECT a.question_id
                 FROM anwered a
                 WHERE a.question_id = q.question_id AND a.user_id = 'userID'
        )

You'd then pick five random elements in your favorite language and make another query

SELECT q.question_id, ...
    FROM questions q
    WHERE q.question_id IN ('id1', 'id2', 'id3', 'id4', 'id5');

I think this should run faster, but it's probably better to benchmark than take a wild guess.


All the suggestions so far involve running rather expensive queries on the database. If you have many users and many questions, you might run into performance trouble. If this is a concern you can opt for storage complexity rather than time complexity:

Warning: premature optimization ahead!

For each user, pre-generate a randomly sorted set of question IDs. Do this in your application code and store it as a blob to the database. Also store, for each user, their position in that list. Now all you have to do is load the list, jump to the right position, and return the relevant questions.

You can use a pseudo-random number generation algorithm, such as Mersenne twister to generate a list of question IDs. For every user, create a different seed so you get a different sequence of questions for different users.

10's of KBs per user required to store a pre-calculated list of 1000's of questions. That doesn't seem too high. However, it does affect your performance since the database has to send all that extra data to your application when you load that field.

This is a far less straightforward solution that the other answers here and is most definitely premature optimization. Still, I thought I should suggest it as an alternative to complex SQL queries.


sub select

SELECT *
FROM questions
WHERE question_id NOT IN (
    SELECT question_id
    FROM answers
    WHERE user = XX
)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜