Mysql : Get results from DB knowing that we don't want to get specific ids

2023-03-08 18:17 问答作者：

I'm currently working on a quiz website.

I have a database of approx. 1000 questions - but the DB will grow week after week.

Every day, each user who will play the quiz will get 5 questions randomly.

The problem is I don't want a single user to get the same question 2 times in two different dates.

I'm storing an history of all the answered questions by users, so I can know that user id:1 has answered answer id:4 to question id:6 on date YYYY-MM-DD

basically :

entry_id | user_id | question_id | answer_id | good | date

So when I get 5 questions randomly for the user, I have two options :

SELECT question, question_id FROM questions WHERE question_id != 'X' AND question_id != 'Y' AND question_id != 'Z' ORDER BY RAND() LIMIT 0,5

OR (easier)

SELECT question, question_id FROM questions WHERE question_id NOT IN(X,Y,Z)

My problem :

Let's say my user have already answered 500 questions so far (100 days of activity). My query to get him new questions would be extremely long

... NOT IT({huge list of ids for which the user has already answered})

... question_id != 'A' AND question_id != 'B' and so on and so on.

My concern is that my queries could get extremely slow with time. Imagine a user for whom i must get 5 questions, knowing t开发者_Go百科hat he already answered 5000, the query would kill my server, no?

Is there any way for me to get 5 questions randomly for my user by knowing all the ids of the questions is already answered and by being sure that the query won't be too har to handle for my server ?

Thanks in advance!

NOT EXISTS will probably serve you better in this case.

SELECT q.question, q.question_id
    FROM questions q
    WHERE NOT EXISTS(SELECT NULL
                         FROM answers a
                         WHERE a.question_id = q.question_id
                             AND a.user_id = 'YourUser')
    ORDER BY RAND() LIMIT 0,5

Yes, you can have an equivalent NOT IN query with NOT EXISTS.

MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. This is done by pushing down an appropriate equality into the subquery's WHERE clause. That is, the comparison is converted to this:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery:

SELECT q.* FROM questions q WHERE 
NOT EXISTS(SELECT 1 FROM answers a
                         WHERE a.question_id = q.question_id
                         AND a.user_id = 'UserId')
ORDER BY RAND() LIMIT 0,5

A 'NOT IN' in the WHERE clause should do the trick:

SELECT
  XYZ
FROM 
  QUESTIONS
WHERE
  ID NOT IN (SELECT ID FROM QUESTION_HISTORY WHERE USER_ID = @USERID)

I suggest you use multiple queries for this as "ORDER BY RAND()" is quite slow for big tables.

First select all possible IDs

SELECT q.question_id
    FROM questions q
    WHERE q.question_id NOT IN
        (    SELECT a.question_id
                 FROM anwered a
                 WHERE a.question_id = q.question_id AND a.user_id = 'userID'
        )

You'd then pick five random elements in your favorite language and make another query

SELECT q.question_id, ...
    FROM questions q
    WHERE q.question_id IN ('id1', 'id2', 'id3', 'id4', 'id5');

I think this should run faster, but it's probably better to benchmark than take a wild guess.

All the suggestions so far involve running rather expensive queries on the database. If you have many users and many questions, you might run into performance trouble. If this is a concern you can opt for storage complexity rather than time complexity:

Warning: premature optimization ahead!

For each user, pre-generate a randomly sorted set of question IDs. Do this in your application code and store it as a blob to the database. Also store, for each user, their position in that list. Now all you have to do is load the list, jump to the right position, and return the relevant questions.

You can use a pseudo-random number generation algorithm, such as Mersenne twister to generate a list of question IDs. For every user, create a different seed so you get a different sequence of questions for different users.

10's of KBs per user required to store a pre-calculated list of 1000's of questions. That doesn't seem too high. However, it does affect your performance since the database has to send all that extra data to your application when you load that field.

This is a far less straightforward solution that the other answers here and is most definitely premature optimization. Still, I thought I should suggest it as an alternative to complex SQL queries.

sub select

SELECT *
FROM questions
WHERE question_id NOT IN (
    SELECT question_id
    FROM answers
    WHERE user = XX
)

Mysql : Get results from DB knowing that we don't want to get specific ids

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？