What's the best way to "shuffle" a table of database records?
Say that I have a table with a bunch of records that I want to present to users in random order. I also want users to be able to paginate back and forth, so I have to preserve some sort of order, at least for a while.
The application is AJAX-only and it caches already-visited pages, so even if I always served random results, a user going back would still see the previous page, because it loads from the local cache.
The problem is that if I return only random results, there might be some duplicates. Each page contains 6 results, so to prevent this I'd have to do something like WHERE id NOT IN (1,2,3,4 ...), where I'd put all the previously loaded IDs.
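For illustration, a minimal ActiveRecord sketch of that exclusion approach might look like this (modern Rails syntax; seen_ids is a hypothetical list of ids the client has already loaded):

# Hedged sketch: exclude already-served ids, then pick 6 at random.
Record.where.not(id: seen_ids)
      .order(Arel.sql("RAND()"))
      .limit(6)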
A huge downside of that solution is that nothing can be cached on the server side, since every user requests different data.
An alternate solution might be to create another column for ordering the records and shuffle it every [insert time unit here]. The problem there is that I'd need to assign a random number out of a sequence to every record in the table, which would take as many queries as there are records.
I'm using Rails and MySQL, if that's of any relevance.
Try this:
mysql> create table t (i int);
mysql> insert into t values (1),(2),(3),(4),(5),(6);
mysql> select * from t order by rand(123) limit 2 offset 0;
+------+
| i    |
+------+
|    6 |
|    4 |
+------+
mysql> select * from t order by rand(123) limit 2 offset 2;
+------+
| i    |
+------+
|    2 |
|    3 |
+------+
mysql> select * from t order by rand(123) limit 2 offset 4;
+------+
| i    |
+------+
|    5 |
|    1 |
+------+
Note that the rand() function is given a seed value (123), which makes the ordering deterministic: if you repeat the last three queries you'll get the same results every time.
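To tie this back to Rails, a minimal sketch of using the seeded sort for pagination might look like the following (the controller, model, and parameter names are illustrative assumptions, not from your app):

# One seed per user session keeps the shuffled order stable across requests.
class RecordsController < ApplicationController
  PER_PAGE = 6

  def index
    session[:shuffle_seed] ||= rand(1_000_000)
    seed = session[:shuffle_seed].to_i          # to_i guards the raw SQL below
    page = params.fetch(:page, 0).to_i
    @records = Record
      .order(Arel.sql("RAND(#{seed})"))         # deterministic, seeded shuffle in MySQL
      .limit(PER_PAGE)
      .offset(page * PER_PAGE)
  end
end

One caveat: ORDER BY RAND() sorts the whole table on every request, so this gets slow on large tables.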
If the random results are "for everyone" rather than for any specific user, then you can do something like this (this is Postgres syntax, but it should work with other databases):
update mytable set sortorder = random() * 100000000;
select * from mytable order by sortorder, primarykeyid;
Since random() may produce duplicate values, the secondary sort on primarykeyid gives the ordering some stability.
You can then re-run this as often as you want to refresh your cache. For example, give your pages an absolute expiration of, say, one minute. Then every minute you re-run the update to regenerate the sort order and serve pages up normally.
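As a sketch, that per-minute refresh could be a scheduled job along these lines (the job class and the table/column names are assumptions; the SQL is the MySQL equivalent of the Postgres update above):

# Hypothetical job, run once a minute via cron, the whenever gem, or similar.
class ReshuffleSortOrderJob
  def self.perform
    ActiveRecord::Base.connection.execute(
      "UPDATE mytable SET sortorder = FLOOR(RAND() * 100000000)"
    )
  end
end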
If requests arrive across the refresh window then, yes, there's a chance that different pages will contain some of the same results. Users may also find that hitting "back" doesn't return the page they had before (since the order refreshed in the meantime).
How well this works comes down to why you're presenting random data in the first place; it also depends on data volume, etc.
But this is a cache-friendly way of pulling it off, if that's important to you. It's also stateless (no session information needed).
I would do the following (assuming a sequential, numeric primary key; a sketch follows the list):
- Generate a random number and store it in the user's session
- When a user pages through the data, query for the total number of rows
- Use the number stored in the session as the seed to generate the same 'random' order of ids on each request
- Page through the ids and only retrieve the records that match those ids from the database.
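A minimal sketch of this approach, assuming a Rails controller and a model named Record (all names here are illustrative):

class RecordsController < ApplicationController
  PER_PAGE = 6

  def index
    session[:shuffle_seed] ||= rand(2**31)                    # step 1: one random seed per session
    ids = Record.pluck(:id)                                   # step 2: fetch all ids
    ids.shuffle!(random: Random.new(session[:shuffle_seed]))  # step 3: same "random" order every request
    page = params.fetch(:page, 0).to_i
    page_ids = ids.slice(page * PER_PAGE, PER_PAGE) || []
    records = Record.where(id: page_ids)                      # step 4: fetch only this page's records
    @records = records.sort_by { |r| page_ids.index(r.id) }   # restore the shuffled order
  end
end

This keeps the database query cheap (a lookup by primary key) at the cost of pulling the full id list on each request.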