开发者

database schema of "don't show me again what I saw before" in mysql

I'm making a website that shows you images. And special feature of site is "don't show me again what I saw before". It means, if you see a image, it goes to your "archive" category. There will be so many images and categories. And I need to very smooth schema of database to perfomance.

When you click a image, it appears on lightbox and in the lightbox code it sends request with ajax to make this image archived just for you.

Is that database schema above performanceful for about 5.000 images and 20.000 users?

users
  user_id
  user_email

pictures
  picture_id
  picture_url
  tags

archived
  user_id
  picture_id

images will appear on front of you with excepting archived images for you 开发者_如何学运维from all images on this schema...


This is a diificult question to answer without knowing all the details. You mention how many users and images there will be. How many images will each user (on average) have in their archived list? If that number is small, the archived table won't approach 100M rows.

100M rows should not be a problem by itself, as the database can handle this. The concern may (or will) be with the way you are going to want to query the data. Something like:

SELECT 
  * 
FROM 
  picture
WHERE 
  picture_id NOT IN 
  (
     SELECT picture_id FROM archived WHERE user_id = [userIdParameter]
  )

That will likely not perform very well with 100M rows.

Another option would be to cross join users and pictures so that the archived table always contains a Cartesian product. So the table would be:

archived
  user_id
  picture_id
  visited 

Then you could query like so:

SELECT
  p.*
FROM
  picture p
  INNER JOIN archived a ON p.picture_id = a.picture_id
WHERE 
  a.user_id = [userIdParameter]
  AND a.visited = [false]

This should perform acceptably with proper indexing, but would present the problem of having to make sure rows are created in the archived table any time a user or picture is added to the system. It also means you would always have a number of rows equal to pictures * users (100M in your example). That may not be desirable in your case.

Bottom line, you are going to have to create some test data that approximates your expected volume and do some performance testing that approximates your load. If you think this is the critical potential performance bottleneck for your system, it will be worth the time investment.


I used "NOT IN" solution for a while and there is performance problems started. Because I don't have a strong server to execute that query with lot of datas.

So, I found the most performanceful answer : "Collection Shuffle"

I'm shuffleing the collection with a userid seed and saving just users last image index id. After user comes back, looking to where this user's index id left lastly, showing next id from his collection.

This is really light and exactly solution. Thanks for everyone :)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜