How to query huge MySQL databases?

2023-02-06 06:45 问答作者：

I have 2 tables, a purchases table and a users table. Records in the purchases table looks like this:

purchase_id | product_ids | customer_id
---------------------------------------
1           | (99)(34)(2) | 3 
2           | (45)(3)(74) | 75

Users table looks like this:

user_id  | email              | password
----------------------------------------
3        | joeShmoe@gmail.com | password 
75       | nolaHue@aol.com    | password

To get the purchase history of a user I use a query like this:

mysql_query(" SELECT * FROM purchases WHERE customer_id = '$users_id' ");

The problem is, what will happen when tens of thousands of records are inserted into the purchases table. I feel like this will take a performance toll.

So I was thinking about storing the purchases in an additional field directly in the user's row:

user_id | email              | password  | purchases
------------------------------------------------------
1       | joeShmoe@gmail.com | password  | (99)(34)(2)
2       | nolaHue@aol.com    | password  | (45)(3)(74)

And when I query the user's table for things like username, etc. I can just as easily grab their purchase history using that one query.

Is this a good idea, will it help better performance or will the benefit be insignificant and not worth making the database look messier?

I really want to know what the pros do in these situations, for example how does amazon query it's database for user's purchase history since they have millions of customers. How come there queries don't take hours?

EDIT

Ok, so I guess keeping them separate is the way to go. Now the question is a design one:

Shoul开发者_开发问答d I keep using the "purchases" table I illustrated earlier. In that design I am separating the product ids of each purchase using parenthesis and using this as the delimiter to tell the ids apart when extracting them via PHP.

Instead should I be storing each product id separately in the "purchases" table so it looks like this?:

purchase_id | product_ids | customer_id
---------------------------------------
1           | 99          | 3 
1           | 34          | 3
1           | 2           | 3
2           | 45          | 75
2           | 3           | 75
2           | 74          | 75

Nope, this is a very, very, very bad idea.

You're breaking first normal form because you don't know how to page through a large data set.

Amazon and Yahoo! and Google bring back (potentially) millions of records - but they only display them to you in chunks of 10 or 25 or 50 at a time.

They're also smart about guessing or calculating which ones are most likely to be of interest to you - they show you those first.

Which purchases in my history am I most likely to be interested in? The most recent ones, of course.

You should consider building these into your design before you violate relational database fundamentals.

Your database already looks messy, since you are storing multiple product_ids in a single field, instead of creating an "association" table like this.

_____product_purchases____
purchase_id | product_id |
--------------------------
          1 |         99 |
          1 |         34 |
          1 |          2 |

You can still fetch it in one query:

SELECT * FROM purchases p LEFT JOIN product_purchases pp USING (purchase_id)
   WHERE purchases.customer_id = $user_id

But this also gives you more possibilities, like finding out how many product #99 were bought, getting a list of all customers that purchased product #34 etc.

And of course don't forget about indexes, that will make all of this much faster.

By doing this with your schema, you will break the entity-relationship of your database.

You might want to look into Memcached, NoSQL, and Redis. These are all tools that will help you improve your query performances, mostly by storing data in the RAM.

For example - run the query once, store it in the Memcache, if the user refresh the page, you get the data from Memcache, not from MySQL, which avoids querying your database a second time.

Hope this helps.

First off, tens of thousands of records is nothing. Unless you're running on a teensy weensy machine with limited ram and harddrive space, a database won't even blink at 100,000 records.

As for storing purchase details in the users table... what happens if a user makes more than one purchase?

MySQL is hugely extensible, and don't let the fact that it's free convince you of otherwise. Keeping the two tables separate is probably best, not only because it keeps the db more normal, but having more indices will speed queries. A 10,000 record database is relatively small in deference to multi-hundred-million record health record databases.

As far as Amazon and Google, they hire hundreds of developers to write specialized query languages for their specific application needs... not something developers like us have the resources to fund.

继续阅读：sql

How to query huge MySQL databases?

EDIT

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

EDIT

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？