Am I underestimating MySQL?
I'm about to implement a feature on my website that recommends content to users based on the content they already have in their library (a la Last.fm). A single table holds all the records of the content they have added, so a row might look something like:
--------------------
| userid | content |
--------------------
| 28 | a |
--------------------
When I want to recommend some content for a user, I use a query to get all the user id's that have content a added in their library. Then, out of those user id's, I make another query that finds the next most common content among those users (fx. 'b'), and show that to the user.
My problem is when I'm thinking about the big picture here. Say that eventua开发者_运维知识库lly my site will hold something like 500.000 rows in the table, will this make the MySQL response very slow or am I underestimating MySQL here?
You will not know this until you've tested it, so start prototyping.
Typically, 500 000 rows is next to nothing. I worry a bit when my tables reach 50 millions, cause then it takes a while when I have to purge old data - though querying data is still fast.
But this all depends on the kinds of queries you need. Queries spanning all those 50 million rows would indeed be very slow, queries only touching 50k of those 50 millions are fast. And for your problem, you need to measure your queries, tune your queries, tables/indexes and mysql itself.
500 000 won't be an issue if you design proper indexes for your queries. It depends also on your hardware configuration an topology, but in general it should be ok.
The real question should be:
Is it reasonable to assume that MySQL's usage would have died out a long time ago if it couldn't work with an integer index on 500,000 rows of data?
Providing there isn't any RAM or CPU or storage engine constraints it should work fine.
Databases are made for these kinds of queries. You may need to have to adjust disk space and RAM if it's running low, but with the proper indexes you should be safe to go.
500.000 item tables was not something notable in Foxpro times - that is 10 years ago ;) Today it is totally nothing. Just make sure you run appropriate hardware ;)
In short: yes, you are underestimating MySQL (and probably most commercially used relational databases in general). If they hadn't designed it to support more than 500k rows, they probably wouldn't be one of the most popular databases in the world.
I think nos has a good answer, because he correctly points out that you need to stress test your system in order to know its limits; but he also provides a number (50 million -- I can't vouch for that, but it sounds like he's worked on DBs of this size) to at least give you an idea how big the ballpark is.
So: 500k? Doable. 50 billion? Less confident about that one.
More important than the raw number of rows is going to be proper indexing, and (even more so) what kind of queries you write.
Querying against 500k rows is nothing, especially when indexed. On the other hand, if you write a 3-way self-join that requires a full table scan of 500k * 500k * 500k rows, then you'll have a problem.
精彩评论