How many records can I store in a SQL Server table before it gets ugly?
I've been asked to do some performance tests for a new system. It is currently only running with a few clients, but as they expect to grow, these are the numbers I'm working with for my test:
200 clients, 4 years of data, and the data changes every 5 minutes. So for every 5 minutes, for every client, there is 1 record. That means 365*24*12 = roughly 105,000 records per client per year, which works out to roughly 84 million records for my test (200 clients × 4 years). The table has one FK to another table, one PK (uniqueidentifier) and one index on the clientID.
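For what it's worth, the arithmetic checks out like this (runnable as-is in SSMS):

```sql
SELECT 365 * 24 * 12           AS RecordsPerClientPerYear,  -- 105,120 (~105,000)
       365 * 24 * 12 * 200 * 4 AS TotalRecordsForTest;      -- 84,096,000 (~84 million)
```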
Is this something SQL Server laughs about because it doesn't scare it at all, is this getting to be too much for one quad-core, 8 GB machine, is this on the edge, or.....
Has anybody had any experience with these kinds of numbers?
The PK field should be as small as possible and not random - GUID sucks here. The main problems are:
- The PK is used in all foreign keys to reference the row, so a large PK uses more space = more IO.
- A random PK means inserts happen all over the place = many page splits = inefficient index usage.
How bad is that? I know in some scenarios you lose 80% speed there.
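One way to put a number on it, using a standard DMV (the table name here is hypothetical): check fragmentation and page density of the indexes after a realistic load of random-GUID inserts. Heavy page splitting shows up as high fragmentation and low page density.

```sql
-- SAMPLED mode is needed to populate avg_page_space_used_in_percent.
SELECT i.name,
       s.index_type_desc,
       s.avg_fragmentation_in_percent,
       s.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Measurements'), NULL, NULL, 'SAMPLED') AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id;
```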
Otherwise - no problem. I have a table in excess of 800 million rows, and things are super fast there ;) Naturally you need decent queries and decent indices, and obviously that does not run on a single 5400 RPM green hard disc - but given proper IO, no stupid queries, and some decent indices, SQL Server does not balk at a couple of billion rows.
So, while "it depends", the generic answer is that large tables are not a problem... ...unless you do MASS deletes. Deleting half the table will be a HUGE transaction, which is why partitioning is nice for stuff like accounting - one partition per year means I can get rid of a year's data without a DELETE statement ;)
The software can handle it, but can your server? Well, it depends.
Are you just archiving it? Sure, store a hundred billion rows; SQL Server doesn't care. The performance problems come in when you're querying the data. The larger it gets, the more space you need, both for total storage and for operations (deletes, rollback segments, etc.) on that data - preferably in memory, but on a fast tempdb drive if not.
What matters more to SQL Server than the processor on large data sets (though the processor certainly affects the time a query takes, not the threshold of query/data it can handle) is memory and space, both disk and RAM, since it will overflow to tempdb for large operations; that is speaking in terms of capacity. For performance you need disk IO, memory and processor power all together.
The short answer to "can it handle it" is yes, provided you have enough space. Does it handle it fast enough? That depends on what kind of queries you're running and how much performance matters.
One last thing: don't forget to take a look at other questions here on optimizing large tables.
SQL Server will have no trouble storing this many records.
If you have designed your indexes properly, and your database is properly normalized, you will have absolutely no problem accessing an arbitrary number of records. Oftentimes, people make poor design decisions early on when their database has no information in it, and you never know about it, because everything is fast for small "n".
So while I will say that SQL Server can handle what you're doing, I would also say that now would be a good time to sit back and take a look at how your queries are performing with SQL Server Profiler. Is everything still fast? Do you see excessive scanning or hashing in your frequent queries which is leading to degraded performance? If so, now's the time to analyze and fix those issues.
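Profiler is one way; the plan-cache DMVs give a similar picture without setting up a trace. A minimal sketch using only standard DMVs: the top queries by logical reads, which is where big scans show up first.

```sql
SELECT TOP (10)
       qs.execution_count,
       qs.total_logical_reads,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       SUBSTRING(st.text, 1, 200)                  AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```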
As an aside, people really like to think of size limitations based on number of rows and columns. Try to go a step further and talk about bytes, because ultimately, bytes are what are being scanned in a reporting query, and bytes are what are being stored to disk.
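For example, a back-of-the-envelope estimate for the table in the question (the column sizes are my assumptions, since the full schema isn't given):

```sql
-- ~40 bytes per row: uniqueidentifier PK (16) + FK (4) + clientID (4)
-- + timestamp (8) + one value (8); row/page overhead and indexes come on top.
SELECT 40 * CAST(84000000 AS bigint)                        AS ApproxDataBytes,  -- 3,360,000,000
       40 * CAST(84000000 AS bigint) / 1024 / 1024 / 1024.0 AS ApproxDataGB;     -- ~3.1 GB before indexes
```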
Too many to worry about, really. I'm responsible for a web site which has 2 million registered users.
Some of our tables have more than 100 million records, and we achieve great performance with 4 million daily page views - but I must admit caching with a good architecture is the main reason things aren't getting ugly.
If you are after ultimate high performance, I'd design the PK to not be a uniqueidentifier. If you need to merge data sets, I'd go with an INT IDENTITY + SMALLINT (or even a tinyint) to determine the originating location. You don't say much about your design, but there are issues trying to use uniqueidentifier as a clustered index.
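A minimal sketch of that kind of key, with hypothetical table and column names - a narrow, ever-increasing clustered key instead of a random uniqueidentifier:

```sql
CREATE TABLE dbo.Measurements
(
    SourceID      smallint         NOT NULL,  -- originating location, for merging data sets
    MeasurementID int IDENTITY(1,1) NOT NULL,
    ClientID      int              NOT NULL,
    MeasuredAt    datetime2(0)     NOT NULL,
    Value         decimal(18,4)    NOT NULL,
    CONSTRAINT PK_Measurements PRIMARY KEY CLUSTERED (SourceID, MeasurementID)
);
```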
Given the proper server hardware, most decent designs will do just fine. Don't plan on running anything except the OS and SQL Server on the server. The main concern is RAM: for best performance, you'll need enough RAM for the entire database, indexes, etc., and that is on top of what the OS will use. I've even seen massive servers help bad designs run very well.
SQL Server can handle terabytes worth of data. The kicker is that you have to design correctly and have the right equipment mix. You might need partitioning, for instance. You definitely do need to think about every millisecond of performance on every query and avoid poorly performing designs and querying techniques like EAV tables, correlated subqueries, cursors, and "like '%sometext%'".
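The leading-wildcard point in particular bites hard at this size. A hypothetical example:

```sql
-- Leading wildcard: cannot seek, scans the whole index/table.
SELECT ClientID FROM dbo.Clients WHERE ClientName LIKE '%acme%';

-- Prefix search: can seek on an index over ClientName.
SELECT ClientID FROM dbo.Clients WHERE ClientName LIKE 'acme%';
```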
If you expect your database to be that large, then buy a book on performance tuning and read it cover to cover before starting the design. Bad design kills database performance, and it is extremely hard to correct once you have 80,000,000 records.
I also suggest you find a DBA with experience with high-performance, high-volume databases. This is a whole new game design-wise, and it needs to be thought out from the beginning.
Good for you for doing this kind of testing now before the system has that number of records.
Even MS Access can laugh at a half million row table (depending on row size).
If you don't have any queries to profile, think of the table as a file. The row count isn't the important number compared to the output of sp_spaceused.
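For instance (the table name here is hypothetical):

```sql
EXEC sp_spaceused N'dbo.Measurements';   -- rows, reserved, data, index_size, unused
```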
If you do have some queries, think of the table as a data structure. How can the query be fulfilled with the minimum amount of IO? Use the query plan, and SET STATISTICS IO ON.
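A minimal sketch of that workflow (the query and names are hypothetical):

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate query: last measurement for one client.
SELECT MAX(MeasuredAt)
FROM dbo.Measurements
WHERE ClientID = 42;

-- The Messages tab shows logical reads; the actual execution plan shows
-- whether the clientID index was seeked or the whole table was scanned.
```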