What is the most scalable design for this table structure
DataColumn, DataColumn, DateColumn
Every so often we put data into the table via date.
So everything seems great at first, but then I thought: What happens when there are a million or billion rows in 开发者_如何学Cthe table? Should I be breaking up the tables by date? This way the query performance will never degrade? How do people deal with this sort of thing?
You can use partitioned tables starting with SQL 2K5: Partitioned Tables
This way you gain the benefits of keeping the logical design pure while being able to move old data into a different file group.
You should not break your tables because of data. Instead, you should worry about your indexes, normalization and so on.
Update
A little deeper explanation. Let's suppose you have a table with a million records. If you have different dates on [DateColumn], your greatest ally will be the indexes that work with the [DateColumn]. Then you make sure your queries always filter by at least [DateColumn].
This way, you will be fine.
This easily qualifies as premature optimization, which is tough to achieve in db design IMHO, because optimization is/should be closer to the surface in data modeling.
But all you need to do is create an index on the DateColumn
field. An index is actually a much better performance solution than any kind of table splitting/breaking up and keeps your design and therefore all of you programming much simpler. (And you can decide to use partitioning w/o affecting your design in the future if it helps.)
Sounds like you could use a history table. If you are mostly going to query the current date's data, then migrate the old data to the history table and your main table will not grow so much.
If I understand you question correctly, you have a table with some data and a date. Your question is -- will I see improved performance if I make a new table say, every year. This way the queries will never have to look at more than one years worth of data.
This is wrong. Instead what you should do is set the date field as an index. The server will be able to give you the performance gain you need if it is an index.
If you don't do this your program's logic will get crazy and ultimately slow down your system.
Keep it simple.
(NB - There are some advanced partitioning features you can make use of, but these can be layered in later if needed -- it is unlikely you will need these features but the simple design should be able to migrate to them if needed.)
When tables and indexes become very large, partitioning can help by partitioning the data into smaller, more manageable sections.
Microsoft SQL Server 2005 allows you to partition your tables based on specific data usage patterns using defined ranges or lists. SQL Server 2005 also offers numerous options for the long-term management of partitioned tables and indexes by the addition of features designed around the new table and index structure.
Furthermore, if a large table exists on a system with multiple CPUs, partitioning the table can lead to better performance through parallel operations.
You might need considering the following too: In SQL Server 2005, related tables (such as Order and OrderDetails tables) that are partitioned to the same partitioning key and the same partitioning function are said to be aligned. When the optimizer detects that two partitioned and aligned tables are joined, SQL Server 2005 can join the data that resides on the same partitions first and then combine the results. This allows SQL Server 2005 to more effectively use multiple-CPU computers.
Read about Partitioned Tables and Indexes in SQL Server 2005
精彩评论