Fastest way to fetch a subset (200M) from a very large table (600M) in SQL Server
we are facing the following problem and we are trying to come up with the best possible solution.
We are using SQL Server 2008. We have a table that has more than 600 millions records, and has about 25 columns. One of the columns is an ID and is indexed. We need to get a subset of records from this table. There are mainly 2 cases:
a) the subset contains anywhere between 1000 and 200 millions rows ; the IDs of the rows to get are stored in an indexed table in SQL Server;
b) the subset contains less than 1 million rows ; the IDs of the rows to get are stored in the application's server's memory (.NET web app).
Our challenge is to get this subset as fast as possible. We do need this to happen in a few seconds.
Our solutions so far:
a) we do a join between the two tables. That works but that is not fast enough. The query looks something like SELECT * FROM Big_开发者_Python百科Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID
.
b) we do not really have a solution. We tried to run a WHERE IN query but that simply takes forever if the subset approaches the million rows (SELECT * FROM Big_Table WHERE ID IN (ID1, ID2, ID3...)
). A solution could be to store the indexes in a temporary table but then this falls back to case a.
We are trying to optimize SQL Server as much as we can, notably using good indexes and partioning. I am more asking today of what you think the best approach is to extract the subset in both cases (a and b).
Any idea or suggestion is appreciated.
Thanks
Since you're doing pagination, couldn't you subselect a range of rows from the IndexTable first, then join this to the big table?
Something like:
select * from big_table where id in (select top 100 id from indextable)
The most obvious question for your problem a) (the SELECT * FROM Big_Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID
query) is: are you sure to have indices on the ID
column in both tables?
精彩评论