
How to copy a large set of data in a SQL Server db

I have a requirement to take a "snapshot" of a current database and clone it into the same database, with new Primary Keys.

The schema in question consists of about 10 tables, but a few of the tables will potentially contain hundreds of thousands to 1 million records that need to be duplicated.

What are my options here?

I'm afraid that writing a SPROC will require locking the database rows in question (for concurrency) for the entire duration of the operation, which would be quite disruptive to other users. How long would such an operation take, assuming we optimize it to the full extent SQL Server allows? Is it going to be 30 seconds to 1 minute to perform this many inserts? I'm not able to lock the whole table(s) and do a bulk insert, because other users under other accounts are using the same tables independently.

Depending on performance expectations, an alternative would be to dump the current db into an XML file and then asynchronously clone the data from this XML file at leisure in the background. The obvious advantage of this is that the db is only locked for the time it takes to do the XML dump, and the inserts can run in the background.
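For example, the dump step might be something along these lines (a minimal sketch only; the table and column names are placeholders, not my real schema):

-- Hypothetical example: export the rows to be cloned as XML
SELECT col1, col2, col3
FROM dbo.source1
FOR XML PATH('row'), ROOT('source1');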

If a good DBA can get the "clone" operation to execute start to finish in under 10 seconds, then it's probably not worth the complexity of the XML dump/web service solution. But if it's a lost cause, and inserting potentially millions of rows is likely to balloon out in time, then I'd rather start out with the XML approach right away.

Or maybe there's a better approach altogether?

Thanks a lot for any insights you can provide.


I would suggest backing up the database, and then restoring it as a new db on your server. You can use that new DB as your source. I would definitely recommend against the XML dump idea.
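A rough sketch of what that could look like (the database name, file paths, and logical file names below are hypothetical; check yours with RESTORE FILELISTONLY):

-- Hypothetical example: back up the live db and restore it under a new name
BACKUP DATABASE MyDb
    TO DISK = 'C:\Backups\MyDb.bak'
    WITH COPY_ONLY, INIT;

-- The MOVE clauses use the source db's logical file names
RESTORE DATABASE MyDb_SnapshotSource
    FROM DISK = 'C:\Backups\MyDb.bak'
    WITH MOVE 'MyDb'     TO 'C:\Data\MyDb_SnapshotSource.mdf',
         MOVE 'MyDb_log' TO 'C:\Data\MyDb_SnapshotSource_log.ldf';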


Does it need to be in the exact same tables? You could make a set of "snapshots" tables where all these records go; you would only need a single INSERT + SELECT, like

insert into snapshots_source1 (user, col1, col2, ..., colN)
select 'john', col1, col2, ..., colN from source1

and so on.

You can give the snapshots_* tables an IDENTITY column that will create the 'new PK', and they can also preserve the old one if you so wish.
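For example, a hypothetical snapshots table could look something like this (column names and types are made up; adjust them to your schema, which I don't know):

-- Hypothetical snapshot table: new_id becomes the "new PK",
-- old_id preserves the original key from source1
CREATE TABLE snapshots_source1 (
    new_id  INT IDENTITY(1,1) PRIMARY KEY,
    old_id  INT NOT NULL,            -- original PK from source1, if you want to keep it
    [user]  VARCHAR(50) NOT NULL,    -- which account the snapshot belongs to
    col1    VARCHAR(100) NULL,       -- payload columns mirroring source1 (assumed)
    col2    VARCHAR(100) NULL
);

INSERT INTO snapshots_source1 ([user], old_id, col1, col2)
SELECT 'john', id, col1, col2
FROM source1;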

This has (almost) no locking issues and looks a lot saner.

It does require a change in the code, but it shouldn't be too hard to make the app point to the snapshots tables when appropriate.

This also eases cleanup and maintenance.

---8<------8<------8<---outdated answer---8<---8<------8<------8<------8<---

Why don't you just take a live backup and do the data manipulation (key changing) on the destination clone?

Now, in general, this snapshot-with-new-primary-keys idea sounds suspect. If you want a replica, you have log shipping and cluster service; if you want a copy of the data to generate a 'new app instance', a backup/restore/manipulate process should be enough.

You don't say how much space your DB occupies, but you can certainly back up 20 million rows (800 MB?) in about 10 seconds, depending on how fast your disk subsystem is...
