Exporting and importing selected rows in a database
Let's say I have a database like this:
Users
-----
ID int PK Identity
Name vchar(max)
Sales
-----
UserID int FK
Description vchar(max)
And some data:
Users
1, "Daniel"
2, "Barnie"
3, "Frank"
Sales
2, "New computer"
2, "Rubber duck"
3, "Cabbage"
There are also several other tables that link to this primary key. Now there is a requirement that only certain users want to be backed up, for example I only want to export the data and all the linked data for users 2 and 3.
Questions: 1) Is there a way to create a .bak file using only partial data? I don't want to backup the whole thing, just selected records. 2) If .bak files are not the best way, what else can be done? I have thought of generating a csv file or an INSERT sql script but these leads to problems in the import feature. The problem comes about when you have exported from two or more databases and you now have potential clashes in the pr开发者_如何学编程imary key for the users table. How do you get around this? I am also using filestreaming in some tables so I have some data that cannot be pulled out into text format easily.
I'd also like to do all this programatically. Using sql server 2008.
Questions: 1) Is there a way to create a .bak file using only partial data? I don't want to backup the whole thing, just selected records.
No. In SQL Server, the backup functionality will only backup an entire database.
2) If .bak files are not the best way, what else can be done?
I'd recommend setting up a second archive database on the same server as the original and use replication to sync only certain records. I would then only backup the archive database (or both but on different schedules).
If replication isn't your flavor of vodka, then you could even do a triggered upsert or delete into this archive database.
I have thought of generating a csv file or an INSERT sql script but these leads to problems in the import feature. The problem comes about when you have exported from two or more databases and you now have potential clashes in the primary key for the users table. How do you get around this? I am also using filestreaming in some tables so I have some data that cannot be pulled out into text format easily.
Is this a multi-tenant situation? Regardless, for each database I would create a second archive database that would be used to backup the information that was actually needed. Thus, no two databases would feed into the same filtered archive database.
A relatively simple option would be to populate a table with the users you want to back up, make another DB on the same server instance for the archived users, then do a select into the new DB (users table first obviously). Delete from the old DB where values exist in the new DB, back up the new DB, and you are golden.
You can use partitioning to divide the data between either different file groups or different servers. You can then choose how you backup each partition by applying different backup schedules to the filegroups/servers.
But on the whole, storage is very cheap these days. Unless you really know this is going to cost, I would just backup everything. The more complex the backup system, the more prone to failure, and the cost of saving a few gig will not equal the cost of losing all the data!
A backup is meant to be able to take a snapshot of a database so that it can be restored at another time. If you do not care about certain records then delete them, and then do the backup.
What about this idea:
- Create backups tables for each related table you want to backup.
- With a simple query populate this tables (you'll just select data concerning users who wants the backup)
- A simple trigger on each concerned table (add or update) to make your backup tables synchronized.
- Now You can export a back up from this new tables
- To restore data use
insert ignore
.
It's just an idea, let's criticize :)
I would take a generalized archiving approach to this problem:
- Create a central database with all the schemas for all tables you need to export
Since you want to maintain the Filestream data, I don't see how .csv or bcp files could be used. Plus, this fits into the idea you mentioned of having one giant database to accumulate the information.
- For each table, add a new column called DbName.
DbName will be the name of the database that the original record comes from. You can combine this column with the User ID to create a composite key. This will allow you to keep the identity fields in your tables and still be able to merge users with the same ID into one table.
- Create a stored procedure which will load the necessary data into the new database and delete it at the same time (or at least mark for delete).
Presumably this stored procedure will be run as a SQL Agent job and you could have one in each database. The users to be removed could be referred to via a centralized table.
Another possibility is to use SSIS with custom select statements and raw data outputs. The content would be extremely fast importing and exporting in a native format, while getting exactly what records you want. Additionally, you could run compression on the files after export or run file commands to move them around.
精彩评论