
Comparing SQL Server databases and automatic mapping

This is more of a "Would it be possible" question...

Say I have two databases: one for an old system (we'll call it v1.0) and the other for the new system (v2.0).

Now let's imagine that there has been a data migration between v1.0 and v2.0, but there are no mapping tables or documentation (for reasons I won't go into), and we need to know the integrity of the data.

Is it possible to write a script that could automatically map the tables in each database, to the level of inspecting the data, deciding whether it's a possible match, and producing a table with the results?

So you end up with something like:

Source           | Possible matches
v1.0             | v2.0             | v2.0            | v2.0
table234.columnA | table109.columnX | table63.columnY | table10.columnZ
table234.columnB | table139.columnX | table13.columnY | table140.columnZ

So you'd end up with a source table/column and all the possible tables/columns that it could potentially be.

I know this is a bit of an odd one-off request, but has anyone ever needed to do anything like this?


Red Gate Data Compare does the data comparison job very well and concisely. You can select tables and columns, and even add row-based filters to your comparison project.

http://www.red-gate.com/products/sql-development/sql-data-compare/


You can use a database compare tool like DbComparer. It can help you resolve most of your queries, and it's absolutely free. This will help you understand the differences between the schemas of the two databases.

To compare the two database versions, you can check this link out. You can use the Database Comparison Wizard to obtain the differences.

A free tool called CompareData is available too.

If you want to pay for it, you can check out SQL Data Compare 9.0


It sounds possible. One idea: you could create a "mapping integrity" view, so to speak. But assuming this migration leaves the "version 1" database static going forward, you could instead create a mapping integrity table and populate it once through a stored procedure. The table would then be faster to query than the equivalent view.
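A minimal sketch of that idea, assuming both databases sit on the same instance; the object names (MappingIntegrity, PopulateMappingIntegrity) are made up for illustration and not from the question:

```sql
-- Hypothetical one-time mapping integrity table; all object names here are illustrative.
CREATE TABLE dbo.MappingIntegrity (
    SourceTable  sysname      NOT NULL,   -- v1.0 table
    SourceColumn sysname      NOT NULL,   -- v1.0 column
    TargetTable  sysname      NOT NULL,   -- candidate v2.0 table
    TargetColumn sysname      NOT NULL,   -- candidate v2.0 column
    MatchedRows  int          NOT NULL,   -- how many source values were found in the target
    CheckedOn    datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- Populated once, on the assumption that the v1.0 database stays static.
CREATE PROCEDURE dbo.PopulateMappingIntegrity
AS
BEGIN
    SET NOCOUNT ON;
    TRUNCATE TABLE dbo.MappingIntegrity;
    -- INSERT INTO dbo.MappingIntegrity (...)  -- whatever comparison logic you settle on goes here
END;
GO
```

Because the results are materialised once, reporting queries against the table stay cheap no matter how expensive the comparison itself was.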


My understanding of what you're asking: if you have a field in Schema 1 like Customer.Country that contains the values "United States", "Canada" and "Mexico", you want a list of all columns in Schema 2 that also contain "United States", "Canada" and "Mexico". Sound right?

Based on that understanding, it largely depends on what kind of mapping has been done and how complicated it is. Here are some examples, based on schema change projects I've seen:

Unioning: If Schema 1 had Shoe_Customers and Boot_Customers and Schema 2 merged them into one Customers table, then finding the map for Shoe_Customers.Customer_Name means looking for a subset of values in Customers.Name. Similarly, if Schema 1's Customers.Country had been replaced with a generic Countries look-up table in Schema 2, and someone populated that table with a full list of countries, then you'd be looking for Customers.Country as a subset of Countries.Name (or are they full matches?); a subset check like the one sketched after these examples would tell you.

Subsetting: Taking the opposite approach, if Schema 1 had Customers and Schema 2 broke that into Shoe_Customers and Boot_Customers, then you're only going to find a subset of Schema 1 values in any given column of Schema 2. I'm not sure how you'd define a success case here.

Cardinality: If you're trying to track down fairly unique data, like Customer.Name, then you have a pretty high chance of success with an automated approach. Anything with a low level of cardinality is likely to give you more false leads than you'd find worth it. If Gender is M or F, and Discount_Code is A-Z, and so are Sole_Style, Heel_Style, etc., then tracking this data down through automated data matching is going to be a massive waste of time. This gets worse with numeric data, especially low-cardinality data like percentages.

Data type changes: I'm guessing you're asking this because it involves hundreds or thousands of column changes, with large quantities of data. If the idea is to compare all "string data" to all other "string data", and all "numeric data" to all other "numeric data", then this is going to be massive. If the schema change avoided data type changes (e.g. it did not allow something like Customer.Country going from varchar(15) to Countries.Name as varchar(50)), then you've got a leg up, because you can restrict comparisons to columns of the same type.
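To make the data-type and subset points concrete, here is a rough sketch. OldDb and NewDb stand in for the v1.0 and v2.0 databases, and the second query reuses the Customers.Country / Countries.Name example from above; treat both as illustrations rather than a finished solution:

```sql
-- Candidate pairs: only compare columns that share a data type (OldDb/NewDb are placeholder names).
SELECT s.TABLE_NAME AS SourceTable, s.COLUMN_NAME AS SourceColumn,
       t.TABLE_NAME AS TargetTable, t.COLUMN_NAME AS TargetColumn
FROM   OldDb.INFORMATION_SCHEMA.COLUMNS AS s
JOIN   NewDb.INFORMATION_SCHEMA.COLUMNS AS t
       ON t.DATA_TYPE = s.DATA_TYPE;

-- Subset test for one candidate pair: old values that never appear in the new column.
-- An empty result means the old column is fully covered, which makes the pair a possible match.
SELECT o.Country
FROM   OldDb.dbo.Customers AS o
EXCEPT
SELECT n.Name
FROM   NewDb.dbo.Countries AS n;
```

If you relax the data-type join so that, say, varchar(15) can still match varchar(50), the candidate list grows accordingly, which is exactly the trade-off described in the last example.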

These are the things that come to mind at the moment. Your situation may make all of these factors irrelevant, or they may be just the tip of the iceberg. Personally, I'm a little skeptical of a fully automated approach for most situations. My suggestion would be to write a stored procedure that takes two table/column names, one from each schema, and tells you what coverage of Schema 1 values there is in Schema 2, the coverage of Schema 2 values in Schema 1, some kind of cardinality measurement, and so on. Combined with a little human influence, you should be able to get some proportion of your columns mapped, probably in less time than a full-blown generic solution with lots of dead ends to pursue.
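A rough sketch of such a procedure, assuming both databases live on the same instance and dynamic SQL is acceptable; CompareColumnCoverage and the OldDb/NewDb names are made up for illustration:

```sql
-- Hypothetical helper: given one column from each schema, report distinct counts and
-- how much of each column's data is found in the other.
CREATE PROCEDURE dbo.CompareColumnCoverage
    @SourceTable  nvarchar(400),  -- e.g. 'OldDb.dbo.Customers'  (concatenated as-is: trusted input only)
    @SourceColumn sysname,        -- e.g. 'Country'
    @TargetTable  nvarchar(400),  -- e.g. 'NewDb.dbo.Countries'
    @TargetColumn sysname         -- e.g. 'Name'
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max) = N'
        WITH s AS (SELECT DISTINCT ' + QUOTENAME(@SourceColumn) + N' AS v FROM ' + @SourceTable + N'),
             t AS (SELECT DISTINCT ' + QUOTENAME(@TargetColumn) + N' AS v FROM ' + @TargetTable + N')
        SELECT (SELECT COUNT(*) FROM s)                              AS SourceDistinct,
               (SELECT COUNT(*) FROM t)                              AS TargetDistinct,
               (SELECT COUNT(*) FROM s WHERE v IN (SELECT v FROM t)) AS SourceValuesInTarget,
               (SELECT COUNT(*) FROM t WHERE v IN (SELECT v FROM s)) AS TargetValuesInSource;';

    EXEC sys.sp_executesql @sql;
END;
```

Called as, say, EXEC dbo.CompareColumnCoverage 'OldDb.dbo.Customers', 'Country', 'NewDb.dbo.Countries', 'Name'; high coverage in both directions plus a healthy distinct count (the cardinality point above) makes a pair worth a human look.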

Good luck,

Terry.
