Parse/Shred Huge Complex XML to SQL Server 2008 Database (30+ tables)
I have already read these two questions: "The Best Way to shred XML data into SQL Server database columns" and "Looking for a good Bulk Insert XML Shredding example for SQL 2005".
The difference, and the reason I'm posting, is that I'm using BizTalk 2009 and SQL Server 2008.
I'm receiving a huge XML structure from a vendor using BizTalk. The client has normalized the XML structure into about 30 tables in a Microsoft SQL Server 2008 database.
Is there any magic solution yet?
It seems to me these are the options:
1) Use the BizTalk SQL adapter, which is only good for simple flat databases (not a lot of joins and one-to-many relationships).
2) Write a WCF service that either a) uses LINQ and exposes the LINQ object, or b) uses traditional XML DOM or SAX parsing and ADO.NET to store the data in the database.
3) Write a complex stored procedure that uses OPENXML.
4) Store the data temporarily in a SQL Server XML column, then use some other tool to "shred and normalize" it. Is there anything in SSIS that would do this?
5) Leave the data in an XML column, use XML indexes, and never normalize it. Embed the ugly XQuery/XPath statements in a view. I'm not sure whether the response time of queries would be adequate, and it might take as long to write the XQuery expressions and views as it would to do one of the other options above.
I'm guessing that #2 or #3 would take at least one or two hours per table, so with 30 tables that is at least 30 hours (if not 60) of tedious, error-prone work.
Thanks,
Neal Walters
Update 12/23: Some sample data:
<ns0:ValAgg xmlns:va="http://msbinfo.com/expresslync/rct/valuation" xmlns:ns0="http://TFBIC.RCT.BizTalk.Orchestrations.ValAgg">
<MainStreetValuation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://msbinfo.com/expresslync/rct/valuation">
<ValuationIdentifier>
<RecordId>1928876</RecordId>
<PolicyNumber>ESTIMATE-1928876</PolicyNumber>
<VersionId>6773220</VersionId>
</ValuationIdentifier>
<RecordType>EST</RecordType>
<PolicyStatus>Complete</PolicyStatus>
<DataSource>WEB</DataSource>
<bunch more here/>
<valuationAggregateFlat xmlns="http://tempuri.org/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<policyNumber>ESTIMATE-1928876</policyNumber>
<recordId>1928876</recordId>
<versionId>6773220</versionId>
<updateTimeStamp>2009-12-14T14:50:30.743</updateTimeStamp>
<replacementCost>166129</replacementCost>
<yearBuilt>1999</yearBuilt>
<totalLivingAreaSqFt>2000</totalLivingAreaSqFt>
<primaryRCTRoofTypeCode>15012</primaryRCTRoofTypeCode>
<TOPSRoofType>COPR</TOPSRoofType>
<StdFireRoofType>COPR</StdFireRoofType>
<primaryRTCConstructionTypeCode>10016</primaryRTCConstructionTypeCode>
<constructionType>BV</constructionType>
<hailProofIndicator>false</hailProofIndicator>
<anyWoodRoofIndicator>false</anyWoodRoofIndicator>
<allMetalRoofIndicator>true</allMetalRoofIndicator>
</valuationAggregateFlat>
</ns0:ValAgg>
In place of "MainStreetValuation" there could be one of a couple of other complex types, such as "HighValueValuation", where the entire structure is different for homes that have fancy stuff.
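To make the per-table effort in option 2 concrete, here is a minimal sketch of shredding one flat element into one table. It is written in Python rather than .NET, uses an in-memory SQLite database as a stand-in for SQL Server, and takes only five of the sample fields. The table name `ValuationAggregateFlat`, the column types, and the helper names `shred_flat` and `load` are assumptions made for illustration, not part of the vendor schema or the client's database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed copy of the sample message: only the flat aggregate, five fields.
SAMPLE = """<ns0:ValAgg xmlns:ns0="http://TFBIC.RCT.BizTalk.Orchestrations.ValAgg">
  <valuationAggregateFlat xmlns="http://tempuri.org/">
    <policyNumber>ESTIMATE-1928876</policyNumber>
    <recordId>1928876</recordId>
    <versionId>6773220</versionId>
    <replacementCost>166129</replacementCost>
    <yearBuilt>1999</yearBuilt>
  </valuationAggregateFlat>
</ns0:ValAgg>"""

# The flat element lives in the tempuri.org default namespace.
NS = {"t": "http://tempuri.org/"}
COLUMNS = ["policyNumber", "recordId", "versionId", "replacementCost", "yearBuilt"]


def shred_flat(xml_text):
    """Return one tuple of column values per valuationAggregateFlat element."""
    root = ET.fromstring(xml_text)
    rows = []
    for flat in root.findall("t:valuationAggregateFlat", NS):
        rows.append(tuple(flat.findtext(f"t:{c}", namespaces=NS) for c in COLUMNS))
    return rows


def load(rows):
    """Insert the rows into a hypothetical table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ValuationAggregateFlat ("
        "policyNumber TEXT, recordId INTEGER, versionId INTEGER, "
        "replacementCost INTEGER, yearBuilt INTEGER)"
    )
    conn.executemany("INSERT INTO ValuationAggregateFlat VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


rows = shred_flat(SAMPLE)
conn = load(rows)
```

Even in this trimmed form, each table needs its own column list and namespace handling, which is where the hour-or-two-per-table estimate comes from.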
Quick note: the fact that you're using BizTalk 2009 does not, by itself, prevent you from also using something like SSIS for shredding and otherwise processing the XML.
The following is too long for a comment:
There's an issue to be aware of with the XML Source. Consider an XML structure like:
<root>
<parent attr1="value1" attr2="value2">
<child attrc1="valuec1" attrc2="valuec2"/>
<child attrc1="valuec1" attrc2="valuec2"/>
</parent>
<parent> ... </parent>
...
</root>
The result of processing this through the SSIS XML Source will be two outputs: one with attr1 and attr2, and another with attrc1 and attrc2. The outputs are all processed asynchronously with respect to each other. You'll need to correlate the parent and child rows by means of an artificial column that SSIS will introduce. Each parent will have an id column, and each child will carry its parent's id value as a "foreign key". You may need to do a little work in your database to match the two.
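The correlation step can be illustrated outside SSIS. The sketch below is plain Python, not the XML Source itself: it splits the structure above into a parent row set and a child row set and joins them back on a generated key. The column name `parent_id`, the function names, and the second parent's attribute values are assumptions for illustration; SSIS chooses its own name for the generated column.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <parent attr1="value1" attr2="value2">
    <child attrc1="valuec1" attrc2="valuec2"/>
    <child attrc1="valuec1" attrc2="valuec2"/>
  </parent>
  <parent attr1="value3" attr2="value4">
    <child attrc1="valuec3" attrc2="valuec4"/>
  </parent>
</root>"""


def split_outputs(xml_text):
    """Produce two row sets, tagging each child with its parent's generated id."""
    root = ET.fromstring(xml_text)
    parents, children = [], []
    for parent_id, parent in enumerate(root.findall("parent"), start=1):
        parents.append({"parent_id": parent_id, **parent.attrib})
        for child in parent.findall("child"):
            # The generated id plays the role of a foreign key.
            children.append({"parent_id": parent_id, **child.attrib})
    return parents, children


def join_rows(parents, children):
    """Match each child back to its parent, as a database join would."""
    by_id = {p["parent_id"]: p for p in parents}
    return [(by_id[c["parent_id"]]["attr1"], c["attrc1"]) for c in children]


parents, children = split_outputs(SAMPLE)
joined = join_rows(parents, children)
```

If the two row sets land in separate staging tables, the same match is an ordinary join on the generated column.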