Diff on SQL Server XML Data Type?

I have an automated process that inserts an XML document into a SQL Server 2008 table; the column is of type XML. There is a lot of duplicated data. Can anyone recommend a good way to delete non-distinct rows based on the XML column? The table has thousands of rows and each XML document is about 70k.

Each XML document looks the same except for one element value, for example:

Row 1, Column C:

<?xml version="1.0"?><a><b/><c>2010.09.28T10:10:00</c></a>

Row 2, Column C:

<?xml version="1.0"?><a><b/><c>2010.09.29T10:10:00</c></a>

I want the value of <c> to be ignored when it comes to the diff. If everything else is equal, then I want to consider the documents to be the same. If any other element is different, then the documents should be considered different.

Thanks for all ideas.


Can you qualify what 'distinct XML' means for you? For example what is the difference between:

  • <a><b/></a>
  • <?xml version="1.0"?><a><b/></a>
  • <a xmlns:xhtml="http://www.w3.org/1999/xhtml"><b/></a>
  • <a><b xsi:nil="true" /></a>
  • <a><b></b></a>
  • <?xml version="1.0" encoding="UTF-8"?><a><b/></a>
  • <?xml version="1.0" encoding="UTF-16"?><a><b></b></a>

In your opinion, how many 'distinct' XMLs are there?

Updated

If your XML looks like <?xml version="1.0"?><a><b/><c>2010.09.29T10:10:00</c></a>, then you can project the element that distinguishes the rows and query on this projection:

with cte_x as (
  select xmlcolumn.value(N'(//a/c)[1]', N'DATETIME') as xml_date_a_c,
   ...
  from table
),
cte_rank as (
  select row_number() over (partition by xml_date_a_c order by ...) as rn
  from cte_x)
delete from cte_rank
  where rn > 1;
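If the goal is instead to ignore <c> and treat rows as duplicates when everything else matches, one possible variation is to project everything except <c> and partition on that. The following is only a sketch under assumptions: the table name dbo.Docs and the key column Id are hypothetical, C is the XML column from the question, and <c> is assumed to be a direct child of the root element <a>. It compares the serialized child elements of <a>, so attributes on <a> itself and text directly under <a> are not taken into account.

```sql
-- Hypothetical schema: dbo.Docs(Id int primary key, C xml)
with stripped as (
    select Id,
           -- keep every child element of <a> except <c>,
           -- then serialize so the result can be compared
           cast(C.query('/a/*[local-name(.) != "c"]') as nvarchar(max)) as body
    from dbo.Docs
),
ranked as (
    select Id,
           -- rows sharing the same body form one group; the lowest Id gets rn = 1
           row_number() over (partition by body order by Id) as rn
    from stripped
)
delete d
from dbo.Docs as d
join ranked as r on r.Id = d.Id
where r.rn > 1;   -- keep one row per group, delete the rest
```

Partitioning on an nvarchar(max) value of roughly 70k characters per row is likely to be slow, so it may be worth running the two CTEs as a plain select first to check which rows would be removed before executing the delete.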