How to do regex HTML tag replace in SQL Server?
I have a table in SQL Server 2005 with hundreds of rows with HTML content. Some of the content has HTML like:
<span class=heading-2>Directions</span>开发者_如何学编程
where "Directions" changes depending on page name.
I need to change all the <span class=heading-2>
and </span>
tags to <h2>
and </h2>
tags.
I wrote this query to do content changes in the past, but it doesn't work for my current problem because of the ending HTML tag:
Update ContentManager
Set ContentManager.Content = replace(Cast(ContentManager.Content AS NVARCHAR(Max)), 'old text', 'new text')
Does anyone know how I could accomplish the span to h2 replacing purely in T-SQL? Everything I found showed I would have to do CLR integration. Thanks!
Indeed T-SQL does not natively support regular expressions and this is the sort of problem in which regular expressions would be the tool of choice. First, I'll say that the level of complication in the solution depends greatly on how consistent your data is. For example, suppose we search for items with the heading:
Select ..
From ...
Where HtmlContent Like '<span class="heading-2">%'
This assumes no additional spacing between span
and class
as well as no additional spacing after the final double quote before the end bracket. We could write '%<span%class="heading-2"%>%'
to account for the spaces but that would also find div
tags marked as heading-2
in the same content as any span tag. If this later scenario shouldn't happen but you might have varying spaces, then use this revised pattern. Where we will really run into troubles is the closing tag. Suppose our content looks like so:
<span class="heading-2"> Foo <span class="heading-3">Bar</span> And Gamma Too</span> .... <span class="heading-4">Fubar Is the right way!</span>...
It is not so simple to find the correct closing span
tag to change to </h2>
. You cannot simply find the first </span>
and change it to </h2>
. If you knew that you had no nested span
tags, then you could write a user-defined function that would do it:
Create Function ReplaceSpanToH2( @HtmlContent nvarchar(max) )
Returns nvarchar(max)
As
Begin
Declare @StartPos int
Declare @EndBracket int
Set @StartPos = CharIndex('<span class="heading-2">', @HtmlContent)
If @StartPos = 0
Return @HtmlContent
Set @HtmlContent = Replace(@HtmlContent, '<span class="heading-2">', '<h2>')
-- find next </span>
Set @StartPos = CharIndex('</span>', @HtmlContent, @StartPos)
Set @HtmlContent = Stuff(@HtmlContent, @StartPos, 7, '</h2>')
Return @HtmlContent
End
If you are positive that all of the HTML is (and will continue to be) valid XHTML and you're using SQL Server 2005 or later, you might be able to cast the columns to an XML data type and use XQuery. See http://msdn.microsoft.com/en-us/library/ms345117%28SQL.90%29.aspx
(Caveat: I haven't tried this.)
I think the best answer, though, is Michael Petito's comment. I would write an application to do this and use the Html Agility Pack. That will provide a permanent, maintainable solution that will work in nearly all cases.
(If this is a one-shot and you don't care about accuracy, then pick your poison.)
I am not real strong in SQL Server but here is how I would try to do this:
UPDATE TableName SET FieldName = REPLACE(FieldName ,'<span class=heading-2>', '<h2>') SET FieldName = REPLACE(FieldName, '</span>', '</h2>')
There might need to be 2 UPDATE statements issued, I am not sure if you can operate on the same field this way. The OP did say ALL occurences of the text. Set me straight if I am missing something.
Of course if there are other <span class=heading-2>
or </span>
text that you don't want to change, this won't work.
Gah, use jquery! Don't make life difficult.. there's some sample replace code on the jquery homepage and you can just include the jquery-1.4.2.js in the <head>
section
精彩评论