
How can I do basic html validation within a column in a SQL Server database?

I have a database table with HTML snippets (not whole documents) in a column, and I need to do some basic HTML validation of the contents. My initial need is just to be able to run a one-time query + validation report, nothing more complicated than that.


I would suggest using Regex -

http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

Example -

select dbo.RegexMatch( N'123-45-6789', N'^\d{3}-\d{2}-\d{4}$' )
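To show roughly what a regex check can and cannot catch, here is a minimal Python sketch. It uses the standard `re` module, not the CLR function from the article. The pattern and the `unterminated_tags` helper are my own hypothetical choices. The check only flags a tag that opens (`<p`, `</div`) but reaches another `<` or the end of the snippet before its `>`. A regex like this cannot verify that tags are properly nested.

```python
import re
from typing import List

# Hypothetical pattern: "<" plus a letter or "/" starts a tag; if the tag
# reaches another "<" or the end of the string before any ">", it was
# never terminated.
UNTERMINATED_TAG = re.compile(r"<[A-Za-z/][^<>]*(?=<|\Z)")


def unterminated_tags(snippet: str) -> List[str]:
    """Return the fragments of tags that never reach their closing '>'."""
    return UNTERMINATED_TAG.findall(snippet)
```

For example, `unterminated_tags('<p class="x"ok</p>')` returns `['<p class="x"ok']`. A `<` inside a quoted attribute value would be a false positive, which is one reason a real parser is usually preferred.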

Or strictly in T-SQL:

http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx

However, CLR User-Defined Functions are probably the way to go.


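SQL Server does have some XML validation capabilities built in for columns of type XML. HTML is not actually a subset of XML (XHTML is the XML-based variant), but if your snippets are written as well-formed, XHTML-style markup, you might be able to twist that functionality to make SQL Server do the work for you.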
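Inside the database, one way to apply that idea on SQL Server 2012 or later is to look for rows where `TRY_CONVERT(xml, your_column)` returns NULL. The Python sketch below performs the same well-formedness test outside the database so you can see what it rejects. It wraps each snippet in a dummy root element (the `<root>` name and the `xml_wellformed_error` helper are hypothetical), so fragments with several top-level elements or bare text still parse. The test is strict in ways real HTML is not. An unclosed `<br>` fails, and so does `&nbsp;`, because XML defines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_wellformed_error(snippet: str) -> Optional[str]:
    """Return the XML parser's error message, or None if the snippet is well-formed."""
    # Wrap in a hypothetical dummy root so multi-element fragments parse
    # as a single XML document.
    try:
        ET.fromstring("<root>" + snippet + "</root>")
    except ET.ParseError as exc:
        return str(exc)
    return None
```

Snippets that pass this test are well-formed XML. That does not make them valid HTML. It only means their tags balance and their entities are XML-legal.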


I read Jeff's post here http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html and realized that I need to use a real parser after all.
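In the spirit of using a real parser, a minimal sketch of a parser-based check with Python's standard-library `html.parser` might look like this. It is not a validator in the DTD or schema sense. It only checks that every non-void element is explicitly closed and properly nested. That rule is stricter than HTML itself, which allows end tags such as `</p>` and `</li>` to be omitted, so some legitimate snippets will be flagged. The class name, the `check_snippet` function and the void-element list are my own hypothetical choices.

```python
from html.parser import HTMLParser
from typing import List

# HTML void elements never take an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Tracks open elements and records nesting problems as it parses."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.stack: List[str] = []
        self.problems: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # "<br/>"-style self-closed tag: nothing left open

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
            return
        # Pop back to the matching open tag; anything skipped was left open.
        while True:
            top = self.stack.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> not closed before </{tag}>")


def check_snippet(snippet: str) -> List[str]:
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(snippet)
    checker.close()
    unclosed = [f"<{t}> never closed" for t in reversed(checker.stack)]
    return checker.problems + unclosed
```

For example, `check_snippet("<p>Hello <b>world</p>")` reports that `<b>` was not closed before `</p>`.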

It looks like http://tidy.sourceforge.net/ will get me what I need, I'll just have to write an ugly script that goes row by row and shells out to Tidy.
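A minimal Python sketch of that row-by-row script might look like this. It assumes the `tidy` executable is on the PATH and that the rows have already been fetched as `(id, html)` pairs. The `run_tidy` and `tidy_report` names are hypothetical. The sketch relies on Tidy's exit status (0 for clean input, 1 for warnings, 2 for errors). Fragments have no doctype or `<title>`, so Tidy will likely warn about nearly every row. For that reason the sketch reports only errors by default. The `runner` parameter lets the Tidy call be swapped out, for example in tests.

```python
import subprocess
from typing import Callable, Iterable, Iterator, Optional, Tuple


def run_tidy(snippet: str) -> Tuple[int, str]:
    """Pipe one snippet through the tidy CLI and return (exit status, messages)."""
    # -q: suppress non-essential output; -e: report errors and warnings only.
    proc = subprocess.run(
        ["tidy", "-q", "-e"],
        input=snippet,
        capture_output=True,
        encoding="utf-8",
    )
    # Tidy writes its diagnostics to stderr.
    return proc.returncode, proc.stderr


def tidy_report(
    rows: Iterable[Tuple[object, Optional[str]]],
    runner: Callable[[str], Tuple[int, str]] = run_tidy,
    min_status: int = 2,
) -> Iterator[Tuple[object, int, str]]:
    """Yield (row id, exit status, messages) for rows at or above min_status."""
    for row_id, snippet in rows:
        status, messages = runner(snippet or "")  # treat NULL columns as empty
        if status >= min_status:
            yield row_id, status, messages.strip()
```

To use it for real, feed it the rows from a `SELECT` through whatever driver you use (pyodbc rows unpack as tuples, but that setup is an assumption). Pass `min_status=1` if you also want Tidy's warnings.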
