Regular Expression to check String is valid XHTML or not [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

regular expression to check if string is valid XML

I am looking for a regular expression to check whether a string is valid XHTML.

Example:

<h2>Legal HTML Entity References</h2><table align="center" border="0" ><tr></tr></table>


This sounds like a bad idea: The language of valid XHTML strings is not regular.

Use an HTML parsing library instead. A few examples:

  • JTidy
  • TagSoup
  • HTMLParser

Related question:

  • When should I not use regular expressions?


Regex is exactly the wrong tool to use.

HTML is not a regular language and hence cannot be parsed by regular expressions.

See Jeff's post on the subject here: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Since you've tagged this post Java, you should look at using one of the myriad of HTML parsing libraries available.


See this question for why parsing HTML with regular expressions won't work reliably: RegEx match open tags except XHTML self-contained tags

XHTML is HTML reformulated as XML, so the same reasoning applies. You're better off using a real parser or validator, such as JTidy.


Try to check it with a parser. Don't do it the Cthulhu Way.

Here you can find a starting point and some examples of how to do it: The Java XML Validation API
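
As a rough illustration of the parser-based approach, here is a minimal sketch that uses the JDK's built-in XML parser to check whether a fragment is well-formed. The class name, method name, and the `<root>` wrapper element are my own assumptions for the example. The wrapper is needed because the sample string in the question has two top-level elements. Note that this only checks well-formedness (balanced tags, quoted attributes); checking validity against the XHTML DTD or a schema would be an additional step, and HTML entities such as `&nbsp;` will be rejected here because no DTD is loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of any library.
public class XhtmlWellFormedCheck {

    // Returns true if the fragment parses as well-formed XML once wrapped
    // in a single (assumed) <root> element. This is NOT full XHTML validation.
    public static boolean isWellFormedFragment(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Rethrow parse errors instead of letting the default handler
            // print them to stderr; warnings are ignored.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });

            String wrapped = "<root>" + fragment + "</root>";
            builder.parse(new InputSource(new StringReader(wrapped)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Environment problem rather than a problem with the input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "<h2>Legal HTML Entity References</h2>"
                + "<table align=\"center\" border=\"0\" ><tr></tr></table>";
        System.out.println(isWellFormedFragment(sample));
        System.out.println(isWellFormedFragment("<p><b>unclosed</p>"));
    }
}
```

The nested, mismatched tags in the second call are exactly the kind of input a regular expression cannot reliably reject, while a parser handles it for free.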
