Regular expression to check whether a string is valid XHTML [duplicate]
Possible Duplicate:
regular expression to check if string is valid XML
I am looking for a regular expression to check whether a string is valid XHTML.
Example:
<h2>Legal HTML Entity References</h2><table align="center" border="0" ><tr></tr></table>
This sounds like a bad idea: The language of valid XHTML strings is not regular.
Use an HTML parsing library instead. A few examples:
- JTidy
- TagSoup
- HTMLParser
Related question:
- When should I not use regular expressions?
Regex is exactly the wrong tool to use.
HTML is not a regular language and hence cannot be parsed by regular expressions.
See Jeff's post on the subject here: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
Since you've tagged this post Java, you should look at using one of the many HTML parsing libraries available.
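To make the point above concrete, here is a small sketch of what goes wrong with a regex. The pattern and the names `NaiveTagRegex` and `looksBalanced` are made up for illustration and are not from the question. The pattern only compares the first opening tag with the last closing tag, so it cannot track nesting in between:

```java
import java.util.regex.Pattern;

class NaiveTagRegex {
    // Hypothetical naive pattern: an opening tag, anything at all,
    // then a closing tag with the same name as the opening one.
    static final Pattern NAIVE =
            Pattern.compile("<(\\w+)[^>]*>.*</\\1>", Pattern.DOTALL);

    // Returns true when the whole input matches the naive pattern.
    static boolean looksBalanced(String input) {
        return NAIVE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Accepted, although <i> is never closed.
        System.out.println(looksBalanced("<b><i>text</b>"));
        // Rejected, although a self-closing element is fine in XHTML.
        System.out.println(looksBalanced("<br/>"));
    }
}
```

Patching the pattern for these two cases just moves the problem to the next level of nesting, which is why a parser is the better fit.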
See this question for why parsing HTML with regular expressions won't work reliably: RegEx match open tags except XHTML self-contained tags
XHTML is just another flavor of HTML (HTML reformulated as XML), so you're better off using a real validator such as JTidy.
Try to check it with a parser. Don't do it the Cthulhu Way.
Here you can find a starting point and some examples of how to do it: The Java XML Validation API
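As a rough sketch of the parser approach, the snippet below uses the JDK's built-in `javax.xml.parsers` API to test whether a string is well-formed XML. The class and method names (`WellFormedCheck`, `isWellFormed`) are hypothetical. Note the assumptions: well-formedness is only a necessary condition for valid XHTML, since full validity also means checking against the XHTML DTD or schema, which this does not do. Input that carries a DOCTYPE may also make the parser try to load the DTD, which is not handled here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the input parses as a single well-formed XML document.
    // This does NOT validate against the XHTML DTD.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String fragment = "<h2>Legal HTML Entity References</h2>"
                + "<table align=\"center\" border=\"0\" ><tr></tr></table>";
        // The fragment from the question has two top-level elements,
        // so it is not a well-formed document on its own.
        System.out.println(isWellFormed(fragment));
        // Wrapped in a single root element, it parses.
        System.out.println(isWellFormed("<div>" + fragment + "</div>"));
    }
}
```

The example string from the question only passes once it is wrapped in a single root element, because an XML document must have exactly one.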