
Regex to check whether XML is well-formed

Is there a regex for checking whether XML is well-formed?

Thanks

Edit: If not a regex, is there a good parsing method I can use in C# that doesn't throw an exception? I tried using XmlReader, but it didn't work for me.


This is well beyond the capabilities of regular expressions. In other words, the answer is that it's not possible.

EDIT: There are plenty of tools available to check well-formedness, but they all involve some sort of XML parser/validator. If you provide more information about your environment maybe we can point you in the right direction.


No.

XML syntax is irregular enough to give any regular expression nightmares.

You're not the first to ask this, and don't feel bad: questions about parsing HTML and XML with regular expressions keep coming up because regular expressions look perfect for the job. Sadly, they aren't.

XML syntax is complex enough that you can't safely parse it with a regex. It looks simple and regular but there's plenty of scope for causing problems. One nasty CDATA section and things get very hard. And consider the RSS feeds where you get HTML embedded in the XML.

So please use an XML parsing library for this. There are plenty of them.
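To make the CDATA point concrete, here is a small sketch that compares a deliberately naive regex-based check with a real parser. The class name, method names, and sample string are invented for illustration; the regex is just one plausible attempt, not something from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class CdataDemo
{
    // Well-formed XML: the text "</root><oops>" sits inside a CDATA section,
    // so it is character data, not markup.
    public const string Sample = "<root><![CDATA[</root><oops>]]></root>";

    // Hypothetical naive checker: find tags with a regex and pair them with a stack.
    public static bool NaiveRegexCheck(string xml)
    {
        var open = new Stack<string>();
        foreach (Match m in Regex.Matches(xml, @"<(/?)(\w+)[^>]*>"))
        {
            if (m.Value.EndsWith("/>", StringComparison.Ordinal))
                continue;                              // self-closing tag, nothing to pair
            if (m.Groups[1].Value.Length == 0)
                open.Push(m.Groups[2].Value);          // opening tag
            else if (open.Count == 0 || open.Pop() != m.Groups[2].Value)
                return false;                          // closing tag does not match
        }
        return open.Count == 0;
    }

    // Parser-based check: read the whole document and report whether it parsed.
    public static bool ParserCheck(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For `CdataDemo.Sample`, `NaiveRegexCheck` returns false because it treats the tags inside the CDATA section as real markup, while `ParserCheck` returns true. Patching the regex for CDATA still leaves comments, processing instructions, attribute values containing `>`, and so on.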

If you want more detail, have a look at this question, which gives some examples of the horror syntax you can meet, and this question, which shows what happens if you do try to parse these things with regular expressions.


If not regex, then is there a good parsing method that I can use in C# that doesn't throw an exception? I tried using XmlReader, but it didn't work for me.

Using XmlReader and while(reader.Read()) {} (catching any exception) is probably the fastest pure managed approach.
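A minimal sketch of that approach, assuming the XML arrives as a string. The names `XmlCheck` and `IsWellFormed` are made up for illustration, and the sketch catches only `XmlException` rather than every exception, so unrelated failures still surface.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Returns true if the text parses as well-formed XML, false otherwise.
    // The parse exception is caught here, so callers only ever see a bool.
    public static bool IsWellFormed(string xml)
    {
        if (xml == null)
            return false; // assumption: treat a missing document as not well-formed

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                // Read() throws XmlException at the first well-formedness error.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, `XmlCheck.IsWellFormed("<a><b/></a>")` returns true, while `XmlCheck.IsWellFormed("<a><b></a>")` returns false. Note that the default `XmlReaderSettings` prohibit DTDs, so a document with a DOCTYPE declaration is also reported as false here unless you pass settings with a different `DtdProcessing` value.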


There is no regex solution, because Jeff told me so.


No, there is not. (Practically speaking and for the general case, at least.) Use an XML parser if you want to determine whether or not XML is well-formed; a validating parser is only needed if you also want to check the document against a DTD or schema.


Use an XML validator instead.


No, if recursive regexps are not considered. Regexps can't check arbitrary nesting. However, some regexp engines accept recursive regexps, which you could try using for this purpose.


Recent versions of PCRE have all kinds of features that would make this achievable, but the code would be ugly as hell. libxml2 comes with xmllint; why not use the right tool for the job?


I'm making an assumption here: you think that using a library will be too slow or too heavyweight to do this quickly and efficiently.

If this is the case then test it out. Try a few libraries, see how big they are, see how fast they are.
