Regular expression for XML

2023-02-23 14:09 Q&A author:

I'm attempting to build a regular expression that will match the contents of an XML element containing some unencoded data. For example:

<myElement><![CDATA[<p>The <a href="http://blah"> draft </p>]]></myElement>

Usually in this circumstance I'd use

[^<]*

to match everything up to the less-than sign, but this isn't working in this case. I've also tried this, unsuccessfully:

[^(</myElement>)]*

I'm using Groovy, i.e. Java.

Please don't do this, but you're probably looking for:

<myElement>(.*?)</myElement>

This won't work if <myElement> (or the closing tag) can appear in the CDATA. It won't work if the XML is malformed. It also won't work with nested <myElement>s. And the list goes on...
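For reference, here is a minimal Java sketch of that pattern in use, which should carry over to Groovy largely unchanged. The class name ElementRegexSketch and the helper innerText are made up for illustration, and Pattern.DOTALL is an added assumption so that content spanning several lines still matches. The second call shows the nesting problem described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElementRegexSketch {
    // DOTALL lets '.' match line breaks, in case the CDATA spans several lines.
    private static final Pattern ELEMENT =
            Pattern.compile("<myElement>(.*?)</myElement>", Pattern.DOTALL);

    /** Returns the text between the tags, or null when the pattern does not match. */
    static String innerText(String xml) {
        Matcher m = ELEMENT.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<myElement><![CDATA[<p>The <a href=\"http://blah\"> draft </p>]]></myElement>";
        // Prints the raw CDATA section, markers included.
        System.out.println(innerText(xml));

        // Nested elements: the lazy match stops at the FIRST closing tag,
        // so the result is the truncated string "a<myElement>b".
        System.out.println(innerText("<myElement>a<myElement>b</myElement>c</myElement>"));
    }
}
```

Note that the captured text still includes the <![CDATA[ and ]]> markers; the regex knows nothing about XML syntax.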

The proper solution is to use a real XML parser.
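As a rough sketch of what that could look like, the following uses the DOM parser that ships with the JDK. The class name ElementParserSketch and the helper elementText are illustrative names only, and the code assumes the whole document fits in a string with myElement as its root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ElementParserSketch {
    /** Parses the XML and returns the text content of the root element. */
    static String elementText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates text and CDATA children,
            // so the CDATA markers themselves are not part of the result.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Malformed XML ends up here instead of silently mis-matching.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<myElement><![CDATA[<p>The <a href=\"http://blah\"> draft </p>]]></myElement>";
        System.out.println(elementText(xml));
    }
}
```

Unlike the regex, this returns only the data inside the CDATA section and fails loudly on malformed input.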

Your [^(</myElement>)]* regex says: match any number of characters that are not in the set (, <, /, m, and so on, which is clearly not what you intended. You cannot place a group inside a character class and have it treated atomically; the characters are always treated as a set (with ( and ) being literal characters, too).
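The difference is easy to see on a small input. The sketch below compares the character-class attempt with a negative lookahead, which is one common way (not the only one) to express "any character, as long as the closing tag does not start here". The class name CharClassSketch, the helper prefix, and the sample input are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassSketch {
    // The original attempt: a negated SET of single characters.
    static final Pattern AS_SET = Pattern.compile("[^(</myElement>)]*");

    // A negative lookahead checked before each character: consume anything
    // until the literal closing tag begins.
    static final Pattern UP_TO_TAG =
            Pattern.compile("(?:(?!</myElement>).)*", Pattern.DOTALL);

    /** Returns what the pattern matches at the very start of the input. */
    static String prefix(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        String input = "a draft </p>]]></myElement>";
        // Stops at the 't' of "draft", because 't' occurs in "myElement".
        System.out.println("[" + prefix(AS_SET, input) + "]");
        // Runs up to, but not including, the closing tag.
        System.out.println("[" + prefix(UP_TO_TAG, input) + "]");
    }
}
```

The lookahead version still shares the limitations of any regex-based approach to XML, such as nested or malformed elements.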

If you are doing it on a line-by-line basis, this will match the inside of your example:

>(.*)</

The first capture group returns: <![CDATA[<p>The <a href="http://blah"> draft </p>]]>

You would probably use it something like this:

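import java.util.regex.Matcher

String subjectString = '<myElement><![CDATA[<p>The <a href="http://blah"> draft </p>]]></myElement>'
Matcher regexMatcher = subjectString =~ ">(.*)</"
if (regexMatcher.find()) {
    // group(1) is the captured text between the first '>' and the last '</';
    // group() with no argument would also include the '>' and '</' themselves.
    String resultString = regexMatcher.group(1)
}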
