Regular expression for XML
I'm attempting to build a regular expression that will match the contents of an XML element containing some unencoded data. E.g.:
<myElement><![CDATA[<p>The <a href="http://blah"> draft </p>]]></myElement>
Usually in this circumstance I'd use
[^<]*
to match everything up to the less-than sign, but this isn't working in this case. I've also tried this unsuccessfully:
[^(</myElement>)]*
I'm using Groovy, i.e. Java.
Please don't do this, but you're probably looking for:
<myElement>(.*?)</myElement>
This won't work if <myElement> (or the closing tag) can appear in the CDATA. It won't work if the XML is malformed. It also won't work with nested <myElement>s. And the list goes on...
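For reference, the non-greedy pattern above could be applied roughly like this. This is only a sketch in plain Java: the class name ElementRegex and the method extract are made up for illustration, and the DOTALL flag is an added assumption so that the match also works when the element spans several lines. All the caveats listed above still apply.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class ElementRegex {
    // DOTALL lets '.' match line terminators too (an assumption about the input).
    private static final Pattern MY_ELEMENT =
            Pattern.compile("<myElement>(.*?)</myElement>", Pattern.DOTALL);

    /** Returns the raw text between the tags, or null if nothing matches. */
    static String extract(String xml) {
        Matcher m = MY_ELEMENT.matcher(xml);
        // group(1) is the capture group, i.e. everything between the two tags.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<myElement><![CDATA[<p>The <a href=\"http://blah\"> draft </p>]]></myElement>";
        // Prints the CDATA section, markers included: <![CDATA[...]]>
        System.out.println(extract(xml));
    }
}
```

Note that the result still includes the <![CDATA[ and ]]> markers, because the regex knows nothing about XML syntax.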
The proper solution is to use a real XML parser.
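A minimal sketch of the parser route, using the DOM parser that ships with the JDK. The class name CdataExtractor and the method rootText are hypothetical, and the sketch assumes that <myElement> is the root of the document being parsed, as in the question's example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class CdataExtractor {
    /** Parses the XML and returns the text content of the root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Merge CDATA sections into ordinary text nodes.
            factory.setCoalescing(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Assumes the element of interest is the document root.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Malformed XML ends up here instead of yielding a silently wrong match.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<myElement><![CDATA[<p>The <a href=\"http://blah\"> draft </p>]]></myElement>";
        // Prints the CDATA contents without the CDATA markers.
        System.out.println(rootText(xml));
    }
}
```

Unlike the regex, this returns the CDATA contents without the markers and rejects malformed input. In Groovy, XmlSlurper's parseText(...) followed by text() should give a similar result in fewer lines.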
Your [^(</myElement>)]* regex was saying: match any number of characters that are not in the set "(", "<", "/", "m", etc., which is clearly not what you intended. You cannot place a group within a character class in order for it to be treated atomically; the characters will always be treated as a set (with "(" and ")" being literal characters, too).
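To see the set behaviour in action, here is a small Java sketch (the class name CharClassDemo and the method leadingMatch are invented for the demonstration). It applies the pattern from the question to the start of a string and returns whatever it matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class CharClassDemo {
    // The pattern from the question: a negated set of single characters.
    private static final Pattern NOT_IN_SET = Pattern.compile("[^(</myElement>)]*");

    /** Returns the leading run of characters that are not in the set. */
    static String leadingMatch(String input) {
        Matcher m = NOT_IN_SET.matcher(input);
        // lookingAt() anchors the match at the start of the input.
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Stops at the first 'e', because 'e' is one of the excluded characters.
        System.out.println(leadingMatch("The draft")); // prints "Th"
    }
}
```

The match ends at the first "e" even though no closing tag is anywhere in sight, which is why the pattern appeared not to work.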
If you are doing it on a line-by-line basis, this will match the inside of your example:
>(.*)</
returns: <![CDATA[<p>The <a href="http://blah"> draft </p>]]>
Probably use it something like this:
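import java.util.regex.Matcher

def subjectString = '<myElement><![CDATA[<p>The <a href="http://blah"> draft </p>]]></myElement>'
// Greedy match from the first ">" to the last "</" on the line
Matcher regexMatcher = subjectString =~ ">(.*)</"
if (regexMatcher.find()) {
    // group(1) is the captured contents; group() would also include the ">" and "</" delimiters
    String resultString = regexMatcher.group(1)
}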