Java: String.replace(regex, string) to remove content from XML
Let's say I have an XML document in the form of a string. I wish to remove the content between two tags within the XML string, say <tagName></tagName>. I have tried:
String newString = oldString.replaceFirst("\\<tagName>.*?\\<//tagName>",
"Content Removed");
but it does not work. Any pointers as to what I am doing wrong?
OK, apart from the obvious answer (don't parse XML with regex), maybe we can fix this:
String newString = oldString.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>",
"Content Removed");
Explanation:
(?s) # turn single-line mode on (otherwise '.' won't match '\n')
<tagName # remove unnecessary (and perhaps erroneous) escapes
[^>]* # allow optional attributes
>.*?</tagName> # lazy '.*?' stops at the first closing tag
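A minimal sketch of the corrected pattern in use, assuming the placeholder tag name tagName and a made-up multi-line input. Like the answer's code, removeContent replaces the whole element, tags included. The helper names removeContent and replaceInner are hypothetical; replaceInner is a variant (an assumption about what the asker wants) that keeps the tags via capture groups and swaps only the content between them.

```java
import java.util.regex.Matcher;

// Sketch of the corrected pattern applied with String.replaceFirst.
// "tagName" and the sample input are placeholders.
class RemoveTagContent {

    // (?s)   DOTALL: '.' also matches line breaks inside the element
    // [^>]*  tolerates attributes on the opening tag
    // .*?    lazy: stops at the first closing tag
    static final String ELEMENT = "(?s)<tagName[^>]*>.*?</tagName>";

    // Replaces the whole first <tagName> element, tags included, as in the answer.
    static String removeContent(String xml, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement literal
        return xml.replaceFirst(ELEMENT, Matcher.quoteReplacement(replacement));
    }

    // Hypothetical variant: keep the tags (groups $1 and $2) and replace
    // only what lies between them.
    static String replaceInner(String xml, String replacement) {
        return xml.replaceFirst("(?s)(<tagName[^>]*>).*?(</tagName>)",
                "$1" + Matcher.quoteReplacement(replacement) + "$2");
    }

    public static void main(String[] args) {
        String oldString = "<root>\n<tagName id=\"1\">line one\nline two</tagName>\n</root>";
        // Prints <root>, Content Removed, </root> on three lines
        System.out.println(removeContent(oldString, "Content Removed"));
        // Prints the element with its tags kept and the content replaced
        System.out.println(replaceInner(oldString, "Content Removed"));
    }
}
```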
Are you sure you're matching the tag's case correctly? Perhaps you also want to add the i (case-insensitive) flag to the pattern: (?si)
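The inline (?si) prefix can also be expressed with Pattern flag constants: Pattern.DOTALL corresponds to s and Pattern.CASE_INSENSITIVE to i. This is a small sketch with a placeholder tag name; keep in mind that XML names are case-sensitive, so a case-insensitive match can also hit elements that XML treats as distinct.

```java
import java.util.regex.Pattern;

// Sketch: the answer's pattern compiled with flag constants instead of (?si).
// "tagName" and the inputs are placeholders.
class CaseInsensitiveRemove {

    private static final Pattern TAG = Pattern.compile(
            "<tagName[^>]*>.*?</tagName>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static String removeContent(String xml) {
        return TAG.matcher(xml).replaceFirst("Content Removed");
    }

    public static void main(String[] args) {
        // The element is written in upper case and spans two lines.
        System.out.println(removeContent("<root><TAGNAME>a\nb</TAGNAME></root>"));
        // prints <root>Content Removed</root>
    }
}
```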
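Probably the problem lies here:
<//tagName>
The doubled slash only matches a literal "//", so the pattern never finds </tagName>. Try changing it to
<\/tagName>
or simply </tagName>, since a forward slash needs no escaping in a Java regex. Inside a Java string literal, the escaped form must be written "<\\/tagName>".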
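A quick comparison on a single-line placeholder input. The question's "\\<" is harmless (Java treats a backslash before a non-alphabetic character as quoting it, so \< matches '<'); the "//" is what prevents the match. Multi-line content would additionally need the (?s) flag, which this sketch leaves out on purpose. Method names are hypothetical.

```java
// Sketch comparing the question's pattern with corrected ones.
// The sample string is a placeholder.
class SlashFix {

    static String original(String xml) {
        // "\\<" is the regex \< (a literal '<'), but "//" needs two slashes,
        // so "</tagName>" never matches and the input comes back unchanged.
        return xml.replaceFirst("\\<tagName>.*?\\<//tagName>", "Content Removed");
    }

    static String fixed(String xml) {
        // A single '/' needs no escaping in a Java regex.
        return xml.replaceFirst("<tagName>.*?</tagName>", "Content Removed");
    }

    static String escapedSlash(String xml) {
        // Escaping the slash also works; the string literal then needs "\\/".
        return xml.replaceFirst("<tagName>.*?<\\/tagName>", "Content Removed");
    }

    public static void main(String[] args) {
        String xml = "<a><tagName>secret</tagName></a>";
        System.out.println(original(xml));     // <a><tagName>secret</tagName></a>
        System.out.println(fixed(xml));        // <a>Content Removed</a>
        System.out.println(escapedSlash(xml)); // <a>Content Removed</a>
    }
}
```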
XML is defined by a grammar; regular expressions are not the best tool for processing grammars.
My advice would be to use a real parser and work with the DOM instead of matching with regex.
For example, if you have:
<xml>
  <items>
    <myItem>
      <tagToRemove>something1</tagToRemove>
    </myItem>
    <myItem>
      <tagToRemove>something2</tagToRemove>
    </myItem>
  </items>
</xml>
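A greedy regex such as (?s)<tagToRemove>.*</tagToRemove> would match from the first opening tag all the way to the last closing tag, so replacing that match with matchString would leave:
<xml>
  <items>
    <myItem>
      matchString
    </myItem>
  </items>
</xml>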
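To see the greedy behaviour described above, here is a sketch that runs a greedy and a lazy pattern over the example document, flattened onto one line. The method names are hypothetical and matchString is the placeholder replacement from the text.

```java
// Sketch: greedy ".*" versus lazy ".*?" on the example document.
class GreedyVsLazy {

    static final String XML = "<xml><items>"
            + "<myItem><tagToRemove>something1</tagToRemove></myItem>"
            + "<myItem><tagToRemove>something2</tagToRemove></myItem>"
            + "</items></xml>";

    static String greedy(String xml) {
        // ".*" grabs as much as possible, so the match runs from the first
        // opening tag to the LAST closing tag, swallowing everything between.
        return xml.replaceFirst("(?s)<tagToRemove>.*</tagToRemove>", "matchString");
    }

    static String lazy(String xml) {
        // ".*?" stops at the first closing tag; replaceAll handles each element.
        return xml.replaceAll("(?s)<tagToRemove>.*?</tagToRemove>", "matchString");
    }

    public static void main(String[] args) {
        // <xml><items><myItem>matchString</myItem></items></xml>
        System.out.println(greedy(XML));
        // <xml><items><myItem>matchString</myItem><myItem>matchString</myItem></items></xml>
        System.out.println(lazy(XML));
    }
}
```

The lazy quantifier fixes this particular case, but it still relies on the elements not being nested or written in other forms.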
Also, some forms that a DTD may allow (such as <tagToRemove/> or <tagToRemove attr="value">) make catching tags with regex more difficult.
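As a concrete illustration, this sketch applies the corrected pattern from the first answer (with tagToRemove as the tag name) to a made-up input that starts with a self-closing tag. Because [^>]* happily matches the "/", the lazy ".*?" then runs on to the next element's closing tag and silently deletes the unrelated <keep> element in between. The class and method names are hypothetical.

```java
// Sketch: a self-closing tag defeats the "[^>]*>.*?</tag>" pattern.
// The input document is a made-up example.
class SelfClosingPitfall {

    static String removeFirst(String xml) {
        // The corrected pattern from the first answer, with tagToRemove as the name.
        return xml.replaceFirst("(?s)<tagToRemove[^>]*>.*?</tagToRemove>", "");
    }

    public static void main(String[] args) {
        String xml = "<r><tagToRemove/><keep>important</keep>"
                + "<tagToRemove attr=\"value\">x</tagToRemove></r>";
        // The match starts at <tagToRemove/> and ends at the later </tagToRemove>,
        // so <keep>important</keep> is removed as well.
        System.out.println(removeFirst(xml)); // <r></r>
    }
}
```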
Unless it is very clear to you that none of the above can occur (now or in the future), I would go with a parser.
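A minimal sketch of the parser approach, using the JDK's built-in DOM (javax.xml.parsers, org.w3c.dom) and an identity Transformer to write the tree back to a string. The helper name replaceContent and the sample document are illustrative assumptions. The parser treats <tagToRemove/>, attributes and ordinary elements alike, and every matching element is updated, not only the first one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: replace the content of every element with a given name via the DOM.
// replaceContent is a hypothetical helper, not a library API.
class DomReplaceContent {

    static String replaceContent(String xml, String tagName, String replacement) {
        try {
            // Parse the string into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // All elements with that name, however they are written in the source.
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                // setTextContent drops the element's children and inserts one text node.
                nodes.item(i).setTextContent(replacement);
            }

            // Serialize the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            // Wrapped in an unchecked exception to keep the sketch short.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><items>"
                + "<myItem><tagToRemove>something1</tagToRemove></myItem>"
                + "<myItem><tagToRemove attr=\"value\"/></myItem>"
                + "</items></root>";
        System.out.println(replaceContent(xml, "tagToRemove", "Content Removed"));
    }
}
```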