
Java: String.replace(regex, string) to remove content from XML

Let's say I have an XML document in the form of a string. I wish to remove the content between two tags within the XML string. I have tried:

String newString = oldString.replaceFirst("\\<tagName>.*?\\<//tagName>",
                                                              "Content Removed");

but it does not work. Any pointers as to what I am doing wrong?


OK, apart from the obvious answer (don't parse XML with regex), maybe we can fix this:

String newString = oldString.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>",
                                          "Content Removed");

Explanation:

(?s)             # turn single-line mode on (otherwise '.' won't match '\n')
<tagName         # no backslashes: the \\< escapes were unnecessary (and perhaps erroneous)
[^>]*            # allow optional attributes
>.*?</tagName>   # '.*?' is lazy, so the match stops at the first </tagName>
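A minimal runnable sketch of the pattern above. The sample XML and the names `ReplaceFirstDemo` and `removeTagContent` are hypothetical, added only for illustration.

```java
public class ReplaceFirstDemo {

    // The pattern from the answer: (?s) lets '.' match newlines
    static final String PATTERN = "(?s)<tagName[^>]*>.*?</tagName>";

    // replaceFirst rewrites only the first <tagName>...</tagName> element
    static String removeTagContent(String oldString) {
        return oldString.replaceFirst(PATTERN, "Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input: the element has an attribute and spans several lines
        String oldString = "<root>\n"
                + "  <tagName id=\"1\">\n"
                + "    some content\n"
                + "  </tagName>\n"
                + "</root>";
        System.out.println(removeTagContent(oldString));
    }
}
```

Because `[^>]*` accepts anything up to `>`, this pattern would also match a differently named element that merely starts with `tagName` (e.g. `<tagNameX>`), one more reason a real parser is safer.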

Are you sure you're matching the tag's case correctly? If not, you may also want to add the i (case-insensitive) flag to the pattern: (?si)
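The same flags can also be passed as `Pattern` constants instead of the inline `(?si)`. The sketch below uses a hypothetical helper, `removeFirstElement`, which also runs the tag name through `Pattern.quote` so any regex metacharacters in it are taken literally, and the replacement through `Matcher.quoteReplacement` so `$` and `\` are not treated specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveRemove {

    // Hypothetical helper: builds the answer's regex for an arbitrary tag name
    static String removeFirstElement(String xml, String tag, String replacement) {
        String t = Pattern.quote(tag); // treat the tag name as literal text
        Pattern p = Pattern.compile("<" + t + "[^>]*>.*?</" + t + ">",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE); // equivalent to (?si)
        // quoteReplacement keeps '$' and '\' in the replacement literal
        return p.matcher(xml).replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String xml = "<doc><TAGNAME>\nsecret\n</tagname></doc>";
        System.out.println(removeFirstElement(xml, "tagName", "Content Removed"));
    }
}
```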


Probably the problem lies here:

<//tagName>

Try changing it to

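<\/tagName>

A single slash is all the pattern needs. Since '/' has no special meaning in a Java regex, plain </tagName> works as well; if you keep the backslash, remember that a Java string literal needs it doubled: "<\\/tagName>".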
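A small sketch comparing the original closing-tag pattern with the fixed forms (the class name `SlashDemo` and the sample input are hypothetical). With `//` the pattern can never match, so `replaceFirst` silently returns the input unchanged.

```java
public class SlashDemo {

    static String apply(String regex, String input) {
        return input.replaceFirst(regex, "Content Removed");
    }

    public static void main(String[] args) {
        String xml = "<a><tagName>text</tagName></a>";

        // Original pattern: "<//tagName>" requires TWO slashes, so nothing matches
        System.out.println(apply("\\<tagName>.*?\\<//tagName>", xml));

        // Single slash: '/' is not a regex metacharacter, so no escape is needed
        System.out.println(apply("<tagName>.*?</tagName>", xml));

        // Escaped slash also works; the backslash is doubled in the Java literal
        System.out.println(apply("<tagName>.*?<\\/tagName>", xml));
    }
}
```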


XML is described by a grammar that regular expressions cannot fully capture (elements can nest arbitrarily deep), so regex is not the best tool for working with it.

My advice is to use a real XML parser and work with the DOM instead of matching strings.

For example, if you have:

<xml>
 <items>
  <myItem>
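     <tagToRemove>something1</tagToRemove>
  </myItem>
  <myItem>
     <tagToRemove>something2</tagToRemove>
  </myItem>
 </items>
</xml>

A greedy regex such as (?s)<tagToRemove>.*</tagToRemove> would match everything from the first opening tag to the last closing tag, so replacing the match with matchString would turn the document into:

<xml>
 <items>
  <myItem>
     matchString
  </myItem>
 </items>
</xml>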
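To see the greedy behaviour, the sketch below runs a greedy `.*` and a lazy `.*?` version of the pattern over the sample (with the tag's case made consistent and `</xml>` added). The class and method names are hypothetical. The lazy version avoids this particular problem, but it still does nothing to handle, for example, nested elements of the same name.

```java
public class GreedyVsLazy {

    // Sample from the answer, tag case made consistent and </xml> added
    static final String XML =
            "<xml>\n"
          + " <items>\n"
          + "  <myItem>\n"
          + "     <tagToRemove>something1</tagToRemove>\n"
          + "  </myItem>\n"
          + "  <myItem>\n"
          + "     <tagToRemove>something2</tagToRemove>\n"
          + "  </myItem>\n"
          + " </items>\n"
          + "</xml>";

    // '.*' runs to the LAST </tagToRemove>, swallowing the markup in between
    static String greedy(String xml) {
        return xml.replaceAll("(?s)<tagToRemove>.*</tagToRemove>", "matchString");
    }

    // '.*?' stops at the NEAREST </tagToRemove>, so each element is replaced separately
    static String lazy(String xml) {
        return xml.replaceAll("(?s)<tagToRemove>.*?</tagToRemove>", "matchString");
    }

    public static void main(String[] args) {
        System.out.println(greedy(XML));
        System.out.println();
        System.out.println(lazy(XML));
    }
}
```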

Also, perfectly legal forms (which a DTD may allow), such as an empty element <tagToRemove/> or one with attributes, <tagToRemove attr="value">, make catching tags with a regex even harder.

Unless you are certain that none of the above can occur (now or in the future), I would go with a parser.
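As a sketch of the parser route, the following uses the JDK's built-in DOM API (`javax.xml.parsers` and `javax.xml.transform`) to replace the content of every `tagToRemove` element. The `DomRemove` class and the `replaceContent` helper are hypothetical names; the sample input shows that the attribute and self-closing forms need no special handling.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomRemove {

    // Hypothetical helper: replaces the content of every element named `tag`
    static String replaceContent(String xml, String tag, String replacement) {
        try {
            // Parse the string into a DOM tree
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Finds <tag>, <tag attr="...">, and <tag/> alike
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                // setTextContent removes all children and inserts a single text node
                nodes.item(i).setTextContent(replacement);
            }

            // Serialize the modified tree back to a string
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><tagToRemove attr=\"value\">a<b>nested</b></tagToRemove>"
                + "<tagToRemove/><other>keep</other></root>";
        System.out.println(replaceContent(xml, "tagToRemove", "Content Removed"));
    }
}
```

Note that `setTextContent` replaces child elements as well as text, and that a formerly self-closing element that receives content is written out with an explicit end tag.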
