Simple Java regular expression replace question

I have a simple xml file and I want to remove everything before the first <item> tag.

<sometag>
  <something>
   .....
  </something>
  <item>item1
  </item>
  ....
</sometag>

The following java code is not working:

String cleanxml = rawxml.replace("^[\\s\\S]+<item>", "");

What is the correct way to do this? And how do I address the non-greedy issue? Sorry I'm a C# programmer.


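Well, if you want to use regex, then you can use replaceAll. This solution uses a reluctant quantifier and a reference to a capturing group. The (?s) flag turns on DOTALL mode so that . also matches line terminators; without it the match cannot span the lines that precede <item>:

String cleanxml = rawxml.replaceAll("(?s).*?(<item>.*)", "$1");

Alternatively, you can use replaceFirst. This solution uses a positive lookahead:

String cleanxml = rawxml.replaceFirst("(?s).*?(?=<item>)", "");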
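The lookahead approach can also be written with a precompiled Pattern and the Pattern.DOTALL flag, which makes . match line terminators so the removed prefix can span several lines. This is only a sketch: the class name DotallTrimmer and the method dropBeforeFirstItem are made up for illustration, and the \A anchor is an addition that keeps the match pinned to the start of the input.

```java
import java.util.regex.Pattern;

public class DotallTrimmer {
    // DOTALL lets '.' match '\n'; \A anchors the match at the start of the input.
    // The lazy .*? stops right before the first "<item>" (checked by the lookahead).
    private static final Pattern PREFIX =
            Pattern.compile("\\A.*?(?=<item>)", Pattern.DOTALL);

    // Hypothetical helper: removes everything before the first "<item>".
    // If there is no "<item>", the pattern does not match and the input is returned unchanged.
    public static String dropBeforeFirstItem(String rawxml) {
        return PREFIX.matcher(rawxml).replaceFirst("");
    }

    public static void main(String[] args) {
        String rawxml = "<sometag>\n  <something>\n  </something>\n  <item>item1\n  </item>\n</sometag>";
        System.out.println(dropBeforeFirstItem(rawxml));
    }
}
```

Compiling the pattern once is mostly useful when the same cleanup runs on many strings.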

It makes more sense to just use indexOf and substring, though.

String cleanxml = rawxml.substring(rawxml.indexOf("<item>"));
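One caveat with the one-liner: indexOf returns -1 when "<item>" is absent, and substring(-1) then throws StringIndexOutOfBoundsException. A minimal guarded sketch follows; the class and method names are hypothetical, and returning the input unchanged when nothing is found is an assumption about the desired behavior.

```java
public class ItemTrimmer {
    // Hypothetical helper: returns the input starting at the first "<item>",
    // or the input unchanged when "<item>" does not occur at all.
    public static String dropBeforeFirstItem(String rawxml) {
        int start = rawxml.indexOf("<item>");
        return start < 0 ? rawxml : rawxml.substring(start);
    }

    public static void main(String[] args) {
        String rawxml = "<sometag>\n  <something>\n  </something>\n  <item>item1\n  </item>\n</sometag>";
        System.out.println(dropBeforeFirstItem(rawxml));
    }
}
```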

The reason replace doesn't work is that neither of its overloads, replace(char, char) and replace(CharSequence, CharSequence), is regex-based. It does plain literal character (sequence) replacement.
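The difference is easy to see on a tiny input. In this sketch (the class and method names are made up for illustration), the same argument "a+" is treated as two literal characters by replace and as a regular expression by replaceAll.

```java
public class ReplaceDemo {
    // replace: "a+" is the literal two-character sequence 'a' followed by '+'.
    public static String literal(String s) {
        return s.replace("a+", "X");
    }

    // replaceAll: "a+" is a regex meaning "one or more 'a' characters".
    public static String regex(String s) {
        return s.replaceAll("a+", "X");
    }

    public static void main(String[] args) {
        String input = "aaa a+";
        System.out.println(literal(input)); // prints "aaa X"
        System.out.println(regex(input));   // prints "X X+"
    }
}
```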


Also, as others are warning you, unless you're only processing very simple XML, you shouldn't use regex. You should use an actual XML parser instead.


... What is the correct way to do this? ...

Since you asked about the correct way: the correct way to do this is to parse the XML, remove the nodes, and re-serialize to a String. You should never use regular expressions for manipulating XML or any other structured document that has parsers available (JSON, YAML, etc.).
For small XML documents I would suggest JDOM.
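As a rough sketch of the parse, remove, and re-serialize approach, here is a version that uses the DOM API bundled with the JDK rather than JDOM, so it runs without extra dependencies. The class and method names are hypothetical. Note one difference from the string-based answers: the enclosing root element is kept so the result stays well-formed, and only the nodes that precede the first item under the same parent are removed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomItemTrimmer {
    // Hypothetical helper: removes every sibling node (elements and text)
    // that comes before the first <item> element, then serializes the document.
    public static String dropSiblingsBeforeFirstItem(String rawxml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rawxml)));

            Node first = doc.getElementsByTagName("item").item(0);
            if (first != null) {
                Node parent = first.getParentNode();
                while (first.getPreviousSibling() != null) {
                    parent.removeChild(first.getPreviousSibling());
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions to keep the sketch short.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String rawxml = "<sometag><something>x</something><item>item1</item></sometag>";
        System.out.println(dropSiblingsBeforeFirstItem(rawxml));
    }
}
```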


Use replaceAll or replaceFirst; plain replace only looks for literal string matches. HTH
