RegEx for HTML replace

Hi, I am trying to find a RegEx that helps me replace words in HTML. The problem occurs when the word I am trying to replace also appears inside an HTML tag.

Example: <img class="TEST">asd TEST asd dsa asd </img>

and I need to match only the second "TEST".

The RegEx I am looking for should look like >[^<]*TEST, but this regex also captures the characters before the word TEST. Is it possible to select only the word TEST? Please keep other combinations in mind as well (I don't think " TEST " is a good solution, since the text could contain other characters around the word).


First of all, regex is not a good option for HTML parsing. There are plenty of capable HTML parsers you can use instead.

But if you insist on using regex, here is one (it relies on variable-length lookbehind, which engines such as .NET support):

(?<=>.*)TEST(?=.*<)

For Java, where the lookbehind needs a bounded length:

(?<=>.{0,100000})TEST(?=.{0,100000}<)

For more information on why * or + cannot be used inside a lookbehind in Java, see "Regex look-behind without obvious maximum length in Java".
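
For what it's worth, here is a minimal Java sketch of the bounded-lookbehind idea. The class and method names are made up for illustration. I have also assumed [^<>]{0,1000} in place of .{0,100000}, so the lookarounds cannot cross a tag boundary; that is a tightening of the pattern above, not the same regex, and the 1000-character bound is arbitrary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundReplace {
    // Assumption: the word sits within 1000 characters of the enclosing
    // '>' and '<'. The bound is what makes the lookbehind legal in Java.
    private static final Pattern WORD_IN_TEXT =
            Pattern.compile("(?<=>[^<>]{0,1000})TEST(?=[^<>]{0,1000}<)");

    // Replaces TEST only where it appears between a '>' and the next '<'.
    static String replaceInText(String html, String replacement) {
        return WORD_IN_TEXT.matcher(html)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<img class=\"TEST\">asd TEST asd dsa asd </img>";
        System.out.println(replaceInText(html, "X"));
    }
}
```

Because both lookarounds are zero-width, only the word itself is consumed, so replaceAll leaves the surrounding text and the class attribute untouched. Text that is not enclosed by tags on both sides will not match.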


First of all, as has been said and will be said again, using regex for XML is usually a bad idea. But for really simple cases it can work, especially if you can live with sub-optimal results.

So, just put the word in a group and replace only that group.

Something like:

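// needs java.util.regex.Pattern and java.util.regex.Matcher
Pattern replacePattern = Pattern.compile(">[^<]*(TEST)");
Matcher matcher = replacePattern.matcher(theString);
// find() must succeed before start(1)/end(1) are valid; group 1 is the word itself
String result = matcher.find() ? theString.substring(0, matcher.start(1)) + replacement + theString.substring(matcher.end(1)) : theString;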

Disclaimer: Not tested, might have some off-by-ones. But the concept should be clear.
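
If more than one occurrence has to be replaced, the group idea can be extended along these lines. This is only a sketch: the class and method names are hypothetical, and the \G(?!\A) alternative (resume right after the previous match) is my own addition so that several hits inside the same text run are all found. It is not part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupReplace {
    // Replaces every occurrence of word that follows a '>' without an
    // intervening '<' or '>'. Only group 1 (the word) is swapped out.
    static String replaceInText(String html, String word, String replacement) {
        Pattern p = Pattern.compile(
                "(?:>|\\G(?!\\A))[^<>]*?(" + Pattern.quote(word) + ")");
        Matcher m = p.matcher(html);
        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0;
        while (m.find()) {
            // Copy everything up to the word, then the replacement.
            out.append(html, copiedUpTo, m.start(1)).append(replacement);
            copiedUpTo = m.end(1);
        }
        // Copy whatever is left after the last hit.
        return out.append(html, copiedUpTo, html.length()).toString();
    }

    public static void main(String[] args) {
        String html = "<img class=\"TEST\">asd TEST asd dsa asd </img>";
        System.out.println(replaceInText(html, "TEST", "X"));
    }
}
```

Building the result by hand from start(1) and end(1) keeps the prefix matched by >[^<>]*? in the output, which is the whole point of replacing only the group. Text before the first tag has no preceding '>' and is therefore skipped.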


What if "TEST" is inside some other tag, say the body tag, or for that matter the html tag?
