REGEX: adding links in an HTML text
I have a puzzle that requires your help: I need to replace certain words with links in an HTML text.
For example, I have to replace "word" with "<a href="...">word</a>".
The difficulty is twofold:
- 1. not to add links inside tag attributes
- 2. not to add links inside other links (nested links).
I found a solution for case (1), but I cannot handle case (2).
Here is my simplified code:
String text="sample text <a>sample text</a> sample <a href='http://www.sample.com'>a good sample</a>";
String wordToReplace="sample";
String pattern="\\b"+wordToReplace+"\\b(?![^<>]*+>)"; //the negative lookahead at the end is there to solve problem (1)
String link="["+wordToReplace+"]"; //for more clarity, the generated link is replaced by [...]
System.out.println(text.replaceAll(pattern,link));
The result is:
[sample] text <a>[sample] text</a> [sample] <a href='http://www.sample.com'>a good [sample]</a>
Problem: there is a link inside another link.
Do you have any idea how to solve this?
Thank you in advance.
Parsing HTML with regex is generally a bad idea, precisely because of odd cases such as this one. It would be better to use an HTML parser. Java ships with one as part of Swing (the javax.swing.text.html package) that you might want to look into.
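If a parser is too heavy for this one case, there is a regex workaround (this is not the parser approach recommended above): let the pattern consume whole links and whole tags first, and replace the word only when it is matched on its own. Below is a minimal sketch of that idea. The class name LinkInserter and the helper addLinks are made up for illustration, and it assumes reasonably well-formed HTML with no nested <a> elements and no comments or script blocks containing the word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkInserter {

    // Hypothetical helper: replaces each whole-word occurrence of `word` with `link`,
    // leaving existing <a>...</a> elements and all other tags untouched.
    static String addLinks(String html, String word, String link) {
        Pattern p = Pattern.compile(
              "(?is:<a\\b[^>]*>.*?</a>)"                 // an entire existing link, copied as-is
            + "|<[^>]*>"                                 // any other tag, attributes included
            + "|\\b(" + Pattern.quote(word) + ")\\b");   // the word in plain text (group 1)
        Matcher m = p.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Group 1 is set only when the third alternative matched,
            // i.e. the word was found outside links and tags.
            String replacement = (m.group(1) != null) ? link : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "sample text <a>sample text</a> sample "
                    + "<a href='http://www.sample.com'>a good sample</a>";
        System.out.println(addLinks(text, "sample", "[sample]"));
    }
}
```

With the text from the question, this prints:
[sample] text <a>sample text</a> [sample] <a href='http://www.sample.com'>a good sample</a>

Because tags are consumed by the second alternative, the lookahead used for problem (1) is no longer needed here. It is still a regex, though, so unusual markup (a ">" inside an attribute value, for instance) can break it.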