Java: replacing all URLs that aren't already in anchor tags with anchor tags
I'm trying to replace all non-anchor-tag-enclosed URLs with anchor-tag-enclosed URLs for a document. So given the string:
I have two urls for google: <a href="http://www.google.com/">google</a> and http://www.google.com/
I would like to replace it with this:
I have two urls for google: <a href="http://www.google.com/">google</a> and <a href="http://www.google.com/">http://www.google.com/</a>
Does anyone know a clean way to do this in Java?
This might get you started (it works for the given example):
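public class Test {
    public static void main(String[] args) {
        final String input = "I have two urls for google: <a href=\"http://www.google.com/\">google</a> and http://www.google.com/";
        // Wrap every http:// URL that is not immediately preceded by <a href=" in an anchor tag.
        // $0 is the whole match; it is used as both the href value and the link text.
        System.out.println(input.replaceAll("(?<!<a href=\")http://[^ ]*",
                "<a href=\"$0\">$0</a>"));
    }
}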
There are some problems with it:
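- It doesn't tolerate any variation inside the "a" tag: anything other than a single space between the opening "a" and "href" (extra whitespace, another attribute first) defeats the lookbehind
- It assumes a URL is "http://" followed by zero or more characters other than a space (" "), so "https://" URLs are missed and trailing punctuation is swallowed into the link
This works for simple examples, but I'm not sure how you'd write a complete solution.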
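One way to get somewhat closer, offered as a sketch rather than a complete solution: match either a whole existing anchor element or a bare URL in a single pattern, copy the anchor through unchanged, and wrap only the bare URL. The class name UrlLinker, the method name linkify, and the URL character class below are my own assumptions, not anything from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlLinker {
    // Group 1: an entire existing <a ...>...</a> element, which is copied through untouched.
    // Group 2: a bare http/https URL found outside any anchor element.
    // The URL character class (no whitespace, <, > or ") is an assumption, not a full URL grammar.
    private static final Pattern ANCHOR_OR_URL = Pattern.compile(
            "(<a\\b[^>]*>.*?</a>)|(https?://[^\\s<>\"]+)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static String linkify(String text) {
        Matcher m = ANCHOR_OR_URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = (m.group(1) != null)
                    ? m.group(1) // already an anchor: keep as-is
                    : "<a href=\"" + m.group(2) + "\">" + m.group(2) + "</a>";
            // quoteReplacement stops '$' or '\' in the URL from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "I have two urls for google: "
                + "<a href=\"http://www.google.com/\">google</a> and http://www.google.com/";
        System.out.println(linkify(input));
    }
}
```

Because the existing anchor is consumed as a single match, whitespace and extra attributes inside the "a" tag no longer matter, and a URL used as link text is not wrapped a second time. It is still only a regex: URLs in other attributes (for example an img src) would be wrapped, and trailing punctuation such as a full stop ends up inside the link. For arbitrary HTML, a real HTML parser that lets you touch only text nodes is probably the safer route.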