Issues with my regex to detect URLs in a string?
Greetings all. I am using the following regex to detect URLs in a string and wrap them inside an <a> tag:
public static String detectUrls(String text) {
String newText = text
.replaceAll("(?:https?|ftps?|http?)://[\\w/%.-?&=]+",
"<a href='$0'>$0</a>").replaceAll(
"(www\\.)[\\w/%.-?&=]+", "<a href='http://$0'>$0</a>");
return newText;
}
My problem is that the following links are not detected correctly. I am not that good with regex, so please advise.
http://code.google.com/p/shindig-dnd/
http://confluence.atlassian.com/display/GADGETDEV/Gadgets+and+JIRA+Portlets
www.liferay.com/web/raymond.auge/blog/
(www.opensocial.org/)
http://www.google.com
I'm using this:
private static final String URL_REGEX =
"http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
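// Compile once and reuse (needs java.util.regex.Pattern and java.util.regex.Matcher).
private static final Pattern URL_PATTERN = Pattern.compile(URL_REGEX);
// Then, inside the method that receives text:
Matcher matcher = URL_PATTERN.matcher(text);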
text = matcher.replaceAll("<a href=\"$0\">$0</a>");
return text;
The problem is that you are using - within a character class ([]) without escaping it, so it defines the range .-? (i.e. the characters ./0123456789:;<=>?). A literal hyphen is not in that range, so URLs containing one are cut off at the hyphen. Either escape it (\\-) or put it at the end of the character class so that it doesn't complete a range.
public static String detectUrls(String text) {
String newText = text
.replaceAll("(?:https?|ftps?|http?)://[\\w/%.\\-?&=]+",
"<a href='$0'>$0</a>").replaceAll(
"(www\\.)[\\w/%.\\-?&=]+", "<a href='http://$0'>$0</a>");
return newText;
}
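To see the range effect in isolation, here is a minimal sketch. The class name RangeDemo and the sample strings are made up for illustration; the two character classes are the ones from the question and from the fix above.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // Unescaped: ".-?" is read as a range from '.' (0x2E) to '?' (0x3F).
    static final Pattern UNESCAPED = Pattern.compile("[\\w/%.-?&=]+");
    // Escaped: "\\-" is a literal hyphen.
    static final Pattern ESCAPED = Pattern.compile("[\\w/%.\\-?&=]+");

    public static void main(String[] args) {
        // '-' (0x2D) falls just outside the range, so the whole string does not match.
        System.out.println(UNESCAPED.matcher("shindig-dnd").matches()); // false
        System.out.println(ESCAPED.matcher("shindig-dnd").matches());   // true
        // The accidental range lets ':', ';', '<' and '>' through instead.
        System.out.println(UNESCAPED.matcher("a:b;c<d>").matches());    // true
    }
}
```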
As marcog said, you should escape the - (or move it to the end of the character class). To match the last two examples you gave, you also have to make the http:// prefix optional. Also, http? matches htt, which is not a valid protocol.
So the regex will be:
"(?:(?:https?|ftps?)://)?[\\w/%.?&=-]+"
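One caveat: with the scheme fully optional, the pattern above also matches ordinary words, since any run of word characters satisfies the character class. Chaining two replaceAll calls, as in the question, has a separate problem: the second pass finds the www. part inside the anchors the first pass just produced. The following is only a sketch of one way around both, assuming a single pass that requires either a scheme or a leading www.; the class name Linkifier and the added + in the character class (for the Gadgets+and+JIRA+Portlets link) are my own choices, not part of the answers above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Linkifier {
    // Group 1 is set only when an explicit scheme is present;
    // otherwise the match has to start with "www.".
    private static final Pattern URL = Pattern.compile(
            "(?:((?:https?|ftps?)://)|www\\.)[\\w/%.?&=+-]+");

    static String detectUrls(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            // Prepend http:// only for scheme-less www. links.
            String href = (m.group(1) != null) ? url : "http://" + url;
            // quoteReplacement keeps '$' and '\' in the URL from being
            // interpreted as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    "<a href='" + href + "'>" + url + "</a>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(detectUrls("(www.opensocial.org/)"));
        System.out.println(detectUrls("http://code.google.com/p/shindig-dnd/"));
    }
}
```

Because each URL is visited exactly once, the generated anchors are never re-scanned. A URL followed directly by sentence punctuation such as a full stop would still pick up that character, so this is not a complete URL detector.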