
Issues with my regex to detect URLs in a string

Greetings all. I am using the following regex to detect URLs in a string and wrap them in an <a> tag:

public static String detectUrls(String text) {

        String newText = text
                .replaceAll("(?:https?|ftps?|http?)://[\\w/%.-?&=]+",
                        "<a href='$0'>$0</a>").replaceAll(
                        "(www\\.)[\\w/%.-?&=]+", "<a href='http://$0'>$0</a>");
        return newText;
    }

My problem is that the following links are not detected correctly. I am not that good with regex, so please advise.

http://code.google.com/p/shindig-dnd/

http://confluence.atlassian.com/display/GADGETDEV/Gadgets+and+JIRA+Portlets

www.liferay.com/web/raymond.auge/blog/

(www.opensocial.org/)

http://www.google.com


I'm using this:

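private static final String URL_REGEX = 
   "http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
// Assumed definition: compiled once from URL_REGEX (needs java.util.regex.Pattern and Matcher).
private static final Pattern URL_PATTERN = Pattern.compile(URL_REGEX);

Matcher matcher = URL_PATTERN.matcher(text);
text = matcher.replaceAll("<a href=\"$0\">$0</a>");
return text;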


The problem is that you are using - inside a character class ([]) without escaping it, so it defines the range .-? (i.e. the characters ./0123456789:;<=>?) instead of matching a literal hyphen. Either escape it as \\- or put it at the end of the character class so that it doesn't form a range.

public static String detectUrls(String text) {
    String newText = text
            .replaceAll("(?:https?|ftps?|http?)://[\\w/%.\\-?&=]+",
                    "<a href='$0'>$0</a>").replaceAll(
                    "(www\\.)[\\w/%.\\-?&=]+", "<a href='http://$0'>$0</a>");
    return newText;
}
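
To see the range effect in isolation, here is a minimal sketch that runs both versions of the character class against the first link from the question. The class name RangeDemo and the helper firstMatch are hypothetical and used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "http://code.google.com/p/shindig-dnd/";

        // Unescaped '-' between '.' and '?' is a range, so a literal hyphen
        // is not part of the class and the match stops in front of it.
        System.out.println(firstMatch("https?://[\\w/%.-?&=]+", url));

        // Escaped '-' is a literal hyphen, so the whole URL is matched.
        System.out.println(firstMatch("https?://[\\w/%.\\-?&=]+", url));
    }
}
```

The first call stops at http://code.google.com/p/shindig, while the second returns the full URL.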


As marcog said, you should escape the -. To match the last two examples you gave, you also have to make the scheme (http://) optional. Also, http? matches htt, which is not a valid protocol.

So the regex will be:

"(?:(?:https?|ftps?)://)?[\\w/%.?&=-]+"
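
One caveat: with the prefix fully optional, the character class on its own also matches ordinary words. The following is a minimal sketch of one possible variation, not the method from the answers above. It assumes every link starts with either a scheme or www. The class name UrlLinker and the extra + in the character class (for links such as the Confluence one) are my assumptions. It works in a single pass, so the www. part of an already wrapped http://www... link is not wrapped a second time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlLinker {
    // Assumption: a link starts with a scheme or with "www.".
    // '-' is placed last in the class so it is a literal hyphen, not a range.
    private static final Pattern URL = Pattern.compile(
            "(?:(?:https?|ftps?)://|www\\.)[\\w/%.?&=+-]+");

    static String detectUrls(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            // Bare "www." links get a scheme so the href is absolute.
            String href = url.startsWith("www.") ? "http://" + url : url;
            String link = "<a href='" + href + "'>" + url + "</a>";
            // quoteReplacement keeps '$' and '\' in the link from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(link));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(detectUrls(
                "see (www.opensocial.org/) and http://www.google.com"));
    }
}
```

This sketch still treats trailing punctuation such as a final full stop as part of the link, so real input may need extra trimming.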