
How to catch URLs given by a user in text

I would like to extract the URLs that a user includes in their text (I assume that a URL must start with http://). This is my first attempt:

    Pattern pattern = Pattern.compile("http://[^ ]+");

but if the user types something like this:

"look at somepage (http://somepage.net)"
"look at http://somepage1.net, http://somepage2.net and sth else"
"Please visit our page http://somepage.net."

the matched URL ends with an incorrect(?) character. How can I avoid this?


You can match on the rule that a URL can't end with [,.)] etc. and may end only with [A-Za-z] or /, but this breaks URLs with an unusual ending such as http://site.com/read.php?key=F#$.)


The answer is that you cannot do this with 100% accuracy.

A URL like "http://somepage1.net," is technically legal, and there is no way of knowing for sure whether the "," is part of the URL or just punctuation.

A URL like "http://somepage1.net or something" is technically illegal, but typical end users don't know this. (They are used to browsers that do all sorts of funky things to what they type at their browser.)

Probably the best you can do is use a regex to extract legal URLs, and then trim text punctuation characters from the right end of each URL, on the assumption that they are not intended to be part of the URL.
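A minimal sketch of that extract-then-trim idea follows. The class name UrlExtractor, the helper names, and the set of characters treated as trailing punctuation are all assumptions made for illustration; the candidate pattern is the one from the question, widened to stop at any whitespace rather than only a space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class UrlExtractor {
    // Candidate match: "http://" followed by everything up to the next whitespace.
    private static final Pattern CANDIDATE = Pattern.compile("http://\\S+");

    // Assumption: these characters count as sentence punctuation when they end a match.
    private static final String TRAILING = ".,;:!?)]}'\"";

    // Drops punctuation characters from the right end of a candidate URL.
    static String trimTrailingPunctuation(String url) {
        int end = url.length();
        while (end > 0 && TRAILING.indexOf(url.charAt(end - 1)) >= 0) {
            end--;
        }
        return url.substring(0, end);
    }

    // Returns every candidate URL in the text after trimming.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            String url = trimTrailingPunctuation(m.group());
            // Skip matches that are nothing more than the scheme prefix.
            if (url.length() > "http://".length()) {
                urls.add(url);
            }
        }
        return urls;
    }
}
```

With this sketch, UrlExtractor.extractUrls("Please visit our page http://somepage.net.") yields the URL without the final period. The trade-off is the one already noted: a URL that genuinely ends in one of the trimmed characters loses it.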

You could also treat matching quotes or left / right brackets as denoting URL boundaries; e.g.

    The secret URL is "http://example.com/?" ... don't leave off the "?"
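The delimiter idea can be sketched in the same way. In this illustration, if the character just before a match is an opening quote or bracket and the matching closer appears inside the candidate, the URL is cut at that closer and kept verbatim; otherwise the code falls back to trimming trailing punctuation. The class name DelimitedUrlExtractor and the delimiter pairs chosen are assumptions, not an established convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DelimitedUrlExtractor {
    private static final Pattern CANDIDATE = Pattern.compile("http://\\S+");

    // Assumption: opener at index i pairs with the closer at the same index.
    private static final String OPENERS = "\"'([{<";
    private static final String CLOSERS = "\"')]}>";

    // Assumption: characters treated as punctuation when no delimiter pair is found.
    private static final String TRAILING = ".,;:!?)]}>'\"";

    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            String candidate = m.group();
            String url = null;
            if (m.start() > 0) {
                // Is the character just before the match an opening delimiter?
                int kind = OPENERS.indexOf(text.charAt(m.start() - 1));
                if (kind >= 0) {
                    // Cut at the first matching closer and keep the rest untouched.
                    int close = candidate.indexOf(CLOSERS.charAt(kind));
                    if (close >= 0) {
                        url = candidate.substring(0, close);
                    }
                }
            }
            if (url == null) {
                url = trimTrailing(candidate);
            }
            // Skip matches that are nothing more than the scheme prefix.
            if (url.length() > "http://".length()) {
                urls.add(url);
            }
        }
        return urls;
    }

    // Fallback: drop punctuation characters from the right end of the candidate.
    private static String trimTrailing(String url) {
        int end = url.length();
        while (end > 0 && TRAILING.indexOf(url.charAt(end - 1)) >= 0) {
            end--;
        }
        return url.substring(0, end);
    }
}
```

For the quoted example above, this keeps the "?" because the closing quote marks the boundary, while an undelimited URL at the end of a sentence still loses its final period.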