
REGEX URL regular expression [duplicate]

This question already has answers here (closed 11 years ago).

Possible duplicate: Regular expression for browser Url

Is this regex correct for any URL?

preg_match_all(
 '/([www]+(\.|dot))?[a-zA-Z0-9_\.-]+(\.|dot){1,}[com|net|org|info\.]+((\.|dot){0,}[a-zA-Z]){0,}+/i', 
 $url, $regp);


Don't use a regex for that. If you can't resist, a valid one can be found here: What is the best regular expression to check if a string is a valid URL? (but that regex is ridiculous). Try to use your framework for this if you can (the Uri class in .NET, for example).


No. In fact it doesn't match URLs at all. It's trying to detect hostnames written in text, like www.example.com.

Its approach is to try to detect some common known TLDs, but:

[com|net|org|info\.]+

is actually a character group, allowing any sequence of characters from the list |.comnetrgif. Probably this was meant:

((com|net|org|info)\.)+

and [www] is similarly wrong (it's a character class that only ever matches the letter w), plus the alternation with the literal word dot doesn't really make any sense.
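To make the character-class mistake concrete, here is a quick demonstration (using Python's `re` module for illustration, though the question's snippet is PHP; the character-class semantics are the same in PCRE):

```python
import re

# The original fragment is a character class, not an alternation:
# it matches any run of the characters c,o,m,n,e,t,r,g,i,f plus '|' and '.'
char_class = re.compile(r'[com|net|org|info\.]+')
assert char_class.fullmatch('tin')   # not a TLD at all, but it matches
assert char_class.fullmatch('|||')   # just pipes, also matches

# The intended form uses (...) for grouping and | for alternation:
group = re.compile(r'((com|net|org|info)\.)+')
assert group.fullmatch('com.')
assert group.fullmatch('tin') is None
```

The assertions all pass: the character class accepts strings that have nothing to do with TLDs, while the grouped alternation only accepts the listed words.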

But this is in general a really bad idea. There are way more TLDs in common use than just those plus the two-letter ccTLDs, and many or most of the ccTLDs don't use com/net/org/info as a second-level domain. This expression will fail to match those, and will match a bunch of other stuff that's not supposed to be a hostname.

In fact the task of detecting hostnames is basically impossible to do, since a single word can be a hostname, as can any dot-separated sequence of words. (And since internationalised domain names were introduced, almost anything can be a hostname, eg. 例え.テスト.)


'Any' URL is a tough call. In Australia you have .com.au; in the UK it is .co.uk. Each country has its own set of rules, and they can change: .xxx has just been approved, and non-ASCII characters have been approved now too, though I suspect you don't need those.

I would wonder why you want validation that tight. Many URLs that are right will be excluded, and it does not exclude all incorrect URLs: www.thisisnotavalidurl.com would still be accepted.

I would suggest A) using a looser check, just ([a-zA-Z0-9_-]+\.)*[a-zA-Z0-9_-]+ (or something similar), purely as a sanity check, and B) using a reverse lookup to check whether the host actually exists, if you want to allow only real URLs.
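A minimal sketch of that two-step approach, in Python for illustration (the helper names `looks_like_host` and `host_resolves` are made up here, and the loose pattern is only a sanity check, not a validator):

```python
import re
import socket

# Loose syntactic check: dot-separated labels of "safe" characters.
HOST_RE = re.compile(r'([a-zA-Z0-9_-]+\.)*[a-zA-Z0-9_-]+')

def looks_like_host(s):
    """Cheap sanity check only; deliberately accepts non-existent names."""
    return HOST_RE.fullmatch(s) is not None

def host_resolves(s):
    """The lookup step: ask DNS whether the name actually exists.
    Needs network access, so call it only when the real check matters."""
    try:
        socket.gethostbyname(s)
        return True
    except socket.gaierror:
        return False

assert looks_like_host('www.example.com')
assert not looks_like_host('not a host!')
```

The syntactic check filters out obvious garbage cheaply; the DNS lookup is what actually distinguishes www.thisisnotavalidurl.com from a real host.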

Oh, and I find this: http://www.fileformat.info/tool/regex.htm to be a really useful tool when I am developing a regex, which I am not great at.


[www]+ should be changed to (www)?

(\.|dot){1,} - one or more? Maybe you wanted ([a-zA-Z0-9_\.-]+(\.|dot)){1,}
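Both points can be seen directly (again using Python's `re` just to demonstrate; the semantics match PCRE):

```python
import re

# [www]+ is a character class containing only 'w', so it matches any run of w's:
assert re.fullmatch(r'[www]+', 'w')
assert re.fullmatch(r'[www]+', 'wwwww')

# (www)? matches the literal "www" zero or one time, which is what was meant:
assert re.fullmatch(r'(www)?example', 'example')
assert re.fullmatch(r'(www)?example', 'wwwexample')
assert re.fullmatch(r'(www)?example', 'wwexample') is None
```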


A URL also has a protocol like http, which you're missing. You're also missing a lot of TLDs, as already mentioned.

Something like an escaped space (%20) would also not be recognized.

Port numbers can also appear in a URL (e.g. :80).
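All three of those pieces (scheme, percent-encoding, port) fall out naturally if you use a real URL parser instead of a hand-rolled regex; here is a quick look with Python's standard-library `urllib.parse`, used for illustration:

```python
from urllib.parse import urlparse, unquote

# A real URL carries pieces the question's pattern ignores entirely:
u = urlparse('http://www.example.com:80/some%20file?x=1#top')

assert u.scheme == 'http'        # the protocol
assert u.port == 80              # the port number
assert u.path == '/some%20file'  # the percent-encoded space survives parsing
assert unquote(u.path) == '/some file'
```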


No, and you can't create a regex that will validate any URI (or URL or URN). The only way to parse them properly is to read them as per the spec, RFC 3986.
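Interestingly, RFC 3986 itself (Appendix B) does give a regex, but only for *splitting* a URI into components, not for validating one; it matches essentially any string, which rather underlines the point. A short demonstration in Python:

```python
import re

# The component-splitting regex from RFC 3986, Appendix B.
# Groups: 2=scheme, 4=authority, 5=path, 7=query, 9=fragment.
RFC3986_SPLIT = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

m = RFC3986_SPLIT.match('http://example.com/path?q=1#frag')
scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
assert (scheme, authority, path) == ('http', 'example.com', '/path')
assert (query, fragment) == ('q=1', 'frag')

# It "matches" garbage too: splitting is not validating.
assert RFC3986_SPLIT.match('not a uri at all')
```

Validation then means checking each extracted component against the grammar in the spec, which is exactly the work a regex alone can't do.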

