Regexing URLs with and without a protocol in PHP

So I've got this URL regex:

/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux

What it's currently matching:

  • http://www.google.com
  • http://google.com

I need it to also match:

  • www.google.com
  • google.com

I tried making the protocol part of the regex optional by slapping a ? on the end of the group, i.e. (https?:\/\/)?, but that didn't change what it matches.

Ideas?


I'd look for something built into the language you are using to do this. URLs are tough to match with a regex. If you insist, I changed yours to make the (https?://) group optional. I have not tested it, though.

/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)?((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux
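To check the optional-protocol idea in isolation, here is a minimal sketch using a much simpler pattern than the one above. The pattern and the `looksLikeUrl` helper are made up for illustration and are far less thorough than the original regex. They only show that a `(?:https?://)?` group lets all four sample inputs match.

```php
<?php
// Hypothetical, much-simplified pattern (NOT the full regex above):
// optional scheme, dotted host name, optional port, optional path/query/fragment.
// "~" is the delimiter, so "/" does not need escaping.
$pattern = '~^(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?::[0-9]+)?(?:[/?#]\S*)?$~i';

// Hypothetical helper: true when the whole string matches the pattern.
function looksLikeUrl(string $candidate, string $pattern): bool
{
    return preg_match($pattern, $candidate) === 1;
}

$samples = ['http://www.google.com', 'http://google.com', 'www.google.com', 'google.com'];
foreach ($samples as $candidate) {
    echo $candidate, ' => ', looksLikeUrl($candidate, $pattern) ? 'match' : 'no match', PHP_EOL;
}
```

If the big regex still refuses the bare host names after the same change, the cause is probably elsewhere in the pattern, so it may help to test each piece separately like this.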

I got this example from RFC 3986 (Appendix B); a comment pointed me there. Note that it splits a URI into its parts rather than validating it. Even so, I'd still recommend using something built into whatever language you are using rather than a regex.

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
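As a rough sketch of how that pattern behaves in PHP, the snippet below runs it through `preg_match` and names the capture groups the way the RFC does (2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment). The `splitUri` helper and the sample URLs are made up for illustration.

```php
<?php
// RFC 3986 Appendix B pattern; "~" is the delimiter, so "/" needs no escaping.
const RFC3986_PATTERN = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

// Hypothetical helper: split a URI reference into its five components.
function splitUri(string $uri): array
{
    preg_match(RFC3986_PATTERN, $uri, $m);
    // Trailing unmatched groups are missing from $m, hence the "?? ''" defaults.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

print_r(splitUri('http://www.google.com/search?q=php#top'));
print_r(splitUri('www.google.com'));
```

Without a protocol, `www.google.com` comes back as the path and the authority is empty, so this pattern on its own does not recognize bare host names.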

Since you are using PHP, did you consider using parse_url? It returns false on seriously malformed URLs, although it is not meant to validate them.
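One catch with `parse_url` is that it treats a string with no protocol, such as `www.google.com`, as a path rather than a host. A minimal sketch of one way around that is below. The `extractHost` helper and the choice of `http://` as the default scheme are assumptions for illustration.

```php
<?php
// Hypothetical helper: return the host of a URL, tolerating a missing scheme.
function extractHost(string $candidate): ?string
{
    // Without a scheme, parse_url() reports "www.google.com" as a path,
    // so prepend an assumed default scheme when none is present.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $candidate)) {
        $candidate = 'http://' . $candidate;
    }

    $parts = parse_url($candidate);
    if ($parts === false || empty($parts['host'])) {
        return null; // malformed, or no host could be found
    }
    return $parts['host'];
}

foreach (['http://www.google.com', 'www.google.com', 'google.com'] as $candidate) {
    var_dump(extractHost($candidate));
}
```

This only extracts the host. Whether the host is a plausible domain is a separate check.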
