Don't match URLs that contain "togl" [Regex]
I have a good URL-matching regex, but I have a problem: I don't want to match URLs from togl.me. My regex is:
(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))
And here is the same pattern in expanded, commented form:
(?xi)
\b
( # Capture 1: entire matched URL
(?:
https?:// # http or https protocol
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)
It should not match URLs from http://togl.me. I could check the domain name with parse_url after matching the URLs, but why should that extra step be necessary?
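For comparison, the post-filtering approach mentioned in the question might look roughly like the sketch below. It is written in Python rather than PHP (the use of parse_url suggests PHP, but that is an assumption), with urllib.parse.urlparse standing in for parse_url. The names BLOCKED_HOSTS and find_urls_filtered are hypothetical.

```python
import re
from urllib.parse import urlparse

# The one-line URL pattern from the question, as a Python raw string.
URL_RE = re.compile(
    r"""(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))"""
)

# Hypothetical blocklist of hosts whose URLs should be dropped.
BLOCKED_HOSTS = {"togl.me"}


def find_urls_filtered(text):
    """Match URLs with the regex, then discard those whose host is blocked."""
    urls = []
    for m in URL_RE.finditer(text):
        url = m.group(1)
        # urlparse only fills in the host when a scheme is present,
        # so prepend one for bare matches like "www.example.org".
        candidate = url if re.match(r"(?i)https?://", url) else "http://" + url
        host = (urlparse(candidate).hostname or "").lower()
        if host not in BLOCKED_HOSTS:
            urls.append(url)
    return urls
```

The regex stays simple here and the host check happens in ordinary code, at the cost of the extra step the question would like to avoid.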
After matching the domain, you can use a negative lookbehind to check that it was not togl.me:
[a-z0-9.\-]+[.][a-z]{2,4}(?<!togl\.me)/
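To see what a lookbehind on this branch does, and where it falls short, you can try it in Python's re module. Here it is only a convenient stand-in for the PCRE engine. In this sketch the lookbehind is written as (?<!togl\.me) and placed immediately before the slash of the bare-domain alternative. The name BARE_ONLY is hypothetical.

```python
import re

# The question's expanded pattern, with a lookbehind added only
# inside the bare-domain alternative.
BARE_ONLY = re.compile(r"""(?xi)
\b
(
  (?:
    https?://
  | www\d{0,3}[.]
  | [a-z0-9.\-]+[.][a-z]{2,4}(?<!togl\.me)/   # lookbehind guards only this branch
  )
  (?: [^\s()<>]+ | \(([^\s()<>]+|(\([^\s()<>]+\)))*\) )+
  (?: \(([^\s()<>]+|(\([^\s()<>]+\)))*\) | [^\s`!()\[\]{};:'".,<>?«»“”‘’] )
)
""")
```

A bare togl.me/abc is rejected, but http://togl.me/abc still matches. The https?:// alternative succeeds first, so the guarded branch is never tried.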
Edit: the domain can also be matched by the other alternatives (for example right after http:// or www.), not only by the branch the comment describes. So let's move the togl.me check to after the alternation:
…
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?<!togl\.me/)
(?!togl\.me)
(?: # One or more:
[^\s()<>]+
…
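Putting the edit together, the full expanded pattern with both lookarounds might look like this in Python's re, which again stands in for PCRE. NO_TOGL and find_non_togl_urls are hypothetical names. The lookbehind catches a togl.me/ that the bare-domain branch has consumed. The lookahead catches togl.me directly after http://, https:// or www.

```python
import re

NO_TOGL = re.compile(r"""(?xi)
\b
(                                   # Capture 1: entire matched URL
  (?:
    https?://                       # http or https protocol
  | www\d{0,3}[.]                   # www. or www1. up to www999.
  | [a-z0-9.\-]+[.][a-z]{2,4}/      # looks like a domain name followed by a slash
  )
  (?<!togl\.me/)                    # reject if the bare-domain branch just consumed togl.me/
  (?!togl\.me)                      # reject if togl.me follows the scheme or www.
  (?:                               # One or more:
    [^\s()<>]+                      #   run of non-space, non-()<>
  | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
  )+
  (?:                               # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
  | [^\s`!()\[\]{};:'".,<>?«»“”‘’]  # not a space or one of these punctuation chars
  )
)
""")


def find_non_togl_urls(text):
    """Return every URL the pattern matches, skipping togl.me links."""
    return [m.group(1) for m in NO_TOGL.finditer(text)]
```

This is a sketch rather than an exhaustive filter. For example, http://www.togl.me/... still matches, because the lookahead only inspects the text directly after the scheme.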
More on lookaround assertions: http://www.regular-expressions.info/lookaround.html