Matching URLs in text except those enclosed by square brackets
I'm trying to create a regex so I can identify URLs in text.
Likely test cases:
- http://a.url.com
- http://a.url.co.uk
- [http://a.url.com]
- [http://a.url.co.uk]
- [ http://a.url.com]
- [ http://a.url.co.uk]
- [http://a.url.com ]
- [http://a.url.co.uk ]
- [ http://a.url.com ]
- [ http://a.url.co.uk ]
- text here http://a.url.com and here
- text here http://a.url.co.uk and here
- text here [http://a.url.com] and here
- text here [http://a.url.co.uk] and here
- text here [ http://a.url.com] and here
- text here [ http://a.url.co.uk] and here
- text here [http://a.url.com ] and here
- text here [http://a.url.co.uk ] and here
- text here [ http://a.url.com ] and here
- text here [ http://a.url.co.uk ] and here
Only the lines without square brackets should match, and only the URL should be matched, not the whole line. In case it was unclear, the bold text in the list above is what I would like the regex to match.
The current regex I've worked out is:
(^|[^\[ ])(https?://\S+)
Only the first two lines match. I can't figure out how to make the other lines without square brackets match.
I've used groups because I'll be replacing the match with some HTML later, but I need to get the regex working properly first.
I've been using this online tool to help me build and test the regex: http://gskinner.com/RegExr/
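To see why only the first two lines match, it can help to run the pattern against a few of the test cases. The sketch below uses Python's re module, which is an assumption (the question does not name a language), and the helper name original_match is made up for illustration. The class [^\[ ] excludes a space as well as an opening bracket, so a URL that follows ordinary text and a space is rejected for the same reason as "[ http://...".

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(^|[^\[ ])(https?://\S+)")


def original_match(line):
    """Return the URL captured by group 2, or None if the pattern fails."""
    m = ORIGINAL.search(line)
    return m.group(2) if m else None


if __name__ == "__main__":
    samples = [
        "http://a.url.com",                     # URL at start of line
        "text here http://a.url.com and here",  # URL preceded by a space
        "[http://a.url.com]",                   # URL preceded by '['
        "[ http://a.url.com]",                  # URL preceded by '[' and a space
    ]
    for line in samples:
        print(repr(line), "->", original_match(line))
```

Only the first sample yields a URL here. The second fails because the character before "http" is a space, which the character class forbids.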
You can also use a negative lookahead assertion to ensure the line does not contain a pair of square brackets:
^(?!.*\[.*\]).*(https?://\S+)
Rubular link
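A minimal sketch of how this pattern might be applied to multi-line text, assuming Python's re module (the function name urls_outside_brackets is hypothetical). The re.MULTILINE flag is needed so that ^ anchors at the start of every line. The overall match includes the text before the URL, so the URL itself is read from group 1.

```python
import re

# The answer's pattern; re.MULTILINE makes ^ match at each line start.
NO_BRACKET_LINE = re.compile(r"^(?!.*\[.*\]).*(https?://\S+)", re.MULTILINE)


def urls_outside_brackets(text):
    """Return the URL from every line that contains no [...] pair."""
    return [m.group(1) for m in NO_BRACKET_LINE.finditer(text)]


if __name__ == "__main__":
    text = "\n".join([
        "http://a.url.com",
        "[http://a.url.co.uk]",
        "text here http://a.url.co.uk and here",
        "text here [ http://a.url.com ] and here",
    ])
    print(urls_outside_brackets(text))
```

Because the lookahead inspects the whole line, a line is skipped if it contains a bracket pair anywhere, even when the URL itself sits outside the brackets. That is fine for the test cases listed, but worth keeping in mind for other input.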
This should work:
(?<=^[^\[\]]*)(https?://\S+)(?=[^\[\]]*$)
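The [^\[\]]* parts say that any characters except square brackets may appear before and after your link.
This uses positive lookbehind and lookahead to check that there are no brackets on the line. The lookbehind is variable-length, so it needs a regex engine that supports that (.NET does, for example).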
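Python's built-in re module is one engine that rejects variable-width lookbehind, so the pattern cannot be used there as written. The sketch below is an adaptation rather than the answer's exact pattern: it consumes the bracket-free prefix as part of the match instead of asserting it, and reads the URL from group 1. It assumes one line is checked at a time, and the name url_if_unbracketed is made up for illustration.

```python
import re

# Adapted pattern: the lookbehind (?<=^[^\[\]]*) becomes a consumed,
# lazily matched prefix; the lookahead from the answer is kept as is.
ADAPTED = re.compile(r"^[^\[\]]*?(https?://\S+)(?=[^\[\]]*$)")


def url_if_unbracketed(line):
    """Return the URL if no bracket precedes or follows it on the line."""
    m = ADAPTED.match(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for line in [
        "http://a.url.com",
        "text here http://a.url.co.uk and here",
        "[http://a.url.com ]",
        "text here [ http://a.url.com ] and here",
    ]:
        print(repr(line), "->", url_if_unbracketed(line))
```

The trade-off is that the match now spans the leading text as well, so a later replacement would have to use the span of group 1 rather than the span of the whole match.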
Your modified working regex:
([^\S\]](https?:\/\/[^\]\s]+)[^\S\]]|^(https?:\/\/[^\]\s]+)$)
Rubular
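A quick way to check this pattern against the test cases, sketched with Python's re module (an assumption; the helper name find_url is hypothetical). The class [^\S\]] matches only whitespace, so the first alternative requires whitespace on both sides of the URL, and the second handles a line that consists of the URL alone. The URL lands in group 2 or group 3 depending on which alternative matched.

```python
import re

# The answer's pattern, unchanged.
WS_DELIMITED = re.compile(
    r"([^\S\]](https?:\/\/[^\]\s]+)[^\S\]]|^(https?:\/\/[^\]\s]+)$)"
)


def find_url(line):
    """Return the URL from group 2 or group 3, or None if nothing matches."""
    m = WS_DELIMITED.search(line)
    if m is None:
        return None
    return m.group(2) or m.group(3)


if __name__ == "__main__":
    for line in [
        "http://a.url.com",
        "text here http://a.url.co.uk and here",
        "[http://a.url.com]",
        "text here [ http://a.url.com] and here",
        "[ http://a.url.com ]",
    ]:
        print(repr(line), "->", find_url(line))
```

When traced this way, the cases with a space on both sides of the URL inside the brackets, such as "[ http://a.url.com ]", also produce a match, since the pattern only looks at the characters immediately around the URL. If those lines must be excluded, the pattern would need an additional check for the brackets.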