Regex matching URL authority parts
I need to match these parts of the following string:
(user)@(hostname):(port)
User and port can optionally be matched. First I managed it with this regular expression:
(?:([^@]*)@)?([^\:]+)(?:\:(\d+))?
For foo@bar:80 this yields:
foo
bar
80
But when it comes to an IPv6 host like foo@[2001:0db8:85a3:08d3:1319:8a2e:0370:7344]:80, the preceding regex won't work as expected:
foo
[2001
0
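The failure can be reproduced with a short sketch. It assumes std::regex with a partial match (std::regex_search) rather than boost::regex, drops the redundant backslash before the colon, and uses a hypothetical helper name, first_pattern_host, that is not part of the question:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the host capture (group 2) of the first
// pattern. std::regex_search is used so that unmatched trailing input is
// ignored (an assumption about how the pattern was applied).
std::string first_pattern_host(const std::string& input) {
    static const std::regex pattern(R"((?:([^@]*)@)?([^:]+)(?::(\d+))?)");
    std::smatch m;
    return std::regex_search(input, m, pattern) ? m[2].str() : std::string();
}
```

The host group stops at the first colon, so for the IPv6 input it captures only the text up to the first colon inside the brackets.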
So now I'm looking for a regular expression that also matches hosts containing colons when they are enclosed in square brackets, and captures the host without the brackets. :) I've done that with the following regex:
(?:([^@]*)@)(?:\[(.+)\]|([^:]+))(?:\:(\d+))?
foo
2001:0db8:85a3:08d3:1319:8a2e:0370:7344
<empty>
80
But this is ugly, because either group 2 or group 3 will be empty.
Is there any way to combine these into a single backreference?
I'm using boost::regex, which uses Perl-style regex syntax by default, as far as I know.
Thanks and regards
reeaal
(?:([^@]*)@)(\[.+\]|([^:]+))(?:\:(\d+))?
But you'll have to strip off the [] if it's an IPv6 address. That should be fairly trivial, though.
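A minimal sketch of that approach follows. It rests on several assumptions: it uses std::regex instead of boost::regex to stay self-contained, the Authority struct and parse_authority function are hypothetical names, and the pattern is a variant of the one above in which the user part is optional and both host alternatives share one capture group:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the three parts; the names are not from the thread.
struct Authority {
    std::string user;  // empty when there is no "user@" prefix
    std::string host;  // IPv6 literals are stored without the brackets
    std::string port;  // empty when there is no ":port" suffix
};

// Parses "user@host:port" where user and port are optional.
// Returns false when the whole input does not match the pattern.
bool parse_authority(const std::string& input, Authority& out) {
    // Group 1: user, group 2: host (possibly bracketed), group 3: port.
    static const std::regex pattern(R"((?:([^@]*)@)?(\[.+\]|[^:]+)(?::(\d+))?)");
    std::smatch m;
    if (!std::regex_match(input, m, pattern)) {
        return false;
    }
    out.user = m[1].str();  // str() is empty for a group that did not match
    out.host = m[2].str();
    out.port = m[3].str();
    // Strip the enclosing brackets of an IPv6 literal.
    if (out.host.size() >= 2 && out.host.front() == '[' && out.host.back() == ']') {
        out.host = out.host.substr(1, out.host.size() - 2);
    }
    return true;
}
```

With this variant the host is always in group 2, so the caller never has to check which of two groups was filled; the bracket stripping is the only post-processing step.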
You could also do it with an optional [ and ] before and after the host, plus lookaround assertions, but that's REALLY ugly; your fellow programmers will thank you if you just KISS and use the above. Here's the option anyway:
(?:([^@]*)@)\[?((?<=\[).+(?=\])|([^:]+))\]?(?:\:(\d+))?