Regex for extracting second level domain from FQDN?
I can't figure this out. I need to extract the second-level domain from an FQDN. For example, all of these need to return "example.com":
- example.com
- foo.example.com
- bar.foo.example.com
- example.com:8080
- foo.example.com:8080
- bar.foo.example.com:8080
Here's what I have so far:
Dim host = Request.Headers("Host")
Dim pattern As String = "(?<hostname>(\w+)).(?<domainname>(\w+.\w+))"
Dim theMatch = Regex.Match(host, pattern)
ViewData("Message") = "Domain is: " + theMatch.Groups("domainname").ToString
It fails for example.com:8080 and bar.foo.example.com:8080. Any ideas?
I used this Regex successfully to match "example.com" from your list of test cases.
"(?<hostname>(\w+\.)*)(?<domainname>(\w+\.\w+))"
The dot character (".") needs to be escaped as "\.". An unescaped "." in a regex pattern matches any character.
Also, the pattern you provided requires 1 or more word characters followed by a dot before the domainname match (the "(?<hostname>(\w+))." part of the pattern; I'm assuming that the "." was supposed to be escaped). With the dot escaped, this fails to match the input "example.com" because there is no word character and dot before the domainname match.
I changed the pattern so that the hostname group matches zero or more repetitions of "1 or more word characters followed by a dot". It will capture "foo." in "foo.example.com" and "foo.bar." in "foo.bar.example.com" (the trailing dot is part of the group).
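A quick way to check this pattern against the test cases from the question is a small script. The sketch below uses Python's `re` module rather than .NET, so named groups are written `(?P<name>...)` instead of `(?<name>...)`; the helper name `second_level_domain` is made up for illustration.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) group syntax.
PATTERN = re.compile(r"(?P<hostname>(\w+\.)*)(?P<domainname>\w+\.\w+)")

def second_level_domain(host):
    """Return the 'domainname' group, or None if the pattern does not match."""
    match = PATTERN.search(host)
    return match.group("domainname") if match else None

# The six inputs listed in the question.
for host in ["example.com", "foo.example.com", "bar.foo.example.com",
             "example.com:8080", "foo.example.com:8080",
             "bar.foo.example.com:8080"]:
    print(host, "->", second_level_domain(host))
```

Because the pattern is not anchored, the port is simply left unmatched at the end of the input rather than being captured.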
This pattern assumes you've validated the contents of the FQDN elsewhere (e.g., dashes allowed, no underscores or other non-alphanumeric characters), and is otherwise as liberal as possible:
'(?:(?<hostname>.+)\.)?(?<domainname>[^.]+\.[^.]+?)(?:\:(?<port>[^:]+))?$'
Matches the hostname component if present (including multiple additional levels):
bar.foo.example.com:8000 would match:
- hostname: bar.foo (optional)
- domainname: example.com
- port: 8000 (optional)
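To see the three groups in action, here is a minimal sketch in Python (not .NET, so the named groups use `(?P<name>...)`). The function name `split_host` is hypothetical and only serves the illustration.

```python
import re

# The pattern above with Python-style named groups; the colon needs no escape.
PATTERN = re.compile(
    r"(?:(?P<hostname>.+)\.)?(?P<domainname>[^.]+\.[^.]+?)(?::(?P<port>[^:]+))?$"
)

def split_host(host):
    """Return a dict of hostname/domainname/port, or None if nothing matches."""
    match = PATTERN.search(host)
    return match.groupdict() if match else None

print(split_host("bar.foo.example.com:8000"))
print(split_host("example.com"))
```

Optional groups that do not take part in the match come back as `None`, so a bare "example.com" yields only a domainname.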
I'm not familiar with VB.NET or ASP, but on the subject of regular expressions...
- First off, you'll want to anchor your expression with ^ and $.
- Next, \w may match different things depending on implementation, locale, etc., so you may want to be explicit. For example, \w does not match a hyphen, which is a valid character in domain names.
- You don't seem to be taking into account an optional port number.
I'm sure there's a more RFC-accurate expression out there, but here's a start at something that should work for you.
^([a-z0-9\-]+\.)*([a-z0-9\-]+\.[a-z0-9\-]+)(:[0-9]+)?$
Broken down:
- ([a-z0-9\-]+\.)* : Start with zero or more hostnames...
- ([a-z0-9\-]+\.[a-z0-9\-]+) : followed by two hostnames...
- (:[0-9]+)? : followed by an optional port declaration.
Note that if you're dealing with a domain like example.ne.jp, you will only get ne.jp, because the expression always captures the last two labels. Also, note that the above example expression should be matched case-insensitively.
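As a rough illustration of the anchored expression, including the case-insensitive flag and the example.ne.jp caveat, here is a short Python sketch. The helper name `domain_of` is an assumption made for the example; in .NET the equivalent flag would be RegexOptions.IgnoreCase.

```python
import re

# The anchored expression from above, compiled case-insensitively.
HOST_RE = re.compile(
    r"^([a-z0-9\-]+\.)*([a-z0-9\-]+\.[a-z0-9\-]+)(:[0-9]+)?$",
    re.IGNORECASE,
)

def domain_of(host):
    """Return the last two labels (group 2), or None for non-matching input."""
    match = HOST_RE.match(host)
    return match.group(2) if match else None

print(domain_of("Bar.Foo.Example.com:8080"))  # mixed case still matches
print(domain_of("my-site.example.com"))       # hyphens are allowed
print(domain_of("www.example.ne.jp"))         # only the last two labels
print(domain_of("exa_mple.com"))              # underscore is rejected
```

Since the match is anchored at both ends, input containing characters outside the allowed set is rejected as a whole instead of being partially matched.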