
Regex for extracting second level domain from FQDN?

I can't figure this out. I need to extract the second-level domain from an FQDN. For example, all of these need to return "example.com":

  1. example.com
  2. foo.example.com
  3. bar.foo.example.com
  4. example.com:8080
  5. foo.example.com:8080
  6. bar.foo.example.com:8080

Here's what I have so far:

    Dim host = Request.Headers("Host")
    Dim pattern As String = "(?<hostname>(\w+)).(?<domainname>(\w+.\w+))"
    Dim theMatch = Regex.Match(host, pattern)
    ViewData("Message") = "Domain is: " + theMatch.Groups("domainname").ToString

It fails for example.com:8080 and bar.foo.example.com:8080. Any ideas?


I used this Regex successfully to match "example.com" from your list of test cases.

"(?<hostname>(\w+\.)*)(?<domainname>(\w+\.\w+))"

The dot character (".") needs to be escaped as "\.". An unescaped "." in a regex pattern matches any character.
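To make the escaping point concrete, here is a minimal sketch in Python (chosen only for illustration; the question uses VB.NET, but the escaping rule is the same in both engines). The test string "exampleXcom" is a made-up input, not one from the question.

```python
import re

# An unescaped "." matches any single character.
unescaped = re.compile(r"^\w+.\w+$")
# An escaped "\." matches only a literal dot.
escaped = re.compile(r"^\w+\.\w+$")

# "exampleXcom" contains no dot, yet the unescaped pattern accepts it.
print(bool(unescaped.match("exampleXcom")))  # True
print(bool(escaped.match("exampleXcom")))    # False
print(bool(escaped.match("example.com")))    # True
```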

Also, the regex pattern you provided requires 1 or more word characters followed by a dot before the domainname match (the "(?<hostname>(\w+))." part of the pattern, assuming the "." was supposed to be escaped). This fails to match the input "example.com" because there is no word character and dot before the domainname match.

I changed the pattern so that the hostname match would have zero or more matches of "1 or more word characters followed by a dot". This will match "foo." in "foo.example.com" and "foo.bar." in "foo.bar.example.com" (the captured group includes the trailing dot).
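As a rough check of this pattern against the six inputs from the question, here is a sketch in Python. Note that Python spells named groups as (?P<name>...), whereas .NET accepts (?<name>...); the helper name second_level_domain is hypothetical.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r"(?P<hostname>(\w+\.)*)(?P<domainname>\w+\.\w+)")

def second_level_domain(host):
    """Return the 'domainname' group for a Host header value, or None."""
    match = PATTERN.search(host)
    return match.group("domainname") if match else None

hosts = [
    "example.com",
    "foo.example.com",
    "bar.foo.example.com",
    "example.com:8080",
    "foo.example.com:8080",
    "bar.foo.example.com:8080",
]

for host in hosts:
    # Every line prints "... -> example.com".
    print(host, "->", second_level_domain(host))
```

The pattern is not anchored, so the port is simply left unmatched at the end of the string rather than being captured.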


This assumes you've validated the contents of the fqdn elsewhere (e.g.: dashes allowed, no underscores or other non-alphanumeric characters), and is otherwise as liberal as possible.

'(?:(?<hostname>.+)\.)?(?<domainname>[^.]+\.[^.]+?)(?:\:(?<port>[^:]+))?$'

Matches the hostname component if present (including multiple additional levels):

bar.foo.example.com:8000 would match:

  • hostname: bar.foo (optional)
  • domainname: example.com
  • port: 8000 (optional)
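A small sketch of how those three groups come out in practice, written in Python for illustration (named groups use (?P<name>...) there instead of the .NET form). The helper name split_host is hypothetical.

```python
import re

# Same pattern as above, translated to Python's named-group syntax.
PATTERN = re.compile(
    r"(?:(?P<hostname>.+)\.)?"          # optional leading labels
    r"(?P<domainname>[^.]+\.[^.]+?)"    # last two labels
    r"(?::(?P<port>[^:]+))?$"           # optional port
)

def split_host(host):
    """Return a dict with hostname, domainname and port (None if absent)."""
    match = PATTERN.search(host)
    return match.groupdict() if match else None

print(split_host("bar.foo.example.com:8000"))
# {'hostname': 'bar.foo', 'domainname': 'example.com', 'port': '8000'}
print(split_host("example.com"))
# {'hostname': None, 'domainname': 'example.com', 'port': None}
```

Because the character classes are deliberately permissive, this only makes sense on input that has already been validated, as noted above.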


I'm not familiar with VB.NET or ASP, but on the subject of regular expressions...

  • First off, you'll want to anchor your expression with ^ and $.
  • Next, \w may match different things depending on implementation, locale, etc., so you may want to be explicit. For example, \w may not match a hyphen, a valid character in domain names.
  • You don't seem to be taking into account an optional port number.
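The first two points can be seen in a quick sketch, shown here in Python purely as an illustration. The inputs "my-site" and "!!example.com??" are made-up examples.

```python
import re

# \w covers letters, digits and underscore, but not the hyphen.
print(re.fullmatch(r"\w+", "my-site"))               # None
print(bool(re.fullmatch(r"[a-z0-9-]+", "my-site")))  # True

# Without ^ and $, a search can succeed on part of an invalid string.
loose = re.compile(r"\w+\.\w+")
strict = re.compile(r"^\w+\.\w+$")
print(bool(loose.search("!!example.com??")))   # True
print(bool(strict.search("!!example.com??")))  # False
```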

I'm sure there's a more RFC-accurate expression out there, but here's a start at something that should work for you.

^([a-z0-9\-]+\.)*([a-z0-9\-]+\.[a-z0-9\-]+)(:[0-9]+)?$

Broken down:

([a-z0-9\-]+\.)*: Start with zero or more labels, each followed by a dot...
([a-z0-9\-]+\.[a-z0-9\-]+): followed by two labels separated by a dot...
(:[0-9]+)?: followed by an optional port declaration.

Note that if you're dealing with a domain like example.ne.jp, you will only get ne.jp. Also, note that the above example expression should be matched case-insensitively.
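Here is a sketch of the expression with case-insensitive matching turned on, again in Python for illustration (in .NET the equivalent option would be RegexOptions.IgnoreCase). The helper name last_two_labels is hypothetical, and the second group is the one holding the domain.

```python
import re

PATTERN = re.compile(
    r"^([a-z0-9\-]+\.)*([a-z0-9\-]+\.[a-z0-9\-]+)(:[0-9]+)?$",
    re.IGNORECASE,  # domain names are case-insensitive
)

def last_two_labels(host):
    """Return the last two labels of the host, or None if it does not match."""
    match = PATTERN.match(host)
    return match.group(2) if match else None

print(last_two_labels("bar.foo.Example.COM:8080"))  # Example.COM
print(last_two_labels("example.ne.jp"))             # ne.jp
print(last_two_labels("example.com:port"))          # None
```

The second call shows the limitation mentioned above: handling suffixes such as ne.jp correctly needs a list of known public suffixes, which a regex alone cannot supply.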
