PHP URL Parsing & Dissecting

  • www.example.com
  • foo.example.com
  • foo.example.co.uk
  • foo.bar.example.com
  • foo.bar.example.co.uk

I've got these URLs here, and always want to end up with 2 variables:

$domainName = "example"
$domainNameSuffix = ".com" OR ".co.uk"

If someone could get me from $url being one of the URLs all the way down to $newUrl being close to "example.co.uk", it would be a blessing.

Note that the urls are going to be completely "random", we might end up having "foo.bar.example2.com.au" too, so ... you know... ugh. (asking for the impossible?)

Cheers,


We had a few questions like this before, but I can't find a good one right now either. The crux is, this cannot be done reliably. You would need a long list of special TLDs (like .uk and .au) which have their own .com/.net level.

But as a general approach and simple solution you could use:

preg_match('#([\w-]+)\.(\w+(\.(au|uk))?)\.?$#i', $domain, $m);
list(, $domain, $suffix) = $m;
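
To see how that pattern behaves on the hosts from the question, here is a minimal sketch. The wrapper name splitHostByRegex and the array keys are made up for illustration; only the regex itself comes from the snippet above.

```php
<?php
// Hypothetical wrapper around the regex above; the function name is an assumption.
function splitHostByRegex(string $host): ?array
{
    // One label, then a suffix that may be followed by .au or .uk
    if (!preg_match('#([\w-]+)\.(\w+(\.(au|uk))?)\.?$#i', $host, $m)) {
        return null; // no match, e.g. a bare "localhost"
    }
    return array('name' => $m[1], 'suffix' => $m[2]);
}

$hosts = array('www.example.com', 'foo.bar.example.co.uk', 'foo.bar.example2.com.au');
foreach ($hosts as $host) {
    $parts = splitHostByRegex($host);
    echo $host, ' => ', $parts['name'], ' / ', $parts['suffix'], "\n";
}
```

Note that the pattern treats whatever label precedes .uk or .au as part of the suffix, so www.example.uk comes out as "www" / "example.uk". That is the unreliability mentioned above.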


The "domainNameSuffix" is called a top-level domain (TLD for short), and there is no easy way to extract it. (Strictly speaking, ".co.uk" is a second-level suffix under the ".uk" TLD.)

Every country has its own TLD, and some countries have opted to further subdivide their TLD. And since the number of subdomains (my.own.subdomain.example.com) is also variable, there is no easy "one-regexp-fits-all".

As mentioned, you need a list. Fortunately for you there are lists publicly available: http://publicsuffix.org/
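
As a rough sketch of how such a list could be used: check progressively shorter tails of the host against the known suffixes, so that the longest one wins. The rules array below is a tiny hand-picked sample, not the real list, and the function name splitHostBySuffixList is hypothetical. The real list also contains wildcard and exception rules, which are ignored here.

```php
<?php
// Hand-picked sample of suffix rules, for illustration only.
// The published list is far longer and has wildcard (*.) and
// exception (!) rules that this sketch does not handle.
$suffixRules = array('com', 'uk', 'co.uk', 'au', 'com.au');

// Hypothetical helper: returns array(domain label, suffix), or null.
function splitHostBySuffixList(string $host, array $rules): ?array
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    // Walk from the longest tail to the shortest, so "co.uk" wins over "uk".
    for ($i = 0; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $rules, true)) {
            // $i === 0 means the host itself is a bare suffix: nothing to split.
            return $i === 0 ? null : array($labels[$i - 1], $candidate);
        }
    }
    return null; // no known suffix
}

foreach (array('www.example.com', 'foo.bar.example.co.uk') as $host) {
    list($name, $suffix) = splitHostBySuffixList($host, $suffixRules);
    echo $host, ' => ', $name, ' / ', $suffix, "\n";
}
```

Trying the longest tail first is what keeps "example.co.uk" from being split as "co" plus "uk".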


I believe you will need to maintain a list of extensions for the most accurate results.

$possibleExtensions = array(
    '.com',
    '.co.uk',
    '.com.au'
);

// parse_url() needs a protocol.
$str = 'http://' . $str;

// Use parse_url() to take into account any paths
// or fragments that may end up being there.
$host = parse_url($str, PHP_URL_HOST);

foreach ($possibleExtensions as $ext) {

    if (preg_match('/' . preg_quote($ext, '/') . '\Z/', $host)) {
        $domainNameSuffix = $ext;
        // Strip the extension from the host; $str may still carry a path or fragment
        $domainName = substr($host, 0, -strlen($ext));
        // Keep only the last label, so "foo.bar.example" becomes "example"
        $labels = explode('.', $domainName);
        $domainName = end($labels);
        var_dump($domainName, $domainNameSuffix);
        break;
    }

}

If you never have any paths or extra stuff, you can of course skip the parse_url() call and the http:// prefix, and use $str as the host directly.

It worked for all your tests.


There isn't a built-in function for this.

A quick Google search led me to http://www.wallpaperama.com/forums/php-function-remove-domain-name-get-tld-splitter-split-t5824.html

This leads me to believe you need to maintain a list of valid TLDs to split URLs on.


Alright chaps, here's how I solved it, for now. Support for more domain extensions will follow at some point in the future; I don't know yet which technique I'll use.

# Setting options: single- and two-part domain extensions
$v2_onePart = array(
                "com"
                );
$v2_twoPart = array(
                "co.uk",
                "com.au"
                );

$v2_url         = $_SERVER['SERVER_NAME'];      # "example.com"     OR  "example.com.au"
$v2_bits        = explode(".", $v2_url);        # "example", "com"  OR  "example", "com", "au"
$v2_bits        = array_reverse($v2_bits);      # "com", "example"  OR  "au", "com", "example"      (Reversing to eliminate foo.bar.example.com.au problems.)

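# Plain if/elseif instead of switch: the switch only worked because a non-empty array loosely equals true.
# The isset() checks guard against hosts with too few labels (e.g. "localhost").
if (isset($v2_bits[2]) && in_array($v2_bits[1] . "." . $v2_bits[0], $v2_twoPart)) {
    $v2_class   = $v2_bits[2] . " " . $v2_bits[1] . "_" . $v2_bits[0];  # "example com_au"
} elseif (isset($v2_bits[1]) && in_array($v2_bits[0], $v2_onePart)) {
    $v2_class   = $v2_bits[1] . " " . $v2_bits[0];  # "example com"
}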