PHP: How to get base URL from HTML page

I'm struggling with figuring out how to do this. I have an absolute URL to an HTML page, and I need to get the base URL for this. So the URLs could be for example:

  • http://www.example.com/
  • https://www.example.com/foo/
  • http://www.example.com/foo/bar.html
  • https://alice@www.example.com/foo

And so on. The first problem is finding the base URL from these and other URLs. The second problem is that some HTML pages contain a base tag, whose value could be, for example, http://example.com/ or simply / (although I think some browsers only support values starting with protocol://?).

Either way, how can I do this correctly in PHP? I have the URL, and I have the HTML loaded into a DOMDocument, so I should be able to grab the base tag fairly easily if it exists. How do browsers solve this, for example?


Clarification on why I need this

I'm trying to create something which takes a URL to a web page and returns the absolute URL to all the images this web page links to. Since some/many/all of these images might have relative URLs, I need to find the base URL to use when I make them absolute. This might be the base URL of the web page, or it might be a base URL specified in the HTML itself.

I have managed to fetch the HTML and find the URLs. I think I've also found a working method of making the URLs absolute when I have the base URL to use. But finding the base URL is what I'm missing, and what I'm asking about here.


See parse_url().

$result=parse_url('http://www.google.com');
print_r($result);

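Pick out of there whichever elements you are looking for. For a base URL you probably want $result['scheme'], $result['host'], and the directory part of $result['path']. Note that the 'path' key is not set at all when the URL has no path, as in this example.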
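To tie this back to the question, here is a minimal sketch of how the parse_url() pieces could be combined into a base URL, with a base tag taken from the DOMDocument when one exists. The function names url_origin, url_directory_base and document_base_url are made up for illustration and are not part of PHP. The sketch assumes the page URL is absolute and does not normalize "../" segments in a relative base href; a full implementation would follow the reference resolution rules in RFC 3986.

```php
<?php
// Hypothetical helper: scheme://[user[:pass]@]host[:port] built from parse_url() parts.
function url_origin(array $parts): string
{
    $origin = $parts['scheme'] . '://';
    if (isset($parts['user'])) {
        $origin .= $parts['user'];
        if (isset($parts['pass'])) {
            $origin .= ':' . $parts['pass'];
        }
        $origin .= '@';
    }
    $origin .= $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    return $origin;
}

// Hypothetical helper: the URL up to and including the last slash of its path.
// Query string and fragment are dropped. Returns null for non-absolute URLs.
function url_directory_base(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $lastSlash = strrpos($path, '/');
    $dir = ($lastSlash === false) ? '/' : substr($path, 0, $lastSlash + 1);
    return url_origin($parts) . $dir;
}

// Hypothetical helper: base URL for a loaded document. Uses the first
// <base href="..."> that has a non-empty href, otherwise the page URL.
function document_base_url(DOMDocument $doc, string $pageUrl): ?string
{
    $pageBase = url_directory_base($pageUrl);
    if ($pageBase === null) {
        return null;
    }
    $page = parse_url($pageUrl);

    foreach ($doc->getElementsByTagName('base') as $node) {
        $href = trim($node->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        if (strpos($href, '//') === 0) {
            // Protocol-relative: borrow the scheme of the page.
            $candidate = $page['scheme'] . ':' . $href;
        } elseif (is_string(parse_url($href, PHP_URL_SCHEME))) {
            // Already absolute.
            $candidate = $href;
        } elseif ($href[0] === '/') {
            // Root-relative: keep the origin of the page.
            $candidate = url_origin($page) . $href;
        } else {
            // Path-relative: appended as-is; "../" is NOT normalized here.
            $candidate = $pageBase . $href;
        }
        $base = url_directory_base($candidate);
        return $base !== null ? $base : $pageBase;
    }
    return $pageBase;
}

// Usage with two of the URLs from the question:
echo url_directory_base('http://www.example.com/foo/bar.html'), "\n"; // http://www.example.com/foo/
echo url_directory_base('https://alice@www.example.com/foo'), "\n";   // https://alice@www.example.com/
```

Relative image URLs would then be resolved against whatever document_base_url() returns, so a page without a base tag simply falls back to the directory of its own URL.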


Fun with snippets!

if (!function_exists('base_url')) {
    function base_url($atRoot=FALSE, $atCore=FALSE, $parse=FALSE){
        if (isset($_SERVER['HTTP_HOST'])) {
            $http = isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) !== 'off' ? 'https' : 'http';
            $hostname = $_SERVER['HTTP_HOST'];
            $dir =  str_replace(basename($_SERVER['SCRIPT_NAME']), '', $_SERVER['SCRIPT_NAME']);

            // First path segment of this file's directory, relative to the document root.
            $core = preg_split('@/@', str_replace($_SERVER['DOCUMENT_ROOT'], '', realpath(__DIR__)), -1, PREG_SPLIT_NO_EMPTY);
            $core = isset($core[0]) ? $core[0] : '';

            $tmplt = $atRoot ? ($atCore ? "%s://%s/%s/" : "%s://%s/") : ($atCore ? "%s://%s/%s/" : "%s://%s%s");
            $end = $atRoot ? ($atCore ? $core : $hostname) : ($atCore ? $core : $dir);
            $base_url = sprintf( $tmplt, $http, $hostname, $end );
        }
        else $base_url = 'http://localhost/';

        if ($parse) {
            $base_url = parse_url($base_url);
            if (isset($base_url['path']) && $base_url['path'] == '/') $base_url['path'] = '';
        }

        return $base_url;
    }
}

Usage is as simple as:

//  url like: http://stackoverflow.com/questions/2820723/how-to-get-base-url-with-php

echo base_url();    //  will produce something like: http://stackoverflow.com/questions/2820723/
echo base_url(TRUE);    //  will produce something like: http://stackoverflow.com/
echo base_url(TRUE, TRUE);  //  or base_url(NULL, TRUE); will produce something like: http://stackoverflow.com/questions/
//  and finally
var_dump(base_url(NULL, NULL, TRUE));
//  will produce something like: 
//      array(3) {
//          ["scheme"]=>
//          string(4) "http"
//          ["host"]=>
//          string(17) "stackoverflow.com"
//          ["path"]=>
//          string(19) "/questions/2820723/"
//      }