How to get the absolute image URIs with SimpleHTMLDom [duplicate]
I use simple_html_dom to get a site's images. But sometimes the image links are not prefixed with the full domain URI, e.g. with http://example.com. They appear as something like
- images/_home-ss-21.jpg
- /_home-ss-22b.jpg
- ./_1249a7s.png or
- ../../../a19489s_20110412.jpeg.
How can I convert these URIs to absolute URIs, including the protocol and domain information?
<?php
header('Content-type:text/html; charset=utf-8');
require_once 'simple_html_dom.php';
$v = 'http://www.typepad.com/';
$html = file_get_html($v);
foreach($html->find('img') as $element) {
echo $element->src.'<hr />';
}
?>
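Inside your foreach you can try the following to build the URL to the images.
$img_src = $element->src;
// Prefix the base URL only when the src has no scheme (http:// or https://).
if (!preg_match('#^https?://#i', $img_src)) {
    // Naive join: avoids a double slash, but does not resolve ./ or ../ segments
    // and treats every relative src as relative to the site root in $v.
    $img_src = rtrim($v, '/') . '/' . ltrim($img_src, '/');
}
echo $img_src . '<hr />';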
There are also some ready-made scripts that convert relative URLs to absolute URLs:
- http://nashruddin.com/PHP_Script_for_Converting_Relative_to_Absolute_URL
- http://nadeausoftware.com/node/79
- http://publicmind.in/blog/urltoabsolute/
- http://www.web-max.ca/PHP/misc_24.php
I have never tried them, but they should help you to work past this.
There are three cases:
- The image src starts with http:// (or https://) > use the link directly
- The image src starts with / > use the home URL of the other site + the src
- The image src doesn't start with / > use the URL of the directory of the page you are checking + the src
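The three cases above could be sketched roughly as follows. The function name join_image_url and the example page URL are made up for illustration, and the sketch does not handle protocol-relative links (//host/...), query strings in the src, or ./ and ../ segments.

```php
<?php
// Hypothetical helper: maps an img src onto one of the three cases above.
// $pageUrl is the URL of the page on which the src was found.
function join_image_url(string $pageUrl, string $src): string
{
    // Case 1: already absolute, use it directly.
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    // Build the site root (scheme + host + optional port) from the page URL.
    $parts = parse_url($pageUrl);
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Case 2: root-relative, append to the site root.
    if (strpos($src, '/') === 0) {
        return $root . $src;
    }

    // Case 3: relative to the directory of the current page.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $src;
}

// http://example.com/blog/images/_home-ss-21.jpg
echo join_image_url('http://example.com/blog/post.html', 'images/_home-ss-21.jpg'), "\n";
// http://example.com/_home-ss-22b.jpg
echo join_image_url('http://example.com/blog/post.html', '/_home-ss-22b.jpg'), "\n";
```

Inside the foreach from the question this would be called as join_image_url($v, $element->src).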
./ is the current directory, so if you are at http://example.com and you see an image with the src attribute ./hoopy_frood.png, the whole address is http://example.com/hoopy_frood.png
../ means one directory up, so for example at http://example.com/ice_cream/sundae.html if you see an image with src attribute ../images/hoopier_is_not_a_word.gif then the image hoopier_is_not_a_word.gif is in a directory called images which is inside the site root directory along with the directory called ice_cream.
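The ./ and ../ rules can be applied to a path mechanically. Here is a minimal sketch, assuming the path is already absolute (starts with /) and ends in a file name; remove_dot_segments is a hypothetical name, not part of simple_html_dom.

```php
<?php
// Hypothetical helper: collapses "." and ".." segments in an absolute URL path.
function remove_dot_segments(string $path): string
{
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '.') {
            continue; // "./" is the current directory: drop it
        }
        if ($segment === '..') {
            // "../" is one directory up: drop the previous segment, but never
            // climb above the root (the leading empty segment before the first /).
            if (count($out) > 1) {
                array_pop($out);
            }
            continue;
        }
        $out[] = $segment;
    }
    return implode('/', $out);
}

// /hoopy_frood.png
echo remove_dot_segments('/./hoopy_frood.png'), "\n";
// /images/hoopier_is_not_a_word.gif
echo remove_dot_segments('/ice_cream/../images/hoopier_is_not_a_word.gif'), "\n";
```

The path passed in is the directory of the current page followed by the src, so ../images/hoopier_is_not_a_word.gif seen at /ice_cream/sundae.html becomes /ice_cream/../images/hoopier_is_not_a_word.gif before it is cleaned up.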