
How to get the absolute image URIs with SimpleHTMLDom [duplicate]

This question already has answers here: How to extract complete sub links using Simple-HTML-DOM? (3 answers) Closed 9 years ago.

I use simple_html_dom to get a site's images. But sometimes the image links are not prefixed with the full domain URI, e.g. http://example.com. They appear as something like

  • images/_home-ss-21.jpg
  • /_home-ss-22b.jpg
  • ./_1249a7s.png or
  • ../../../a19489s_20110412.jpeg.

How can I convert these URIs to absolute URIs that include the protocol and domain information?

<?php
header('Content-type:text/html; charset=utf-8');
require_once 'simple_html_dom.php';
$v = 'http://www.typepad.com/';
$html = file_get_html($v);
foreach($html->find('img') as $element) {
    echo $element->src.'<hr />';   
}
?>


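Inside your foreach loop, you can try the following to build the full URL of each image. Note that it only prepends $v, so it is reliable only when $v is the site's root URL.

$img_src = $element->src;
// Prepend the base URL only when src has no scheme (http:// or https://) of its own.
if (!preg_match('#^https?://#i', $img_src)) {
    // $v already ends with "/", so drop any leading "/" from src to avoid "//".
    $img_src = $v . ltrim($img_src, '/');
}
echo $img_src . '<hr />';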

There are also some scripts out there that convert relative URLs to absolute URLs:

  • http://nashruddin.com/PHP_Script_for_Converting_Relative_to_Absolute_URL
  • http://nadeausoftware.com/node/79
  • http://publicmind.in/blog/urltoabsolute/
  • http://www.web-max.ca/PHP/misc_24.php

I have never tried them, but they should help you work past this.


There are 3 options:

  1. The image src starts with http:// > use the link directly
  2. The image src starts with / > use the home URL of the other site + the image src
  3. The image src doesn't start with / > use the home URL + the path to the directory of the page you are checking + the image src
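
The three options above could be sketched roughly like this. It is only a minimal sketch: make_absolute() is a hypothetical helper name, not part of simple_html_dom, and it assumes the page URL contains a scheme and a host. It also treats src values starting with // (protocol-relative links) as a separate case, which the list above does not mention.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): resolves an image src
// against the URL of the page it was found on.
function make_absolute(string $src, string $pageUrl): string
{
    // Option 1: src already has a scheme, so use it directly.
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    // Assumption: $pageUrl is a full URL with scheme and host.
    $parts = parse_url($pageUrl);

    // Extra case (not in the list above): "//host/path" reuses the page's scheme.
    if (strpos($src, '//') === 0) {
        return $parts['scheme'] . ':' . $src;
    }

    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Option 2: src starts with "/", so it is relative to the site root.
    if (strpos($src, '/') === 0) {
        return $root . $src;
    }

    // Option 3: anything else is relative to the directory of the current page.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $src;
}
```

Inside the foreach you would then call something like make_absolute($element->src, $v). This sketch leaves any ./ or ../ segments in the result as they are; most servers and browsers resolve them, but they are not collapsed here.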


./ is the current directory, so if you are at http://example.com and you see an image with the src attribute ./hoopy_frood.png, the whole address is http://example.com/hoopy_frood.png

../ means one directory up. For example, if you are at http://example.com/ice_cream/sundae.html and you see an image with the src attribute ../images/hoopier_is_not_a_word.gif, then the image hoopier_is_not_a_word.gif is in a directory called images, which sits in the site root directory alongside the directory called ice_cream. The whole address is http://example.com/images/hoopier_is_not_a_word.gif
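
To illustrate the ./ and ../ rules, here is a small sketch that collapses those segments in a path. remove_dot_segments() is a hypothetical helper name, and the sketch assumes it is given an absolute path (one starting with /) that ends in a file name. It is a simplification, not a full implementation of the URL resolution rules.

```php
<?php
// Hypothetical helper: collapses "." and ".." segments in an absolute URL path.
function remove_dot_segments(string $path): string
{
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '.') {
            // "./" means "stay in the current directory", so skip it.
            continue;
        }
        if ($segment === '..') {
            // "../" means "go up one directory"; never go above the site root,
            // which is the leading empty segment produced by the first "/".
            if (count($out) > 1) {
                array_pop($out);
            }
            continue;
        }
        $out[] = $segment;
    }
    return implode('/', $out);
}
```

For the sundae example, 'http://example.com' . remove_dot_segments('/ice_cream/../images/hoopier_is_not_a_word.gif') gives http://example.com/images/hoopier_is_not_a_word.gif.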
