Getting the absolute paths of all images in a page?

I am trying to get the src of every image in a page, but some pages use absolute paths and some do not. What is the best way to do this?

Right now I am using this:

$imgsrc_regex = '#<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1#im';

preg_match_all($imgsrc_regex, $html, $matches);

For example, one page might reference an image as src="xyz.png" while another uses src="b.com/xyz.png". Is there a way to automatically prepend the URL when necessary?


The best way (imo) would be to use DOMDocument and DOMXPath to get the URLs:

$dom = new DOMDocument;
$dom->loadHTML($html);

and

$xpath = new DOMXPath($dom);
$result = $xpath->query("//img/@src");
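
Put together, the two snippets above might look like the following minimal sketch. The sample markup in $html is a made-up stand-in for the fetched page, and the libxml_use_internal_errors() call is my addition to keep warnings about sloppy real-world HTML quiet; neither is part of the original snippets.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<html><body><img src="xyz.png"><img src="http://b.com/xyz.png"></body></html>';

$dom = new DOMDocument;
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$srcs = [];
foreach ($xpath->query('//img/@src') as $attr) {
    // Each result node is a DOMAttr; its value property holds the raw src string.
    $srcs[] = $attr->value;
}

print_r($srcs);
```

The values come back exactly as written in the markup, so relative paths are still relative at this point.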

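Using a regex to parse HTML is fragile: a pattern like the one in the question misses unquoted src attributes and also matches img tags inside comments.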

Otherwise, please clarify what you really want. Do you only want the image URLs that are already absolute? If so, you can check whether they start with http: or https:

$result = $xpath->query("//img[starts-with(@src, 'http:') or starts-with(@src, 'https:') or starts-with(@src, 'HTTP:') or starts-with(@src, 'HTTPS:')]/@src");
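
If the goal is instead to turn every src into an absolute URL, one option is to resolve each value against the address of the page it came from. The sketch below is only that, a sketch: toAbsoluteUrl() is a hypothetical helper name, the page URL has to be supplied by the caller, and the logic is a simplified take on URL resolution that does not normalise "../" segments or honour a <base href> tag.

```php
<?php
// Hypothetical helper: resolve an <img> src against the URL of the page it came from.
// Simplified sketch; it does not normalise "../" or "./" segments.
function toAbsoluteUrl(string $src, string $pageUrl): string
{
    // Anything that starts with a scheme ("http:", "https:", "data:", ...) is left alone.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src;
    }

    $base   = parse_url($pageUrl);
    $scheme = $base['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($base['host'] ?? '');
    if (isset($base['port'])) {
        $origin .= ':' . $base['port'];
    }

    // Protocol-relative, e.g. "//cdn.example.com/x.png": reuse the page's scheme.
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }

    // Root-relative, e.g. "/img/x.png": append to scheme and host.
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }

    // Path-relative, e.g. "xyz.png": append to the directory part of the page's path.
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

// Usage with a made-up page address.
$pageUrl = 'http://example.com/dir/page.html';
echo toAbsoluteUrl('xyz.png', $pageUrl), "\n";
echo toAbsoluteUrl('/img/x.png', $pageUrl), "\n";
```

Note that a value such as src="b.com/xyz.png" has no scheme, so this helper treats it as a relative path, which is also how a browser would read it.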


Use an HTML parser, not a regular expression

Seriously, searching for tags in HTML is the wrong problem domain for a regular expression.
