How can I extract the links from a page of HTML?

2023-02-03 01:10

I am trying to download a file in PHP.

$file = file_get_contents($url);

How should I download the contents of the links within the file in $url?

This requires parsing HTML, which is tedious to do by hand in PHP. To save yourself a lot of trouble, use an HTML parsing library such as phpQuery (http://code.google.com/p/phpquery/). Select all the links with pq('a'), loop through them reading each href attribute, convert each value from a relative to an absolute URL, and call file_get_contents on the resulting URL. Hopefully these pointers get you started.
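The relative-to-absolute conversion is the step that tends to go wrong, so here is a minimal sketch of it. The function name resolveUrl is made up for this example and is not part of phpQuery or PHP. It assumes the base URL is a normal http(s) URL and covers only the common cases (absolute, protocol-relative, root-relative and plain relative paths). It is not a full RFC 3986 resolver, and hrefs that consist only of a query string or fragment are not handled properly.

```php
<?php
// Hypothetical helper (not from phpQuery or the PHP core): resolve $href
// against the URL of the page it was found on.
function resolveUrl(string $base, string $href): string
{
    // Already absolute: starts with a scheme such as "http:" or "mailto:".
    if (preg_match('~^[a-z][a-z0-9+.\-]*:~i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link: //cdn.example.org/file.js
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }
    // Root-relative link: /some/path
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Plain relative link: resolve against the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    // Collapse "." and ".." segments; ".." never climbs above the root.
    $segments = [];
    foreach (explode('/', ltrim($dir . $href, '/')) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }
    return $origin . '/' . implode('/', $segments);
}
```

You would call it as resolveUrl($url, $href) for each href you read from the page, and pass the result to file_get_contents.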

So you want to find all URLs in a given file? RegEx to the rescue... and some sample code below which should do what you want:

$file = file_get_contents($url);
if ($file === false) return;

// extract the hyperlinks from the file via regex (http and https)
preg_match_all('~https?://[A-Z0-9_\-./?#=&]*~i', $file, $urlmatches);

// $urlmatches[0] holds the full matches; it is empty when nothing was found
$urlmatches = $urlmatches[0];
if (count($urlmatches)) {
    // count number of URLs
    $numberofmatches = count($urlmatches);
    echo "Found $numberofmatches URLs in $url\n";

    // write all found URLs line by line
    foreach ($urlmatches as $urlmatch) {
        echo "URL: $urlmatch...\n";
    }
}

EDIT: If I understand your question correctly, you now want to download the contents of the URLs that were found. You would do that inside the foreach loop by calling file_get_contents for each URL, but you probably want to filter the list first (for example, skip images).
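A rough sketch of that filtering step, under a few assumptions: the function names looksLikePage and downloadAll are invented for this example, the skip list is just a guess at what "images etc." might cover, and the check looks only at the file extension in the URL path, so it will miss images served from extension-less URLs.

```php
<?php
// Hypothetical filter: true when the URL path does not end in one of the
// extensions we have decided to skip. The list is an assumption; adjust it.
function looksLikePage(string $url): bool
{
    $skip = ['jpg', 'jpeg', 'png', 'gif', 'svg', 'ico'];
    $path = parse_url($url, PHP_URL_PATH);   // null when there is no path
    $ext  = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));
    return !in_array($ext, $skip, true);
}

// Hypothetical downloader: fetches every URL that passes the filter and
// returns an array of the form [url => contents]. Failed requests are skipped.
function downloadAll(array $urls): array
{
    $pages = [];
    foreach (array_unique($urls) as $u) {
        if (!looksLikePage($u)) {
            continue;
        }
        $body = @file_get_contents($u);
        if ($body !== false) {
            $pages[$u] = $body;
        }
    }
    return $pages;
}
```

With the regex code above you would call downloadAll with the array of matched URLs.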

You'll need to parse the resulting HTML string, either manually or with a third-party library.
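For what it's worth, PHP's bundled DOM extension can also do the parsing, so a third-party library is not strictly required. This is only a sketch: extractHrefs is a made-up name, and it assumes the dom extension is enabled (it usually is).

```php
<?php
// Hypothetical helper: returns the href value of every <a> element in $html,
// using PHP's built-in DOMDocument rather than a third-party parser.
function extractHrefs(string $html): array
{
    if ($html === '') {
        return [];   // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so buffer parser warnings
    // instead of letting them be printed, then discard them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}
```

Unlike the regex approach, this also returns relative links, which still have to be made absolute before they can be downloaded.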

HTML Scraping in Php
