Script to copy links from a lot of pages [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I need to copy links from many pages on the same site. The page URLs look like /download.php?id=xxxxx, and I only need to add 1 to the id to get each page I need. On each of those pages, I need to take a link from the source, such as href="http://www.site.com/xxxxxxxxxxxx" (where the x's vary).

Is it possible? Thanks.


Do not use REGEX to parse HTML

Perhaps the biggest mistake people make when trying to get URLs or link text from a web page is doing it with regular expressions. The job can be done that way, but there is a high overhead in having the preg functions loop over the entire document many times. The correct, faster, and infinitely cooler way is to use DOM.

By using DOM in the getLinks function below, it is simple to build an array with each link's URL as the key and its link text as the value. This array can then be looped over like any other array to build a list, or be manipulated in any way desired.

Note that error suppression is used when loading the HTML. This hides warnings about invalid markup, such as HTML entities that are not defined in the DOCTYPE. In a production environment, the display of errors would normally be turned off anyway.

<?php
    function getLinks($link){
        $ret = array();

        /*** a new DOM object ***/
        $dom = new DOMDocument;

        /*** ignore redundant white space (only takes effect if set before loading) ***/
        $dom->preserveWhiteSpace = false;

        /*** get the HTML via file_get_contents();
        cURL would be preferable, but that is out of scope for the question ***/
        $html = @file_get_contents($link);
        if($html === false || $html === ''){
            /*** nothing to parse, so return the empty array ***/
            return $ret;
        }

        /*** load the HTML (@ suppresses warnings about invalid markup) ***/
        @$dom->loadHTML($html);

        /*** get the links from the HTML ***/
        $links = $dom->getElementsByTagName('a');

        /*** loop over the links ***/
        foreach ($links as $tag){
            $href = $tag->getAttribute('href');
            /*** only add download links to the return array;
            strpos() returns 0 for a match at the start, so compare with !== ***/
            if(strpos($href, '/download.php?id=') !== false){
                $ret[$href] = trim($tag->textContent);
            }
        }
        return $ret;
    }
?>
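
As an alternative to the @ operator, libxml's own error handling can be switched on while the document is parsed. The following is a minimal sketch, not part of the original answer: getLinksFromHtml() is a hypothetical variant of getLinks() that takes an HTML string and the substring to look for, so it can be tried without a network request.

```php
<?php
// Hypothetical helper (not from the original answer): same idea as getLinks(),
// but it works on an HTML string and takes the substring to match as a parameter.
function getLinksFromHtml(string $html, string $needle): array
{
    $ret = [];

    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $tag) {
        $href = $tag->getAttribute('href');
        // strpos() returns 0 for a match at the start, so compare with !==.
        if (strpos($href, $needle) !== false) {
            $ret[$href] = trim($tag->textContent);
        }
    }
    return $ret;
}

// Sample markup made up for illustration.
$html = '<p><a href="/download.php?id=1">First</a> <a href="/about.php">About</a></p>';
print_r(getLinksFromHtml($html, '/download.php?id='));
```

Only the first anchor in the sample markup contains the search string, so it is the only entry in the returned array.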

Example Usage

<?php
    /*** a link to search ***/
    $link = "http://www.site.com";

    /*** get the links ***/
    $urls = getLinks($link);

    /*** check for results ***/
    if(count($urls) > 0){
        foreach($urls as $key=>$value){
            echo $key . ' - '. $value . ' - ' . str_ireplace('http://www.site.com/download.php?id=','',$key). '<br>';
        }
    }else{
        echo "No links found at $link";
    }
?>
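
The question also asks about walking through the pages by adding 1 to the id and taking a link from each one. A rough sketch of that loop might look like the following. Everything in it is an assumption for illustration: the helper names buildPageUrls(), extractLinksWithPrefix() and collectLinks(), the id range 1 to 3, and the link prefix are placeholders, and a stub fetcher stands in for the real HTTP request so the sketch runs offline.

```php
<?php
// Sketch only: the base URL, id range and link prefix are placeholders.

// Build the page URLs for a range of ids (hypothetical helper).
function buildPageUrls(string $base, int $firstId, int $lastId): array
{
    $urls = [];
    for ($id = $firstId; $id <= $lastId; $id++) {
        $urls[$id] = $base . '/download.php?id=' . $id;
    }
    return $urls;
}

// Return every href in $html that starts with $prefix (hypothetical helper).
function extractLinksWithPrefix(string $html, string $prefix): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $tag) {
        $href = $tag->getAttribute('href');
        if (strncmp($href, $prefix, strlen($prefix)) === 0) {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

// Fetch each page with $fetch and collect the matching links per id.
function collectLinks(array $pageUrls, string $prefix, callable $fetch): array
{
    $found = [];
    foreach ($pageUrls as $id => $pageUrl) {
        $html = $fetch($pageUrl);
        if (!is_string($html) || $html === '') {
            continue; // skip pages that could not be loaded
        }
        $found[$id] = extractLinksWithPrefix($html, $prefix);
    }
    return $found;
}

// Stub fetcher so the sketch runs offline; it returns canned HTML for any URL.
$fakeFetch = function (string $url): string {
    return '<a href="http://www.site.com/abc123">file</a><a href="/help">help</a>';
};

$pages = buildPageUrls('http://www.site.com', 1, 3);
print_r(collectLinks($pages, 'http://www.site.com/', $fakeFetch));
```

To run this against real pages, the stub could be swapped for a closure that wraps file_get_contents() (or cURL). Because collectLinks() skips anything that is not a non-empty string, a failed request that returns false is simply ignored.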