How to parse the meta tags in a web page [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

CodeIgniter: A Class/Library to help get meta tags from a web page?

Can anybody write a simple program that reports "found" or "not found" for meta tags, all tags, and the robots.txt file?

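<?php
// file_get_contents() needs a scheme to fetch over HTTP; without one it looks for a local file.
$url = 'http://example.com';
$meta = '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />';
$contents = file_get_contents($url);
// file_get_contents() returns false on failure; strpos() is an exact, case-sensitive match.
if ($contents !== false && strpos($contents, $meta) !== false)
{
    echo 'found';
}
else
{
    echo 'not found';
}

?>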
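
The strpos() check above only succeeds when the page contains the tag byte for byte, so a different attribute order, quoting style, or letter case yields "not found". A minimal sketch of a more tolerant check, assuming the HTML is already in a string: the function name has_meta and the sample markup are made up for illustration, and the comparison rule (case-insensitive match on name or http-equiv) is an assumption about what "found" should mean.

```php
<?php
// Hypothetical helper: true if $html contains a <meta> tag whose name or
// http-equiv attribute equals $key (compared case-insensitively).
function has_meta(string $html, string $key): bool
{
    if (trim($html) === '' || $key === '') {
        return false; // nothing to parse or nothing to look for
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $tag) {
        foreach (array('name', 'http-equiv') as $attr) {
            // getAttribute() returns an empty string when the attribute is absent.
            if (strcasecmp($tag->getAttribute($attr), $key) === 0) {
                return true;
            }
        }
    }
    return false;
}

// Sample markup (an assumption): upper-case tag, reversed attribute order.
$html = '<html><head><META content="text/html; charset=utf-8" HTTP-EQUIV="Content-Type"></head><body></body></html>';
echo has_meta($html, 'content-type') ? 'found' : 'not found';
```

To check a live page, the string could come from file_get_contents($url), with the same false check as in the snippet above.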


You can:

  1. Use file_get_contents to retrieve raw HTML data

  2. Tidy the HTML code to make it more readable; if Tidy is not installed on your web server:

    apt-get install php5-tidy

  3. Parse the elements with DOMDocument


function get_meta($url)
{
    // Get & tidy the HTML; parseFile() returns false if the page cannot be read
    $tidy = new tidy();
    if (!$tidy->parseFile($url, array("output-html" => true)))
    {
        return array();
    }
    $tidy->cleanRepair();
    // Parse the repaired HTML (loadHTML() expects a string, not the tidy object)
    $xml = new DOMDocument();
    $xml->loadHTML(tidy_get_output($tidy));
    $meta_tags = $xml->getElementsByTagName("meta");
    // Put the meta information in an array, keyed by http-equiv or name
    $meta = array();
    foreach($meta_tags as $meta_tag)
    {
        $key = $meta_tag->hasAttribute("http-equiv") ? $meta_tag->getAttribute("http-equiv") : $meta_tag->getAttribute("name");
        $value = $meta_tag->hasAttribute("content") ? $meta_tag->getAttribute("content") : $meta_tag->getAttribute("value");
        $meta[$key] = $value;
    }
    return $meta;
}

print_r(get_meta("http://php.net/manual/fr/tidy.cleanrepair.php"));
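
The question also asks about robots.txt, which get_meta() does not cover. One possible sketch, assuming "found" means that a request to /robots.txt on the same host ends with HTTP status 200: the helper names robots_txt_url, final_status, and robots_txt_status are hypothetical, and get_headers() needs allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: build the robots.txt URL for the host that serves $url.
function robots_txt_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return null; // not an absolute URL, e.g. 'example.com' without a scheme
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/robots.txt';
}

// Hypothetical helper: last HTTP status code in a get_headers() result.
// After redirects the array holds several status lines; the last one wins.
function final_status(array $headers): int
{
    $status = 0;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Returns 'found' or 'not found' (performs a network request).
function robots_txt_status(string $url): string
{
    $robots = robots_txt_url($url);
    if ($robots === null) {
        return 'not found';
    }
    $headers = @get_headers($robots); // false on failure
    return ($headers !== false && final_status($headers) === 200) ? 'found' : 'not found';
}
```

It could be called as echo robots_txt_status("http://php.net/manual/fr/tidy.cleanrepair.php"); the result depends on the network and on the remote server.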