Get text from <p> tags using PHP DOM

Hey, consider that I have the following HTML:

<p>xyz</p>
<p>abc</p>

I want to retrieve the text (xyz and abc) using DOM.

This is my code.

<?php
$link='http://www.xyz.com';
$ret= getLinks($link);
print_r ($ret);

function getLinks($link)
{
    /*** return array ***/
    $ret = array();

    /*** a new dom object ***/
    $dom = new DOMDocument;

    /*** get the HTML (suppress errors) ***/
    @$dom->loadHTML(file_get_contents($link));

    /*** remove silly white space ***/
    $dom->preserveWhiteSpace = false;

    /*** get the <p> elements from the HTML ***/
    $text = $dom->getElementsByTagName('p');

    /*** loop over the <p> elements ***/
    foreach ($text as $tag)
    {
        $ret[] = $tag->innerHTML;
    }

    return $ret;
}
?>

But I get an empty result. What am I missing here?


To suppress parsing errors, do not use

@$dom->loadHTML(file_get_contents($link));

but

libxml_use_internal_errors(TRUE);

Also, there is no reason to use file_get_contents. DOM can load from remote resources directly:

libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile($link);
libxml_clear_errors();
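
As a rough sketch of how the error-handling calls above fit together, the snippet below parses a string instead of a remote URL so that it runs without network access. The sample markup and the variable names are made up for illustration.

```php
<?php
// Hypothetical, deliberately sloppy markup used only for illustration.
$html = '<p>xyz</p><p>abc</p><bogus>';

// Collect parse errors internally instead of emitting PHP warnings.
// The call returns the previous setting, so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Array of LibXMLError objects; may be empty depending on the libxml version.
$errors = libxml_get_errors();

// Free the buffered errors and restore the earlier error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$count = $dom->getElementsByTagName('p')->length;
echo $count, "\n";
```

Unlike the @ operator, this keeps the errors available for inspection through libxml_get_errors() if you ever need them.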

Also, tag names are case-sensitive. You are querying for <P> when the snippet contains <p>. Change to

$text = $dom->getElementsByTagName('p');

And finally, DOMElement has no innerHTML property. A userland solution to fetch it is in

  • How to get innerHTML of DOMNode?
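
One common userland approach (not necessarily the one from the linked question) is to serialize each child node and concatenate the results. This is a minimal sketch; innerHtml() is a hypothetical helper name, not part of the DOM extension, and the type declarations assume PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: serialize the children of a node, which
 * approximates what innerHTML would return in a browser.
 */
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument requires PHP 5.3.6+
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Made-up sample markup for illustration.
$dom = new DOMDocument;
$dom->loadHTML('<p>xyz <b>bold</b></p>');
$p = $dom->getElementsByTagName('p')->item(0);

echo innerHtml($p), "\n";
```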

You can fetch the outerHTML with

$ret[] = $dom->saveHTML($tag); // requires PHP 5.3.6+

or

$ret[] = $dom->saveXML($tag); // that will make it XML compliant though

To get the text content of the <p> tag, use

$ret[] = $tag->nodeValue;
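
Putting these corrections together, the function from the question could look roughly like the sketch below. It takes an HTML string rather than a URL so that it runs without network access. getParagraphTexts() is a hypothetical name, and the type declarations assume PHP 7 or later.

```php
<?php
/**
 * Return the text content of every <p> element in an HTML string.
 * getParagraphTexts() is a hypothetical name used for illustration.
 */
function getParagraphTexts(string $html): array
{
    $ret = array();

    // Collect parse errors internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag names are case-sensitive, so query for lowercase 'p'.
    foreach ($dom->getElementsByTagName('p') as $tag) {
        // nodeValue holds the text content; there is no innerHTML here.
        $ret[] = $tag->nodeValue;
    }

    return $ret;
}

print_r(getParagraphTexts('<p>xyz</p><p>abc</p>'));
```

To read from a URL as in the question, $dom->loadHTML($html) could be swapped for $dom->loadHTMLFile($link).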


First, case matters:

$dom->getElementsByTagName('P');

Should be:

$dom->getElementsByTagName('p');
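
A quick way to see the difference is to run both queries against the same document. This is a small sketch using the markup from the question; the variable names are made up.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<p>xyz</p><p>abc</p>');

// The HTML parser stores element names in lowercase, and DOMDocument
// compares tag names case-sensitively, so only the lowercase query matches.
$upper = $dom->getElementsByTagName('P')->length;
$lower = $dom->getElementsByTagName('p')->length;

echo "P: $upper, p: $lower\n";
```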

Second, innerHTML is not a valid DOMElement property.

Inside the loop, try either of:

echo $tag->textContent;
echo $tag->nodeValue;

However, both return only the text and strip any inner HTML tags. The PHP manual has a few examples of how to get the inner HTML.
