Weird error using PHP Simple HTML DOM parser

I am using this library (PHP Simple HTML DOM parser) to parse a link. Here's the code:

function getSemanticRelevantKeywords($keyword){
    $results = array();
    $html = file_get_html("http://www.semager.de/api/keyword.php?q=". urlencode($keyword) ."&lang=de&out=html&count=2&threshold=");
    foreach($html->find('span') as $e){
        $results[] = $e->plaintext;
    }
    return $results;
}

but I am getting this error when I output the results:

Fatal error: Call to a member function find() on a non-object in /var/www/vhosts/efamous.de/subdomains/sandbox/httpdocs/getNewTrusts.php on line 25

(Line 25 is the foreach loop.) The odd thing is that it outputs everything correctly (at least seemingly), but I still get that error and can't figure out why.


The reason for this error: Simple HTML DOM returns false instead of an object when the response from the URL is larger than MAX_FILE_SIZE (600000 bytes by default).
You can avoid it by editing simple_html_dom.php: remove strlen($contents) > MAX_FILE_SIZE from the if condition in the file_get_html function.
This will solve your issue.
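
If you would rather not edit the library, a rough way to confirm that the size limit is what bites you is to fetch the body yourself and classify it before parsing. The sketch below is only an illustration: diagnoseResponse is a hypothetical helper (not part of Simple HTML DOM), and the 600000 default is the figure quoted above, so check the MAX_FILE_SIZE define in your own copy.

```php
<?php
// Hypothetical helper: reports why a fetched body would be rejected.
// $contents is whatever file_get_contents() returned (a string or false).
// $maxSize defaults to the limit quoted in the answer above (an assumption;
// verify it against the MAX_FILE_SIZE define in your simple_html_dom.php).
function diagnoseResponse($contents, int $maxSize = 600000): string
{
    if ($contents === false || $contents === '') {
        return 'empty';      // the fetch failed or returned nothing
    }
    if (strlen($contents) > $maxSize) {
        return 'too_large';  // longer than the size limit
    }
    return 'ok';
}

// Example with in-memory strings, so no network access is needed:
echo diagnoseResponse(str_repeat('x', 600001)), "\n"; // too_large
echo diagnoseResponse('<span>a</span>'), "\n";        // ok
```

In real use you would pass in the result of file_get_contents($url) and log the outcome before deciding whether to parse.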


You just need to increase the MAX_FILE_SIZE constant in simple_html_dom.php.

For example:

define('MAX_FILE_SIZE', 999999999999999);


This error usually means that $html isn't an object.

It's odd that you say this seems to work. What happens if you output $html? I'd imagine that the url isn't available and that $html is null.

Edit: Looks like this may be an error in the parser. Someone has submitted a bug and added a check in his code as a workaround.
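
A check of the kind mentioned above might look roughly like the following. This is a minimal sketch, not the code from the bug report: collectSpanText is a hypothetical name, and it accepts whatever file_get_html() returned so that a failed load produces an empty array rather than a fatal error.

```php
<?php
// Hypothetical helper: returns the text of every <span>, or an empty array
// when $html is not a usable DOM object (for example false or null).
function collectSpanText($html): array
{
    // Guard: only call find() on something that is an object and has it.
    if (!is_object($html) || !method_exists($html, 'find')) {
        return [];
    }

    $results = [];
    foreach ($html->find('span') as $e) {
        $results[] = $e->plaintext;
    }
    return $results;
}

// A failed load no longer triggers "Call to a member function find()":
var_dump(collectSpanText(false));
```

Returning an empty array keeps the caller's foreach loops working unchanged; whether a silent empty result or an exception is better depends on how you want failures reported.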


Before calling file_get_html or load_file, you should first check whether the URL exists.

If the URL exists, you have passed the first step.
(Some servers serve their 404 page as a valid HTML page, with a proper HTML structure such as head and body, but containing only text like "This page couldn't be found. 404 error...")

If the URL returns 200 OK, then you should check whether the fetched result is an object and whether its nodes are set.

That's the code I used in my pages.

function url_exists($url){
    if ((strpos($url, "http")) === false) $url = "http://" . $url;
    $headers = @get_headers($url);
    // print_r($headers);
    if (is_array($headers)){
        if(strpos($headers[0], '404 Not Found'))
            return false;
        else
            return true;    
    }         
    else
        return false;
}

$pageAddress = 'http://www.google.com';
$htmlPage = new simple_html_dom(); // create the parser object before calling load_file()
if ( url_exists($pageAddress) ) {
    $htmlPage->load_file( $pageAddress );
} else {
    echo "URL doesn't exist, stopping";
    return;
}

if( $htmlPage && is_object($htmlPage) && isset($htmlPage->nodes) )
{
    // do your work here...
} else {
    echo 'fetched page is not ok, i stop';
    return;
}
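
The url_exists function above only rejects a literal "404 Not Found" in the first header line. If you want to treat other failures (500, 403 and so on) the same way, one option is to pull the numeric code out of the status line. The sketch below uses two hypothetical helpers, statusCodeFromLine and isSuccessStatus, and assumes that only 2xx responses are worth parsing.

```php
<?php
// Hypothetical helper: extracts the numeric status code from a status line
// such as "HTTP/1.1 404 Not Found". Returns null if the line does not look
// like an HTTP status line.
function statusCodeFromLine(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Assumption for this sketch: only 2xx responses count as success.
function isSuccessStatus(?int $code): bool
{
    return $code !== null && $code >= 200 && $code < 300;
}

echo statusCodeFromLine('HTTP/1.1 404 Not Found'), "\n";              // 404
var_dump(isSuccessStatus(statusCodeFromLine('HTTP/1.0 200 OK')));      // bool(true)
```

You would feed it $headers[0] from get_headers(). When the request is redirected, get_headers() returns more than one status line, so which one to inspect is a judgment call.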


For those arriving here via a search engine (as I did): after reading the info (and linked bug report) above, I did some code-prodding and ended up fixing my problems with two extra checks after loading the DOM:

$html = file_get_html('<your url here>');
// first check that $html is an object and that $html->find exists
if (is_object($html) && method_exists($html,"find")) {
     // then check if the html element exists to avoid trying to parse non-html
     if ($html->find('html')) {
          // and only then start searching (and manipulating) the dom 
     }
}


I'm having the same error come up in my logs and apart from the solutions mentioned above, it could also be that there is no 'span' in the document. I get the same error when searching for divs with a particular class that doesn't exist on the page, but when searching for something that I know exists on the page, the error doesn't pop up.


Your script is OK. I get this error when it does not find the element that I'm looking for on the page.

In your case, please check whether the page you are accessing has a 'span' element.


Simplest solution to this problem

if ($html = file_get_html("http://www.semager.de/api/keyword.php?q=". urlencode($keyword) ."&lang=de&out=html&count=2&threshold=")) {
    // $html was loaded, so it is safe to call $html->find() here
} else {
    // do something else because couldn't find html
}


The error means that the find() function is either not defined yet or not available. Make sure you have loaded or included the library that defines it.
