My site crawler dies while it's running
I wrote a site crawler that collects links and images to build a site map, but it dies while running! Here is part of my class (not the whole thing):
class pageCrawler {
    .......
    private $links = array();

    public function __construct ( $url ) {
        ignore_user_abort ( true );
        set_time_limit ( 0 );
        register_shutdown_function ( array ( $this, 'callRegisteredShutdown' ) );
        $urlParts = parse_url ( $url ); // this assignment was missing in the snippet
        $this->host = $urlParts [ 'host' ];
        $this->crawlingUrl ( $url );
        $this->doCrawlLinks ();
    }
$this->crawlingUrl ( $url ): at the beginning, the main address is passed to this method (e.g. http://www.mysite.com).
getUrl(): connects to the URL with fsockopen, then fetches the page contents.
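The question doesn't show getUrl() itself, so here is a minimal sketch of what an fsockopen-based fetch usually looks like. The function name, the plain-HTTP port 80, and the 10-second timeout are all assumptions, not the asker's actual code:

```php
<?php
// Sketch of an fsockopen-based fetch (assumed shape of getUrl()).
// Sends a plain HTTP/1.0 GET and returns the response body, or false on failure.
function getUrl($host, $path = '/')
{
    $fp = @fsockopen($host, 80, $errno, $errstr, 10); // 10 s connect timeout
    if (!$fp) {
        return false;
    }

    $request  = "GET $path HTTP/1.0\r\n";
    $request .= "Host: $host\r\n";
    $request .= "Connection: close\r\n\r\n";
    fwrite($fp, $request);

    $response = '';
    while (!feof($fp)) {
        $response .= fgets($fp, 4096);
    }
    fclose($fp);

    // The body starts after the first blank line separating it from the headers.
    $parts = explode("\r\n\r\n", $response, 2);
    return isset($parts[1]) ? $parts[1] : '';
}
```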
findLinks(): collects the a href and img src attributes from the page, then stores the returned links in $this->links[].
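A findLinks() of that shape can be sketched with DOMDocument rather than regular expressions; the function name matches the question, but the body below is an assumption:

```php
<?php
// Sketch of findLinks(): extract every <a href> and <img src> value from HTML.
function findLinks($html)
{
    $links = array();
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings on malformed real-world markup

    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $links[] = $img->getAttribute('src');
        }
    }
    return array_values(array_unique($links));
}
```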
Then I echo something to flush the output, and insert the following code after that:
echo str_pad ( " ", 5000 );
flush ();
$this->doCrawlLinks(): checks $this->links and repeats the process described above for its first element. doCrawlLinks() runs, fetches the URL content of the first element, then shifts that element off $this->links, and continues until $this->links is empty.
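The shift-until-empty loop described above can be written iteratively, which keeps the call stack flat no matter how many links queue up. This is a sketch, not the asker's code: $fetch is a hypothetical callback standing in for the getUrl()/findLinks() pair, and the visited set is an addition to avoid re-crawling:

```php
<?php
// Iterative sketch of doCrawlLinks(): drain the queue with a while loop,
// so queue depth never grows the call stack.
// $fetch is a hypothetical callback: given a URL, it returns the links found there.
function doCrawlLinks(array $links, callable $fetch)
{
    $visited = array();
    while (!empty($links)) {
        $url = array_shift($links);          // take the first pending link
        if (isset($visited[$url])) {
            continue;                        // skip already-crawled URLs
        }
        $visited[$url] = true;
        foreach ($fetch($url) as $newLink) { // fetch the page, queue new links
            if (!isset($visited[$newLink])) {
                $links[] = $newLink;
            }
        }
    }
    return array_keys($visited);             // crawl order, for inspection
}
```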
That's the general flow of my class. It works, but then it suddenly crashes. I set set_time_limit(0) so it can run indefinitely, yet my process doesn't finish cleanly, because my shutdown function never executes! I'm confused about where the problem is.
Wild guess: do you have recursion in doCrawlLinks()? Deep recursion can simply crash the process. Or it can crash by hitting the per-process memory limit.
From my experience, it is very helpful to keep the list of links in a database with a pending/processed flag on each one, so you can shut down and resume your crawler any time you want (or, in your case, resume it after a crash).
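That pending/processed queue could look something like the sketch below, using SQLite through PDO. The class, table, and column names are illustrative, not part of the original crawler:

```php
<?php
// Sketch of a database-backed link queue with a pending/processed flag,
// so a crashed crawl can be resumed where it stopped.
class LinkQueue
{
    private $db;

    public function __construct($dsn = 'sqlite::memory:')
    {
        $this->db = new PDO($dsn);
        $this->db->exec(
            'CREATE TABLE IF NOT EXISTS links (
                 url TEXT PRIMARY KEY,
                 processed INTEGER NOT NULL DEFAULT 0
             )'
        );
    }

    // Add a link; duplicates are ignored, so re-adding found links is safe.
    public function add($url)
    {
        $stmt = $this->db->prepare('INSERT OR IGNORE INTO links (url) VALUES (?)');
        $stmt->execute(array($url));
    }

    // Fetch the next pending link, or null when the queue is drained.
    public function next()
    {
        $row = $this->db->query('SELECT url FROM links WHERE processed = 0 LIMIT 1')
                        ->fetch(PDO::FETCH_ASSOC);
        return $row ? $row['url'] : null;
    }

    // Mark a link done so a restarted crawler skips it.
    public function markProcessed($url)
    {
        $stmt = $this->db->prepare('UPDATE links SET processed = 1 WHERE url = ?');
        $stmt->execute(array($url));
    }
}
```

On restart the crawler just calls next() again; anything already flagged as processed is skipped.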