php strange looping problem
Sorry for the long code, I'm really losing it.
This code is supposed to get a list of urls through POST, in a textarea with breaklines between each url. The script should download each url, go through the html and take some links, then go in those links, get some data and echo it out.
For some reason, visually it looks as if I'm running getDetails()
only once, as I'm getting only one set of results.
I have checked multiple times if the foreach
loop takes each url separately and that part is working
Can anyone spot the problem?
require_once('simple_html_dom.php');
function getDetails($html) {
$dom = new simple_html_dom;
$dom->load($html);
$title = $dom->find('h1', 0)->find('a', 0);
foreach($dom->find('span[style="color:#333333"]') as $element) {
$address = $element->innertext;
}
$address = str_replace("<br>"," ",$address);
$address = str_replace(","," ",$address);
$title->innertext = str_replace(","," ",$title->innertext);
if ($address == "") {
$exp = explode("<strong><strong>",$html);
$exp2 = explode("</strong>",$exp[1]);
$address = $exp2[0];
}
echo $title->innertext . "," . $address . "<br>";
}
function getHtml($Url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT开发者_JAVA百科_URL, $Url);
curl_setopt($ch, CURLOPT_REFERER, "http://www.google.com/");
curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
function getdd($u) {
$html = getHtml($u);
$dom = new simple_html_dom;
$dom->load($html);
foreach($dom->find('a') as $element) {
if (strstr($element->href,"display_one.asp")) {
$durls[] = $element->href;
}
}
return $durls;
}
if (isset($_POST['url'])) {
$urls = explode("\n",$_POST['url']);
foreach ($urls as $u) {
$durls2 = getdd($u);
$durls2 = array_unique($durls2);
foreach ($durls2 as $durl) {
$d = getHtml("http://www.example.co.il/" . $durl);
getDetails($d);
}
}
}
You're only assigning the last element in the loop, it looks like. You'll need to concatenate. Something like $address .= $element->innertext;
inside the loop (note the .= instead of =).
edit: unless I'm mistaking what it's supposed to be doing. I think I may've been focusing on the wrong part of the code.
When you use DOMDocument on html you load it with $dom->loadHTMLFile()
or $dom->loadHTML()
you should also call libxml_use_internal_errors(true)
before hand so that it will not crash because of improperly formatted html.
精彩评论