PHP HTML DOM Parser: pull all prices and seller names from an Amazon offer listing
I am trying to pull the price and seller from Amazon offer listing pages such as:
http://www.amazon.com/gp/offer-listing/B002UYSHMM
I can get the price by using:
$ret['Retail'] = $html->find('span[class="price"]', 0)->innertext;
This pulls the first price in the offer listing.
I tried to pull the seller that matches the first price with the line below, which reads the alt attribute of the img that contains the seller name:
$ret['SoldBy'] = $html->find('ul.sellerInformation img', 0)->getAttribute('alt');
It worked for the first offer, but further down the list it started missing sellers and, in some cases, prices.
Can anyone tell me why it would miss sellers and even jump around on prices? All I did to get the additional sellers was:
$ret['Retail2'] = $html->find('span[class="price"]', 1)->innertext;
$ret['SoldBy2'] = $html->find('ul.sellerInformation img', 1)->getAttribute('alt');
$ret['Retail3'] = $html->find('span[class="price"]', 2)->innertext;
$ret['SoldBy3'] = $html->find('ul.sellerInformation img', 2)->getAttribute('alt');
$ret['Retail4'] = $html->find('span[class="price"]', 3)->innertext;
$ret['SoldBy4'] = $html->find('ul.sellerInformation img', 3)->getAttribute('alt');
$ret['Retail5'] = $html->find('span[class="price"]', 4)->innertext;
$ret['SoldBy5'] = $html->find('ul.sellerInformation img', 4)->getAttribute('alt');
$ret['Retail6'] = $html->find('span[class="price"]', 5)->innertext;
$ret['SoldBy6'] = $html->find('ul.sellerInformation img', 5)->getAttribute('alt');
$ret['Retail7'] = $html->find('span[class="price"]', 6)->innertext;
$ret['SoldBy7'] = $html->find('ul.sellerInformation img', 6)->getAttribute('alt');
Thank you for any suggestions!
<?php
$url = 'http://www.amazon.com/gp/offer-listing/B0036RNK7O/ref=dp_olp_new?ie=UTF8&qid=1319582305&sr=8-2';
$dom = new DomDocument();
$content = file_get_contents($url);
// real-world HTML is rarely valid; @ keeps the parser warnings out of the output
@$dom->loadHTML($content);
$results = array();
$classes_to_collect = array('price', 'shipping_block', 'condition', 'sellerInformation');
$seller_elements = array('name', 'rating', 'stock_info', 'item_info');
foreach($dom->getElementsByTagName('tbody') as $tb)
{
if($tb->hasAttribute('class') && stripos($tb->getAttribute('class'), 'result')!==false)
{
foreach($tb->getElementsByTagName('tr') as $tr)
{
$new_result = array();
foreach($tr->getElementsByTagName('td') as $td)
{
foreach($td->childNodes as $cne)
{
foreach($classes_to_collect as $ctc)
{
if($cne->hasAttributes() && $cne->getAttribute('class') && stripos($cne->getAttribute('class'), $ctc)!==false)
{
if($cne->localName=='ul')
{
$new_seller = array();
$lis = $cne->getElementsByTagName('li');
foreach($lis as $lii=>$lie)
{
$value = $lie->textContent;
if($seller_elements[$lii]=='item_info')
{
$cutoff = strpos($value, 'amznJQ.onReady');
if($cutoff) $value = substr($value, 0, $cutoff);
}
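else if($seller_elements[$lii]=='name')
{
// drop everything up to and including the 'Seller:' label (7 characters)
$cutoff = strpos($value, 'Seller:');
if($cutoff!==false) $value = substr($value, $cutoff + 7);
}
else if($seller_elements[$lii]=='rating')
{
// drop everything up to and including the 'Seller Rating:' label (14 characters)
$cutoff = strpos($value, 'Seller Rating:');
if($cutoff!==false) $value = substr($value, $cutoff + 14);
}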
$new_seller[$seller_elements[$lii]] = trim($value);
}
$new_result[$ctc] = $new_seller;
}
else $new_result[$ctc] = $cne->textContent;
}
}
}
}
$results[] = $new_result;
}
}
}
print_r($results);
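This prints a large multi-dimensional array with one entry per table row. Each entry is keyed by the collected class names, and the sellerInformation entry is itself an array of seller details.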
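As a rough sketch of how that array might be consumed, the snippet below flattens it into price/seller pairs. The sample data is invented and only mirrors the shape the loop above builds (keys taken from `$classes_to_collect`, with `sellerInformation` holding a nested array keyed by `$seller_elements`). It assumes that rows without a price are header or spacer rows and can be skipped.

```php
<?php
// Hypothetical data in the shape produced by the loop above.
$results = array(
    array(), // a row where none of the collected classes matched
    array(
        'price' => ' $10.00 ',
        'condition' => 'New',
        'sellerInformation' => array('name' => 'Seller A', 'rating' => '95% positive'),
    ),
    array(
        'price' => '$11.50',
        'sellerInformation' => array('name' => 'Seller B'),
    ),
);

$offers = array();
foreach ($results as $row) {
    // Skip rows that carry no price.
    if (empty($row['price'])) {
        continue;
    }
    // The seller name may be missing, so fall back to an empty string.
    $seller = isset($row['sellerInformation']['name'])
        ? $row['sellerInformation']['name']
        : '';
    $offers[] = array('Retail' => trim($row['price']), 'SoldBy' => $seller);
}
print_r($offers);
```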
I used a foreach over the result rows and put each row's values into an array. This worked much better, since the number of sellers varies by item and every price is looked up in the same row as its seller.
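foreach($html->find('div.resultsset table tbody.result tr') as $article) {
    // skip rows that carry no price (headers, spacers)
    if($article->find('span.price', 0)) {
        $item = array(); // start fresh so values from the previous row cannot leak in
        // get retail
        $item['Retail'] = $article->find('span.price', 0)->plaintext;
        // get soldby: prefer the logo's alt text, fall back to the text-only seller name
        $img = $article->find('img', 0);
        if($img && $img->getAttribute('alt') <> '') {
            $item['SoldBy'] = $img->getAttribute('alt');
        }
        else {
            $name = $article->find('ul.sellerInformation li a b', 0);
            $item['SoldBy'] = $name ? $name->plaintext : '';
        }
        $ret[] = $item;
    }
}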
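The drift described in the question can be reproduced on a small hand-written sample. The sketch below uses PHP's built-in DOMDocument and DOMXPath instead of the HTML DOM Parser library, and the markup is a made-up approximation of the offer table: the class names come from the snippets above, while the seller names and prices are invented. It assumes that some sellers show a logo image and others only a text name, which is what the fallback above suggests.

```php
<?php
// Hypothetical sample markup: the second seller has no logo image.
$sampleHtml = <<<'HTML'
<table><tbody class="result">
<tr><td><span class="price">$10.00</span></td>
    <td><ul class="sellerInformation"><li><img alt="Seller A"></li></ul></td></tr>
<tr><td><span class="price">$11.00</span></td>
    <td><ul class="sellerInformation"><li><a><b>Seller B</b></a></li></ul></td></tr>
<tr><td><span class="price">$12.00</span></td>
    <td><ul class="sellerInformation"><li><img alt="Seller C"></li></ul></td></tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($sampleHtml);
$xpath = new DOMXPath($doc);

// Page-wide indexing: the Nth price is paired with the Nth logo on the page.
$prices = $xpath->query('//span[@class="price"]');
$logos  = $xpath->query('//ul[@class="sellerInformation"]//img');
$global = array();
foreach ($prices as $i => $price) {
    $logo = $logos->item($i); // null once the logos run out
    $global[] = array(
        'Retail' => $price->textContent,
        'SoldBy' => $logo ? $logo->getAttribute('alt') : null,
    );
}

// Row-scoped lookup: price and seller are both searched inside the same <tr>.
$offers = array();
foreach ($xpath->query('//tbody[@class="result"]/tr') as $row) {
    $price = $xpath->query('.//span[@class="price"]', $row)->item(0);
    if (!$price) {
        continue; // not an offer row
    }
    $logo = $xpath->query('.//ul[@class="sellerInformation"]//img', $row)->item(0);
    $name = $xpath->query('.//ul[@class="sellerInformation"]//a/b', $row)->item(0);
    $offers[] = array(
        'Retail' => trim($price->textContent),
        'SoldBy' => $logo ? $logo->getAttribute('alt')
                          : ($name ? trim($name->textContent) : ''),
    );
}
print_r($offers);
```

With this sample, the page-wide version pairs the second price with the third seller's logo and leaves the third price without a seller, while the row-scoped version keeps each price with the seller from its own row.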