
How to handle data integrity problems with DOMDocument?

Given a series of elements of the form

<td class="name">Product Name</td>
<td class="price">$10.00</td>

one can use DOMDocument to parse a page containing, say, 100 name/price pairs into a group of 100 names and a separate group of 100 prices. However, if one of the prices is missing, you get a group of 100 names and a group of 99 prices, and it is not clear which product is missing its price.

Using a regex to parse pairs of name/price data (with the price made optional) makes it possible to identify which product lacks a price, because the result is 100 pairs, one of which has an empty price value. Is there some way to achieve this using DOMDocument, so that it is not necessary to use a regex to parse HTML?

EDIT: I tried dqhendricks' suggestion, but I am getting a syntax error on the foreach loop with the following code:

<?php

$html = <<<EOT

<table>
    <tr>
       <td class="productname">a</td>
       <td class="price">1</td>
    </tr>

    <tr>
       <td class="productname">b</td>
       <td class="price">2</td>
    </tr>

    <tr>
       <td class="productname">c</td>
       <td class="price">3</td>
    </tr>

    <tr>
       <td class="productname">d</td>
       <td class="price">4</td>
    </tr>

    <tr>
       <td class="productname">e</td>
       <td class="price">5</td>
    </tr>
</table>

EOT;

libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadhtml($html);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//table/tr/') as $node) {
    $name = $node->query('td[@class="productname"]');
    $price= $node->query('td[@class="price"]');
}

print_r($node);

?>


With this structure, wouldn't you be iterating through the td elements and checking their class attributes? If two cells with class "name" appear in a row, you know that the first one is missing its price.
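For the flat layout from the question, that cell-by-cell idea could be sketched roughly as follows. The sample markup (with the price for "b" deliberately left out) and the `$pairs` / `$current` variables are assumptions made up for illustration; only `DOMDocument`, `DOMXPath::query()` and `getAttribute()` are standard calls.

```php
<?php
// Hypothetical sample: flat name/price cells, with the price for "b" missing.
$html = <<<EOT
<table><tr>
<td class="name">a</td><td class="price">1</td>
<td class="name">b</td>
<td class="name">c</td><td class="price">3</td>
</tr></table>
EOT;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$pairs = [];
$current = null; // index in $pairs of the most recently seen name

// Walk the name and price cells together, in document order.
foreach ($xpath->query('//td[@class="name" or @class="price"]') as $td) {
    if ($td->getAttribute('class') === 'name') {
        // Every name starts a new pair; the price stays null until one follows.
        $pairs[] = ['name' => trim($td->textContent), 'price' => null];
        $current = count($pairs) - 1;
    } elseif ($current !== null && $pairs[$current]['price'] === null) {
        // A price cell belongs to the name immediately before it.
        $pairs[$current]['price'] = trim($td->textContent);
    }
}

foreach ($pairs as $pair) {
    echo $pair['name'], ': ', $pair['price'] ?? '(no price)', "\n";
}
```

Because a name that is directly followed by another name keeps its `null` price, the result still has one entry per product, which is the same guarantee the regex approach gave.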

Where is your parsing code? I imagine the problem is in there. Are you just using XPath to get one list of products and a separate list of prices?

Now, if your HTML document is structured like this:

<tr>
   <td class="productname">x</td>
   <td class="price">x</td>
</tr>

you will want to iterate through the tr elements and check the contents of each one. That way you can easily tell which products are missing a price.

EDIT:

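 foreach ($xpath->query('//table/tr') as $node) {
    // DOMElement has no query() method: run the relative query through
    // DOMXPath and pass the row as the context node.
    $name = $xpath->query('td[@class="name"]', $node);
    $price = $xpath->query('td[@class="price"]', $node);
    // $price->length === 0 means this row has no price cell.
 }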

Something like that anyways...
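A more complete version of the row-by-row idea might look like the sketch below. Two details matter: an XPath expression must not end in a slash (`'//table/tr/'` is invalid, so `query()` returns false, and that is most likely what the foreach complains about), and relative queries have to go through `DOMXPath::query()` with the row passed as the second argument, since the row node itself has no `query()` method. The sample rows, the `productname` class taken from the question's edit, and the `$products` array are assumptions for illustration.

```php
<?php
// Hypothetical sample: three rows, the second one has no price cell.
$html = <<<EOT
<table>
    <tr><td class="productname">a</td><td class="price">1</td></tr>
    <tr><td class="productname">b</td></tr>
    <tr><td class="productname">c</td><td class="price">3</td></tr>
</table>
EOT;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$products = [];

// '//table//tr' also matches rows that are wrapped in a tbody element.
foreach ($xpath->query('//table//tr') as $row) {
    // Relative queries: the second argument is the context node.
    $nameCell  = $xpath->query('td[@class="productname"]', $row)->item(0);
    $priceCell = $xpath->query('td[@class="price"]', $row)->item(0);

    if ($nameCell === null) {
        continue; // not a product row
    }

    $products[] = [
        'name'  => trim($nameCell->textContent),
        // item(0) returns null when the row has no price cell.
        'price' => $priceCell === null ? null : trim($priceCell->textContent),
    ];
}

foreach ($products as $product) {
    echo $product['name'], ': ', $product['price'] ?? '(no price)', "\n";
}
```

Each row yields exactly one entry, so the names and prices can no longer drift out of step when a price is missing.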
