Why does a <script> tag stop DOMDocument parsing?

In the following code, the seemingly innocuous introduction of a script tag containing an empty div causes parsing to fail. (Using an empty script tag causes no problem.) $html1 gets parsed properly, retrieving the values of the two spans:

Array
(
    [0] => test1
    [1] => test2
)

whereas $html2 does not get parsed properly, retrieving only the span preceding the script tag:

Array
(
    [0] => test1
)

Why does this happen? With errors turned on I get two errors, "Unexpected end tag : script" and "Unexpected end tag : div", but I do not know why these are unexpected.

<?php

$html1 = <<<EOT


<div class="productList"> 

    <span>test1</span>

    <div></div>

    <span>test2</span>

</div>

EOT;

$html2 = <<<EOT

<div class="productList"> 

    <span>test1</span>

    <script> 

        <div></div>

    </script> 

    <span>test2</span>

</div>

EOT;

libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html1);
$xpath = new DOMXPath($dom);

$titles_nodeList = $xpath->query('//div[@class="productList"]/span');

foreach ($titles_nodeList as $title) {
    $titles[] = $title->nodeValue;
}

echo("<p>titles without script tag and div</p>");
echo("<pre>");
print_r($titles);
echo("</pre>");

$titles = []; // reset the results before parsing the second document

$dom->loadHTML($html2);
$xpath = new DOMXPath($dom);

$titles_nodeList = $xpath->query('//div[@class="productList"]/span');

foreach ($titles_nodeList as $title) {
    $titles[] = $title->nodeValue;
}

echo("<p>titles with script tag and div</p>");
echo("<pre>");
print_r($titles);
echo("</pre>");

?>


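A div doesn't belong inside a script tag. JavaScript belongs inside a script tag. The HTML parser reads the contents of a script element as raw text, not markup, and under the HTML 4 rules that text ends at the first "</" followed by a letter. So the inner </div> most likely ends the script early and closes the outer productList div as well. The later </script> and the final </div> then have nothing left to close, which is why they are reported as unexpected, and the test2 span ends up outside the div, so the XPath query no longer matches it.

Take the div out of the script tag and it should be fine.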
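
A minimal sketch of that fix, under a few assumptions: the variable name $html3 and the document.write call are made up for illustration and do not come from the question. The script now holds only JavaScript, and the one closing tag that appears inside a string literal is written as <\/div>, which is the escape the HTML 4 specification recommends for "</" inside script content.

```php
<?php
// Hypothetical variant of $html2: the script contains JavaScript only.
// The closing tag inside the string literal is escaped as <\/div>, so the
// parser never sees a literal "</" before the real </script>.
// A nowdoc (<<<'EOT') keeps the backslash exactly as written.
$html3 = <<<'EOT'
<div class="productList">
    <span>test1</span>
    <script>
        document.write("<div><\/div>");
    </script>
    <span>test2</span>
</div>
EOT;

// Collect parser messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html3);
$xpath = new DOMXPath($dom);

// Same query as in the question: spans that are direct children of the div.
$titles = [];
foreach ($xpath->query('//div[@class="productList"]/span') as $span) {
    $titles[] = $span->nodeValue;
}

print_r($titles);
```

With the script content cleaned up this way, both spans should stay inside the productList div and be returned by the query.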


The trick is simple: change loadHTML to loadXML, on one condition: the HTML string must always be well-formed XML.

$dom->loadXML($html2);
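
A slightly fuller sketch of the loadXML approach, assuming the markup really is well-formed XML. The error-handling branch is an addition for illustration: loadXML returns false when the string is not well-formed, so it is worth checking the result rather than assuming success. Here the inner div is parsed as an ordinary child element of script, so it no longer interferes with the outer div.

```php
<?php
// The markup from the question's $html2, which happens to be well-formed XML.
$html2 = <<<'EOT'
<div class="productList">
    <span>test1</span>
    <script>
        <div></div>
    </script>
    <span>test2</span>
</div>
EOT;

// Collect parser messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$titles = [];

if ($dom->loadXML($html2)) {
    // Parsed as XML: <script> is an ordinary element with a <div> child.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[@class="productList"]/span') as $span) {
        $titles[] = $span->nodeValue;
    }
} else {
    // Not well-formed: report what libxml complained about.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    libxml_clear_errors();
}

print_r($titles);
```

The trade-off is that real-world HTML is often not well-formed XML (unclosed tags, unquoted attributes, named entities such as &nbsp;), and in those cases loadXML will refuse the input where loadHTML would have tried to recover.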