
retrieve attributes and content of multiple script tags inside the head tag with PHP

I have found a few different questions that pertain to my question, but I'm having trouble putting them together into one function.

Here is my HTML:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>

Here is the code I have right now:

$filePath = "directory/file.html";
retrieveScriptContentandAttributes($filePath);

function retrieveScriptContentandAttributes($filePath) {
    $dom = new DOMDocument;
    @$dom->loadHTMLFile($filePath);
    //var_dump($dom->loadHTMLFile($filePath));
    $head = $dom->getElementsByTagName('head')->item(0);
    $xp = new DOMXpath($dom);
    $script = $xp->query("script", $head);

    for ($row = 0; $row < 5; $row++) {
        echo $script->item($row)->textContent;

        if ($script->item($row) instanceof DOMNode) {
            if ($script->item($row)->hasAttributes()) {
                foreach ($script->item($row)->attributes as $attr) {
                    $name = $attr->nodeName;
                    $value = $attr->nodeValue;
                    $scriptAttr[] = array('attr'=>$name, 'value'=>$value);
                }
                echo $scriptAttr;
            }
        }
    }
}

The result I'm getting is "ArrayAC_FL_RunContent = 0;Array Notice: Trying to get property of non-object", and the notice points at the line "echo $script->item($row)->textContent;". The odd part is that this line seems to execute just fine. What I need is a way to print $scriptAttr as pairs like language=>javascript for the first script tag, and then src=>Scripts/AC_RunActiveContent.js, language=>javascript for the next one.

I appreciate your help!!
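The two symptoms in the question have simple causes. The head contains only two script elements. For any index at or past the list's length, $script->item() returns null, so reading ->textContent on it raises the notice once the loop passes index 1. Separately, echo on an array prints the literal word Array rather than its contents. The minimal sketch below reproduces both and bounds the loop by the node list's length property. It makes two assumptions of its own: an inline HTML string stands in for directory/file.html, and the query is written as //head/script, as in the answers that follow.

```php
<?php
// Inline copy of the question's <head> markup, standing in for directory/file.html.
$html = '<html><head>'
    . '<script language="javascript">AC_FL_RunContent = 0;</script>'
    . '<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>'
    . '</head><body></body></html>';

libxml_use_internal_errors(true); // keep any HTML parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$scripts = $xp->query('//head/script');

// Only two nodes match, and item() returns null for any index >= length.
// Reading ->textContent on that null is what raised the notice when the loop ran to 5.
var_dump($scripts->length);  // int(2)
var_dump($scripts->item(2)); // NULL

// Bound the loop by ->length instead of a hard-coded 5.
for ($row = 0; $row < $scripts->length; $row++) {
    $scriptAttr = array(); // reset per tag so attributes do not pile up across tags
    foreach ($scripts->item($row)->attributes as $attr) {
        $scriptAttr[$attr->nodeName] = $attr->nodeValue;
    }
    // echo $scriptAttr would print the literal word "Array"; print_r shows the contents.
    print_r($scriptAttr);
}
```

The sketch keys $scriptAttr by attribute name instead of pushing array('attr'=>..., 'value'=>...) pairs. That is a choice made here so that each tag's data maps directly onto the language=>javascript shape.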


Try DOMXPath (see: http://php.net/manual/en/class.domxpath.php):

<?php
$dom = new DOMDocument();
$dom->loadHtml('<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>
');

$xpath = new DOMXPath($dom);

$scriptAttributes = array();

/* //head/script[@src] would only select nodes with an src attribute */
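foreach ($xpath->query('//head/script') as $node) {
    // Bind $attributes by reference to a new, empty element of $scriptAttributes
    $attributes =& $scriptAttributes[];
    // Iterating a DOMNamedNodeMap yields attribute name => DOMAttr pairs
    foreach ($node->attributes as $name => $attribute) {
        $attributes[$name] = $attribute->nodeValue;
    }
}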

var_dump($scriptAttributes);

Output:

array(2) {
  [0]=>
  array(1) {
    ["language"]=>
    string(10) "javascript"
  }
  [1]=>
  array(2) {
    ["src"]=>
    string(30) "Scripts/AC_RunActiveContent.js"
    ["language"]=>
    string(10) "javascript"
  }
}
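To print each script's attributes in the language=>javascript form the question asks for, one option is to join the pairs collected in $scriptAttributes. The sketch below recreates that array literally so the snippet runs on its own. The comma-separated, one-line-per-tag layout is an arbitrary choice.

```php
<?php
// Same structure as the $scriptAttributes array dumped above.
$scriptAttributes = array(
    array('language' => 'javascript'),
    array('src' => 'Scripts/AC_RunActiveContent.js', 'language' => 'javascript'),
);

$lines = array();
foreach ($scriptAttributes as $attributes) {
    $pairs = array();
    foreach ($attributes as $name => $value) {
        $pairs[] = $name . '=>' . $value;
    }
    // One line per script tag, pairs separated by ", ".
    $lines[] = implode(', ', $pairs);
}

echo implode("\n", $lines), "\n";
// language=>javascript
// src=>Scripts/AC_RunActiveContent.js, language=>javascript
```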


You can clean up the code somewhat by eliminating the getElementsByTagName call:

$dom = new DOMDocument;
@$dom->loadHTMLFile($filePath);
$xp = new DOMXpath($dom);

// find only script tags in the head block, ignoring scripts elsewhere
$scripts = $xp->query("//head/script");

foreach ($scripts as $script) {
    // ... your stuff here, e.g. read $script->textContent or $script->attributes ...
}

The DOMNodeList that XPath queries return is iterable, so you can simply foreach over it without counting nodes or writing a for loop. Because the XPath query itself selects only script elements inside the head, you also don't have to check whether each $script node is a script node. That is the only type of node the query will return.
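Putting the pieces together, the question's function might look like this with the foreach approach. This is a sketch, not the answerer's code. It returns the collected data instead of echoing it, and the return shape (a content string plus an attrs map per script) is a choice made here for illustration. It also swaps the @ operator for libxml_use_internal_errors(). The usage part writes the question's markup to a temporary file as a stand-in for directory/file.html.

```php
<?php
// Sketch: the question's function rewritten around a single XPath query and foreach.
// Returns one entry per <script> in <head>: its text content plus an attribute map.
function retrieveScriptContentandAttributes($filePath) {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // used here instead of @ to silence parser warnings
    $dom->loadHTMLFile($filePath);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $result = array();
    foreach ($xp->query('//head/script') as $script) {
        $attrs = array();
        foreach ($script->attributes as $attr) {
            $attrs[$attr->nodeName] = $attr->nodeValue;
        }
        $result[] = array('content' => $script->textContent, 'attrs' => $attrs);
    }
    return $result;
}

// Usage: write the question's <head> to a temporary file (stand-in for directory/file.html).
$filePath = tempnam(sys_get_temp_dir(), 'head');
file_put_contents($filePath, '<html><head>'
    . '<script language="javascript">AC_FL_RunContent = 0;</script>'
    . '<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>'
    . '</head><body></body></html>');

$scripts = retrieveScriptContentandAttributes($filePath);
unlink($filePath);

foreach ($scripts as $s) {
    echo "content: ", $s['content'], "\n";
    print_r($s['attrs']);
}
```

Returning the data keeps the function reusable. The caller decides whether to print it with print_r or to format it as name=>value pairs.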
