retrieve attributes and content of multiple script tags inside the head tag with PHP
I have found a few different questions that pertain to my question, but I'm having trouble putting them together into one function.
Here is my HTML:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>
Here is the code I have right now:
$filePath = "directory/file.html";
retrieveScriptContentandAttributes($filePath);
function retrieveScriptContentandAttributes($filePath) {
$dom = new DOMDocument;
@$dom->loadHTMLFile($filePath);
//var_dump($dom->loadHTMLFile($filePath));
$head = $dom->getElementsByTagName('head')->item(0);
$xp = new DOMXpath($dom);
$script = $xp->query("script", $head);
for ($row = 0; $row < 5; $row++) {
echo $script->item($row)->textContent;
if ($script->item($row) instanceof DOMNode) {
if ($script->item($row)->hasAttributes()) {
foreach ($script->item($row)->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
$scriptAttr[] = array('attr'=>$name, 'value'=>$value);
}
echo $scriptAttr;
}
}
}
}
The result I'm getting is "ArrayAC_FL_RunContent = 0;Array Notice: Trying to get property of non-object", with the notice pointing at the line "echo $script->item($row)->textContent;". The odd part is that this line appears to execute just fine. What I need is a way to print $scriptAttr as an array like so: language=>javascript. Then again for the next script tag: src=>Scripts/AC_RunActiveContent.js, language=>javascript.
I appreciate your help!!
Try DOMXPath (see: http://php.net/manual/en/class.domxpath.php):
<?php
$dom = new DOMDocument();
$dom->loadHtml('<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>
');
$xpath = new DOMXPath($dom);
$scriptAttributes = array();
/* //head/script[@src] would only select nodes with an src attribute */
foreach ($xpath->query('//head/script') as $node) {
$attributes =& $scriptAttributes[];
foreach ($node->attributes as $name => $attribute) {
$attributes[$name] = $attribute->nodeValue;
}
}
var_dump($scriptAttributes);
Output:
array(2) {
[0]=>
array(1) {
["language"]=>
string(10) "javascript"
}
[1]=>
array(2) {
["src"]=>
string(30) "Scripts/AC_RunActiveContent.js"
["language"]=>
string(10) "javascript"
}
}
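The question also asks for the content of each script tag, which the snippet above does not collect. The following is a minimal sketch, assuming the same HTML as in the question; the helper name collectScripts() and the 'attributes'/'content' array keys are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): returns one entry per
// <script> in <head>, holding its attributes and its inline text content.
function collectScripts(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $scripts = [];
    foreach ($xpath->query('//head/script') as $node) {
        $attributes = [];
        foreach ($node->attributes as $name => $attribute) {
            $attributes[$name] = $attribute->nodeValue;
        }
        $scripts[] = [
            'attributes' => $attributes,
            'content'    => $node->textContent, // empty string for external scripts
        ];
    }
    return $scripts;
}

$html = '<html><head>
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head><body></body></html>';

print_r(collectScripts($html));
```

Building a fresh $attributes array per node keeps each script's attributes separate, which avoids the accumulation that happens when a single array is appended to across iterations.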
You can clean up the code somewhat by eliminating the getElementsByTagName call:
$dom = new DOMDocument;
@$dom->loadHTMLFile($filePath);
$xp = new DOMXpath($dom);
$scripts = $xp->query("//head/script"); // find only script tags in the head block, ignoring scripts elsewhere
foreach($scripts as $script) {
// ... your stuff here ...
}
The DOMNodeList that XPath queries return is iterable, so you can simply foreach over it without needing counts or for loops. And by doing this via a direct XPath query, you don't have to check whether the $script nodes are script nodes: that's the only type of node the query will return.
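This also explains the two symptoms in the question. The loop runs to a fixed bound of 5, but DOMNodeList::item() returns null for an index past the end of the list, hence the "non-object" notice. And echoing an array prints the literal word "Array". The sketch below shows both points and prints the attributes in the name=>value form the question asks for; formatAttributes() is a hypothetical helper name, and the HTML is assumed to match the question's.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><head>
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head><body></body></html>');

$xp = new DOMXPath($dom);
$scripts = $xp->query('//head/script');

// Only two nodes match, so any index >= 2 yields null rather than a node.
var_dump($scripts->length);          // number of matched <script> tags
var_dump($scripts->item(2) === null); // true: past the end of the list

// Hypothetical helper: renders one element's attributes as "name=>value, ...".
function formatAttributes(DOMElement $script): string
{
    $pairs = [];
    foreach ($script->attributes as $attr) {
        $pairs[] = $attr->nodeName . '=>' . $attr->nodeValue;
    }
    return implode(', ', $pairs);
}

$lines = [];
foreach ($scripts as $script) {
    // foreach visits exactly the nodes that exist, so no bounds check is needed.
    $lines[] = formatAttributes($script);
}
echo implode("\n", $lines), "\n";
```

With the question's HTML this prints "language=>javascript" for the first tag and "src=>Scripts/AC_RunActiveContent.js, language=>javascript" for the second.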