HTML DOM: How to get elements without losing children?
I'm trying to perform a preg_replace on the text in an HTML string. I want to avoid replacing the text within tags, so I'm loading the string as a DOM element and grabbing the text within each node. For example, I have this list:
<ul>
<li><a href="?p=oconnorinv&i=1">Boxes 1-3</a>: 1925 - 1928 <em>(A-Ma)</em></li>
<li><a href="?p=oconnorinv&i=2">Boxes 4-6</a>: 1928 <em>(Mb-Z)</em> - 1930 <em>(A-Wi)</em></li>
<li><a href="?p=oconnorinv&i=3">Boxes 7-9</a>: 1930 <em>(Wo-Z)</em>- 1932 <em>(A-Fl)</em></li>
</ul>
I want to be able to highlight the character "1", or the letter "i", without disturbing the links or list item tag. So I grab each list item and get its value to perform the replace on:
$invfile = [string of the unordered list above]
$invco开发者_JS百科ntents = new DOMDocument;
$invcontents->loadHTML($invfile);
$inv_listitems = $invcontents->getElementsByTagName('li');
foreach ($inv_listitems as $f) {
$f->nodeValue = preg_replace($to_highlight, "<span class=\"highlight\">$0</span>", $f->nodeValue);
}
echo html_entity_decode($invcontents->saveHTML());
The problem is, when I grab the node values, the child nodes inside the list item are lost. If I print out the original string as-is, the < a >, < em >, etc. tags are all there. But when I run the script, it prints out without the links or any formatting tags. For example, if my $to_replace is the string "Boxes", the list becomes:
<ul>
<li><span class="highlight">Boxes</span> 1-3: 1925 - 1928 (A-Ma)</li>
<li><span class="highlight">Boxes</span> 4-6: 1928 (Mb-Z) - 1930 (A-Wi)</li>
<li><span class="highlight">Boxes</span> 7-9: 1930 (Wo-Z)- 1932 (A-Fl)</li>
</ul>
How can I get the text without losing the tags inside?
The problem here is that you're operating on the entire
If the structure above is always the same you can do something like
$new_html = preg_replace("##", "", $f->item(0)->nodeValue);
In reality, the best way to go about it is to unset the anchor's node value and create an entirely new element and append it.
(Consider this psuedo code)
$inv_listitems = $invcontents->getElementsByTagName('li');
foreach ($inv_listitems as $f) {
$span = $invcontents->createElement("span");
$span->setAttribute("class", "highlight");
$span->nodeValue = $f->item(0)->nodeValue;
$f->appendChild($span);
}
echo $invcontents->saveHTML();
You'll have to do some matching in there, as well as unsetting the nodeValue of $f but hopefully this makes it a little more clear.
Also, don't set HTML in nodeValue directly, because it will run htmlentities() against all of the html you set. That is why I create a new element above. If you absolutely have to set HTML in nodeValue then you should create a DocumentFragment Object
YOu're better of operating only on the textnodes:
$x = new DOMXPath(invcontents);
foreach($x->query('//li/text()' as $textnode){
//replace text node with list of plain text nodes & your highlighting span.
}
I always use xpath for this kind of actions. It'll give you more flexibility. This example handles
<mainlevel>
<toplevel>
<detaillevel key=...>
<xmlvalue1></xmlvalue1>
<xmlvalue1></xmlvalue2>
<sublevel key=...>
<xmlvalue1></xmlsubvalue1>
<xmlvalue1></xmlsubvalue2>
</sublevel>
</detaillevel>
</toplevel>
</mainlevel>
To parse this:
$xpath = new DOMXPath($xmlDoc);
$mainNodes = $xpath->query("/mainlevel/toplevel/detaillevel");
foreach( $mainNodes as $subNode ) {
$parameter1=$subNode->getAttribute('key');
$parameter2=$subNode->getElementsByTagName("xmlvalue1")->item(0)->nodeValue;
$parameter3=$subNode->getElementsByTagName("xmlvalue2")->item(0)->nodeValue;
foreach ($subNode->getElementsByTagName("sublevel") as $detailNode) {
$parameter1=$detailNode->getAttribute('key');
$parameter2=$detailNode->getAttribute('xmlsubvalue1');
$parameter2=$detailNode->getAttribute('xmlsubvalue2');
}
}
精彩评论