
How to remove all tags from a WordPress post except a child tag using DOM

I'm trying to remove everything from the following string EXCEPT the object tag:

<p>If a post is marked video, and there is text BEFORE the video, the video player does not appear! We only see the actual text for the url…</p>
<p>&nbsp;</p>
<p><object width="584" height="463"><param value="http://www.youtube.com/v/Clp9AeBdgL0?version=3" name="movie"><param value="true" name="allowFullScreen"><param value="always" name="allowscriptaccess"><embed width="584" height="463" allowfullscreen="true" allowscriptaccess="always" type="application/x-shockwave-flash" src="http://www.youtube.com/v/Clp9AeBdgL0?version=3"></object></p>
<p>Of course, you might even have a paragraph AFTER the video. Could be lots and lots of meaningless text &ndash; we should definitely limit this. Lorem ipsum</p>

As you can see above, the third 'p' tag contains an 'object' tag. I want to get rid of everything except the 'object' tag and its contents. In other words, I'd like to traverse the DOM and remove everything except:

<object width="584" height="463"><param value="http://www.youtube.com/v/Clp9AeBdgL0?version=3" name="movie"><param value="true" name="allowFullScreen"><param value="always" name="allowscriptaccess"><embed width="584" height="463" allowfullscreen="true" allowscriptaccess="always" type="application/x-shockwave-flash" src="http://www.youtube.com/v/Clp9AeBdgL0?version=3"></object>

I was able to write a function that removes any particular tag (p, img, div, etc.) and its contents from a string by traversing the DOM, but I could NOT figure out how to preserve the contents of a child tag, as in this case. Can anybody help?
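For the DOM traversal the question describes, a minimal sketch using PHP's bundled DOM extension might look like this. It parses the post with DOMDocument::loadHTML(), finds every object element with getElementsByTagName(), and serialises each one (with its param/embed children) by passing the node to saveHTML(). The function name extract_objects() is hypothetical, and this assumes PHP 7+ with the DOM extension enabled.

```php
<?php
// Hypothetical helper: return every <object> element in $html, including all of
// its children, while dropping the surrounding <p> tags and text.
function extract_objects(string $html): array
{
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $objects = array();
    foreach ($doc->getElementsByTagName('object') as $node) {
        // Passing a node to saveHTML() serialises only that subtree.
        $objects[] = $doc->saveHTML($node);
    }
    return $objects;
}
```

Because the markup is actually parsed, nested objects are not a problem for matching, although a nested object will also be returned a second time on its own, since getElementsByTagName() finds descendants at every depth.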


Instead of traversing the DOM with an XML-parsed object (which is what it sounds like you're doing; sorry if I'm incorrect), I'd suggest just running a regular-expression search on your string.

PHP supports PCRE (Perl-Compatible Regular Expressions) through its preg_* functions.

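EDIT: It looks like '/<object .*<\/object>/' works. You can test PHP regexes with an online tester; I used the preg_match() function. Also, if you have multiple <object>s per page, make sure you're not using "greedy" matching: as written, .* is greedy and will run from the first <object to the last </object>, so use .*? (or the U modifier) instead. Likewise, add the s modifier if the markup can span several lines, because . does not match newlines by default. Lastly, this will not work with nested objects, although I don't expect you'll have them.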
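To illustrate the greedy-versus-lazy point, here is a small sketch on made-up markup with two objects (the data="a" and data="b" attributes are placeholders, not from the question). preg_match_all() returns how many separate matches each pattern finds.

```php
<?php
// Two <object> blocks separated by text, standing in for a post with two videos.
$html = '<p><object data="a"></object></p><p>text</p><p><object data="b"></object></p>';

// Greedy: ".*" runs to the LAST "</object>", so both objects and the text
// between them come back as a single match.
$greedy = preg_match_all('/<object\b.*<\/object>/s', $html, $greedyMatches);

// Lazy: ".*?" stops at the FIRST "</object>", giving one match per object.
$lazy = preg_match_all('/<object\b.*?<\/object>/s', $html, $lazyMatches);

// The "U" modifier inverts greediness, so a plain ".*" behaves lazily.
$ungreedy = preg_match_all('/<object\b.*<\/object>/sU', $html, $ungreedyMatches);

printf("greedy: %d, lazy: %d, U modifier: %d\n", $greedy, $lazy, $ungreedy);
// prints: greedy: 1, lazy: 2, U modifier: 2
```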

So the whole snippet might be:

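// Lazy ".*?" stops at the first </object>; the "s" modifier lets "." match newlines too.
$pattern = '/<object\b.*?<\/object>/s';
$subject = $html; // $html is your string containing the post's HTML
$matches = array();

if (preg_match($pattern, $subject, $matches))
{
    echo $matches[0]; // the first <object>...</object> block
}
else
{
    echo "No match found.";
}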
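The snippet above echoes only the first match. If a post can contain several videos, a sketch along the same lines can keep every <object> block and discard everything else. The helper name keep_only_objects() is hypothetical, and like the regex above it does not handle nested objects.

```php
<?php
// Hypothetical helper: return only the <object>...</object> blocks from $html,
// joined by newlines in document order, or '' if there are none.
function keep_only_objects(string $html): string
{
    // "?" makes the match lazy (one match per object), "s" lets "." cross
    // newlines, and "i" also matches upper-case tags such as <OBJECT>.
    $count = preg_match_all('/<object\b.*?<\/object>/is', $html, $matches);
    if (!$count) {
        return ''; // no objects found (or the regex failed)
    }
    return implode("\n", $matches[0]);
}
```

For example, echo keep_only_objects($html); would print just the object markup from the question's sample, without the surrounding paragraphs.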
