
Remove HTML using PHP (ob_start + dom parser)

I need to learn how to remove HTML tags using PHP.

This is what I have in mind (I think DOM parsing is what I need, but I can't figure out how it works. A working example would be a big help. I can't install any external libraries and I am running PHP 5):

function the_remove_function($remove){

//  dom parser code here?

return $remove;}

// return all content into a string
ob_start('the_remove_function');

Example code:

<body>
<div class="a"></div>
<div id="b"><p class="c">Here are some text and HTML</p></div>
<div id="d"></div>
</body>

Questions:

1) How do I return:

<body>
<p class="c">Here are some text and HTML</p>
</body>

2) How do I return:

<body>
<div class="a"></div>
<div id="b"></div>
<div id="d"></div>
</body>

3) How do I return:

<body>
<div class="a"></div>
<p class="c">Here are some text and HTML</p>
<div id="d"></div>
</body>

Next example code:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link rel='stylesheet' id='test-css'  href='http://www.domain.com/css/test.css?ver=2011' type='text/css' media='all' />
<script type='text/javascript' src='http://www.domain.com/js/test.js?ver=2010123'></script>
</head>

4) How do I return:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link rel='stylesheet' id='test-css'  href='http://www.domain.com/css/test.css?ver=2011' type='text/css' media='all' />
</head>

5) How do I return:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<script type='text/javascript' src='http://www.domain.com/js/test.js?ver=2010123'></script>
</head>

Thanks for reading :)


Try the HTML Purifier library. It does exactly what you need and has extensive documentation on how to create filters. If you are filtering for security reasons, then by all means use it: its parser can cope with even the craziest XSS schemes imaginable.


You can use PHP's built-in DOM classes. The documentation is here: http://fr2.php.net/manual/en/book.dom.php and I'm sure you can find plenty of tutorials if you prefer.

Here is an example for your second case:

<?php
$content = '<body><div class="a"></div><div id="b"><p class="c">Here are some text and HTML</p></div><div id="d"></div></body>';
$doc = new DOMDocument();
// loadXML() works here only because the fragment is well-formed XML
$doc->loadXML($content);

// Get the first <p> element
$p = $doc->getElementsByTagName('p')->item(0);
// Remove the <p> element (and its text) from the DOM
$p->parentNode->removeChild($p);

// Serialize only the root element, which leaves out the XML declaration line;
// LIBXML_NOEMPTYTAG keeps empty elements as <div></div> instead of <div/>
$html = $doc->saveXML($doc->documentElement, LIBXML_NOEMPTYTAG);

echo $html;
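
The same DOM approach can be stretched to cases 3 and 4 from the question. What follows is only a sketch under a few assumptions: the helper names remove_elements_by_tag and unwrap_element are made up for illustration, the input is well-formed XML (a real page would more likely need loadHTML()), and the element with id="b" is looked up with XPath rather than getElementById(), which does not recognise the id attribute of XML input loaded without a DTD.

```php
<?php
// Hypothetical helper: remove every element with the given tag name.
function remove_elements_by_tag(DOMDocument $doc, $tagName)
{
    // The node list is live, so copy it before removing anything
    $found = array();
    foreach ($doc->getElementsByTagName($tagName) as $node) {
        $found[] = $node;
    }
    foreach ($found as $node) {
        $node->parentNode->removeChild($node);
    }
}

// Hypothetical helper: replace an element with its own children.
function unwrap_element(DOMNode $node)
{
    $parent = $node->parentNode;
    while ($node->firstChild) {
        // insertBefore() moves an attached node, so this empties $node
        $parent->insertBefore($node->firstChild, $node);
    }
    $parent->removeChild($node);
}

// Case 3: drop <div id="b"> but keep the <p> inside it
$body = new DOMDocument();
$body->loadXML('<body><div class="a"></div><div id="b"><p class="c">Here are some text and HTML</p></div><div id="d"></div></body>');
$xpath = new DOMXPath($body);
$wrapper = $xpath->query('//div[@id="b"]')->item(0);
if ($wrapper !== null) {
    unwrap_element($wrapper);
}
// LIBXML_NOEMPTYTAG writes empty elements as <div></div> instead of <div/>
$bodyHtml = $body->saveXML($body->documentElement, LIBXML_NOEMPTYTAG);

// Case 4: drop the <script> element from the head
$head = new DOMDocument();
$head->loadXML('<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel=\'stylesheet\' id=\'test-css\' href=\'http://www.domain.com/css/test.css?ver=2011\' type=\'text/css\' media=\'all\' /><script type=\'text/javascript\' src=\'http://www.domain.com/js/test.js?ver=2010123\'></script></head>');
remove_elements_by_tag($head, 'script');
// No LIBXML_NOEMPTYTAG here, so <meta> and <link> stay self-closing
$headHtml = $head->saveXML($head->documentElement);

echo $bodyHtml, "\n", $headHtml, "\n";
```

Case 5 would be the same call with 'link' instead of 'script'. To apply this to buffered output, the same steps could go inside the ob_start() callback from the question, returning the serialized string instead of echoing it.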


Try the strip_tags() function in PHP.

Sample Usage:

    <?php
    $str = '<body>
            <div class="a"></div>
            <div id="b"><p class="c">Here are some text and HTML</p></div>
            <div id="d"></div>
            </body>
           ';
    echo strip_tags($str);
    echo "\n";
    ?>

It will return (apart from the surrounding whitespace):

Here are some text and HTML 

or:

    <?php
     $str = '<body>
             <div class="a"></div>
             <div id="b"><p class="c">Here are some text and HTML</p></div>
             <div id="d"></div>
             </body>
            ';
     echo strip_tags($str, '<body>');
     echo "\n";
    ?>

This keeps the '<body>' tag and removes all other tags. Result:

<body>
Here are some text and HTML
</body>

More examples: see the strip_tags() page in the PHP manual on php.net.
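
For case 1 from the question, a possible sketch along the same lines is to allow both '<body>' and '<p>'. The variable name $kept and the preg_replace() cleanup of leftover blank lines are additions for illustration, not part of the original answer. Note that strip_tags() only removes the tags themselves and keeps any text inside them, so it cannot produce case 2, where the paragraph's text has to disappear as well.

```php
<?php
$str = '<body>
<div class="a"></div>
<div id="b"><p class="c">Here are some text and HTML</p></div>
<div id="d"></div>
</body>';

// Keep <body> and <p>; every other tag is stripped, but its inner text stays
$kept = strip_tags($str, '<body><p>');

// Collapse the blank lines left behind where the <div> tags were removed
$kept = preg_replace('/\n\s*\n/', "\n", $kept);

echo $kept, "\n";
```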
