Using DOMDocument to Parse HTML with JS code
I take HTML in as a string and then parse it to change all href links to something else. This works; however, when the HTML page has some JS script tags (i.e. <script>),
they get removed! For example, this line:
<script type="text/javascript" src="/js/jquery.js"></script>
gets changed to:
[removed][removed]
However, I would like to keep everything intact. This is my function:
function parse_html_code($code, $code_id){
    // Collect libxml warnings about malformed HTML instead of printing them
    libxml_use_internal_errors(true);
    $xml = new DOMDocument();
    $xml->loadHTML($code);
    // Point every link at the click-tracking script, passing the original URL as "j"
    foreach($xml->getElementsByTagName('a') as $link) {
        $link->setAttribute('href', CLK_BASE."clk.php?i=$code_id&j=" . $link->getAttribute('href'));
    }
    return $xml->saveHTML();
}
I appreciate any help on this.
CodeIgniter's bogus anti-XSS ‘feature’ is mauling your script's input before DOMDocument gets a look at it. Script tags and various other strings will be removed, replaced with “[removed]” or otherwise messed about with for no good reason. See the system/libraries/Security.php module for the full embarrassing details.
To turn off this misguided feature, set $config['global_xss_filtering'] = FALSE. You'll have to make sure your script is actually handling string escaping properly, of course (e.g. always HTML-escaping user input when including it in a page). But then you have to do that anyway; anti-XSS doesn't fix your text-processing problems, it just obscures them.
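A minimal sketch of both halves of that advice, under some assumptions: in a standard CodeIgniter 2.x/3.x install the switch lives in application/config/config.php (shown inline here only for illustration), and the HTML string is fed straight to DOMDocument rather than through CodeIgniter's input class. It shows that DOMDocument on its own keeps the script tag, and that escaping then moves to output time. `$userInput` is a hypothetical stand-in for untrusted text. If you still want filtering on particular fields, CodeIgniter's `$this->input->post('field', TRUE)` applies the XSS filter to just that value.

```php
<?php
// Normally set in application/config/config.php; inline here for illustration.
$config['global_xss_filtering'] = FALSE;

// Collect libxml's warnings about imperfect real-world HTML instead of printing them.
libxml_use_internal_errors(true);

// Fed the raw string (no CodeIgniter input filter involved),
// DOMDocument keeps the script element as-is.
$html = '<html><head><script type="text/javascript" src="/js/jquery.js"></script></head>'
      . '<body><a href="/page">link</a></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$out = $doc->saveHTML();
echo $out;

// Escaping happens when untrusted text is written into a page.
// $userInput is a hypothetical example value.
$userInput = '<b>"Tom" & Jerry</b>';
$safe = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');
echo $safe, "\n"; // &lt;b&gt;&quot;Tom&quot; &amp; Jerry&lt;/b&gt;
```

Escaping at output keeps the stored data exact while still making it inert in HTML, which is the trade-off the answer argues for.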
$link->setAttribute('href', CLK_BASE."clk.php?i=$code_id&j=" . $link->getAttribute('href'));
You'll need to urlencode that getAttribute('href') value (and potentially $code_id too, if it isn't simply numeric).
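A sketch of the question's function with that fix applied. It assumes nothing beyond standard PHP: `CLK_BASE` is defined here with a made-up tracker URL purely so the snippet runs on its own, and should be replaced by the real constant. Without encoding, an original URL like `http://example.com/?a=1&b=2` would leak its own `&b=2` into the tracking URL's query string.

```php
<?php
// Hypothetical value standing in for the asker's real CLK_BASE constant.
if (!defined('CLK_BASE')) {
    define('CLK_BASE', 'http://tracker.example.com/');
}

function parse_html_code($code, $code_id)
{
    // Collect libxml's warnings about sloppy HTML instead of printing them.
    libxml_use_internal_errors(true);

    $xml = new DOMDocument();
    $xml->loadHTML($code);

    foreach ($xml->getElementsByTagName('a') as $link) {
        $target = $link->getAttribute('href');
        // Encode both values so characters such as '&', '?', '=' and '#'
        // in them cannot break the tracking URL's own query string.
        $link->setAttribute(
            'href',
            CLK_BASE . 'clk.php?i=' . urlencode($code_id) . '&j=' . urlencode($target)
        );
    }

    // Discard the collected warnings so they don't pile up across calls.
    libxml_clear_errors();
    return $xml->saveHTML();
}
```

On the receiving side, PHP decodes query parameters automatically, so clk.php can read the original URL back from `$_GET['j']` without calling urldecode itself.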