Using Tidy to clean HTML, HTML content is being changed, encoding problem?
I am fetching HTML from a Smarty template and need to clean it (I simply want to remove extra whitespace and format/indent the HTML nicely). I'm using tidy to do something like:
$html = $smarty->fetch('foo.tmpl');
$tidy = new tidy;
$tidy->parseString($html, array(
    'hide-comments' => TRUE,
    'output-xhtml' => TRUE,
    'indent' => TRUE,
    'wrap' => 0
));
$tidy->cleanRepair();
return $tidy;
While this works OK for English, multilingual support seems to break it. For example, the Arabic characters are fine in $html, but after tidy I get back some nasty encoding:
هل أنت متأكد أنك تريد
Is there a setting in tidy that will format the HTML but leave the HTML itself alone? I looked at this post: PHP "pretty print" HTML (not Tidy), but it seems like this won't work since I'm grabbing my HTML from Smarty.
Any suggestions appreciated.
Try using the third argument of parseString to set the encoding (the second argument is the config array):
http://www.php.net/manual/en/tidy.parsestring.php
$html = $smarty->fetch('foo.tmpl');
$tidy = new tidy;
$tidy->parseString($html, array(
    'hide-comments' => TRUE,
    'output-xhtml' => TRUE,
    'indent' => TRUE,
    'wrap' => 0
), 'raw');
$tidy->cleanRepair();
return $tidy;
Use raw as the encoding parameter.
With raw, Tidy outputs byte values above 127 as-is, without translating them into entities, and every byte of UTF-8 encoded Arabic text is above 127.
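As a rough sketch of the suggestion above, the call can be wrapped in a small helper that takes the encoding as a parameter. The name tidyFormat() is hypothetical (it is not part of Tidy or Smarty), the default of 'raw' simply follows the answers here, and the code assumes the PHP tidy extension is installed. The last lines check the claim that the bytes of Arabic text in UTF-8 are all above 127.

```php
<?php
// Hypothetical helper, for illustration only: formats HTML with Tidy
// while passing an explicit encoding as the third argument to parseString().
function tidyFormat(string $html, string $encoding = 'raw'): string
{
    $tidy = new tidy;
    $tidy->parseString($html, array(
        'hide-comments' => true,
        'output-xhtml'  => true,
        'indent'        => true,
        'wrap'          => 0,
    ), $encoding);          // e.g. 'raw', as suggested in the answers
    $tidy->cleanRepair();
    return (string) $tidy;  // casting the tidy object yields the cleaned markup
}

// Inspect the raw bytes of a short Arabic string stored as UTF-8.
$bytes = unpack('C*', 'هل');
var_dump(min($bytes) > 127); // bool(true): no byte falls in the ASCII range
```

In the original snippet this would replace the inline tidy calls, for example: return tidyFormat($smarty->fetch('foo.tmpl'));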