
Using Tidy to clean HTML, HTML content is being changed, encoding problem?

I am fetching HTML from a Smarty template and need to clean it (I simply want to remove extra whitespace and format/indent the HTML nicely). I'm using tidy to do something like:


$html = $smarty->fetch('foo.tmpl');

$tidy = new tidy;
$tidy->parseString($html, array(
    'hide-comments' => TRUE,
    'output-xhtml' => TRUE,
    'indent' => TRUE,
    'wrap' => 0
));
$tidy->cleanRepair();
return $tidy;

While this works OK for English, multilingual content seems to break it. For example, the Arabic characters are fine in $html, but after tidy I get back mangled encoding:

هل أنت متأكد أنك تريد

Is there a setting in tidy that will format the HTML but leave the content itself alone? I looked at this post: PHP "pretty print" HTML (not Tidy), but it seems like this won't work since I'm grabbing my HTML from Smarty.

Any suggestions appreciated.


Try using the third argument of parseString (the one after the config array) to set the encoding:

http://www.php.net/manual/en/tidy.parsestring.php


$html = $smarty->fetch('foo.tmpl');

$tidy = new tidy;
$tidy->parseString($html, array(
    'hide-comments' => TRUE,
    'output-xhtml' => TRUE,
    'indent' => TRUE,
    'wrap' => 0
), 'raw');
$tidy->cleanRepair();
return $tidy;

Use raw as the encoding parameter. With raw, Tidy outputs values above 127 as they are, without translating them into entities, and every byte of a UTF-8 encoded Arabic character is above 127.
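If the template is known to be UTF-8, passing 'utf8' as the encoding may be a more explicit alternative to 'raw'. The following is only a sketch under that assumption: the helper name tidy_html and the sample markup are hypothetical and not part of the original question, and it requires the tidy extension to be loaded.

```php
<?php
// Hypothetical helper: pretty-print HTML with tidy while declaring the encoding.
// Assumes the input string is UTF-8; pass 'raw' to use the setting from the answer above.
function tidy_html(string $html, string $encoding = 'utf8'): string
{
    $tidy = new tidy();
    $tidy->parseString($html, [
        'hide-comments' => true,
        'output-xhtml'  => true,
        'indent'        => true,
        'wrap'          => 0,
    ], $encoding);          // third argument: the input/output encoding
    $tidy->cleanRepair();

    // Return the cleaned markup as a plain string rather than the tidy object.
    return tidy_get_output($tidy);
}

// Sample markup (made up for illustration) containing Arabic text.
$sample = '<html><head><title>t</title></head><body><p>هل أنت متأكد</p></body></html>';
$out = tidy_html($sample);
echo $out;
```

Returning a string instead of the tidy object keeps the caller independent of tidy. With Smarty, the input would come from $smarty->fetch('foo.tmpl') as in the question.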
