Handling HTML character encoding issues

(I think this is called character encoding, but please retitle if I'm wrong.)

Issue: Trying to consume HTML with phpQuery and maintain the HTML's integrity after it runs through the phpQuery functions.

These are the changes to the HTML as it runs through the functions:

  1. Original HTML: <strong> Fast & Strong I Concrete</strong>

  2. HTML Page Converted to PHPQueryObject: <strong> Fast& Strong I&Acirc;&nbsp;Concrete</strong>

  3. PHPQueryObject run through Find() function: <strong> Fast & Strong IÂ Concrete</strong>
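The progression above is consistent with a non-breaking space (U+00A0) sitting between "I" and "Concrete", whose UTF-8 bytes (0xC2 0xA0) are read as ISO-8859-1 somewhere in the pipeline. That is an assumption about the input and not something confirmed from phpQuery itself. A minimal Python sketch of the byte-level effect (the helper `to_named_entities` is hypothetical, written only for this illustration):

```python
from html.entities import codepoint2name

# Assumption: the original markup has a non-breaking space (U+00A0)
# between "I" and "Concrete".
original = "<strong> Fast & Strong I\u00a0Concrete</strong>"

# U+00A0 is stored in UTF-8 as the two bytes 0xC2 0xA0.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns each byte into its own character:
# 0xC2 becomes "Â" (U+00C2) and 0xA0 becomes a non-breaking space again.
misread = utf8_bytes.decode("iso-8859-1")


def to_named_entities(text):
    """Replace non-ASCII characters with named HTML entities where one exists."""
    parts = []
    for ch in text:
        if ord(ch) > 127:
            name = codepoint2name.get(ord(ch))
            parts.append(f"&{name};" if name else f"&#{ord(ch)};")
        else:
            parts.append(ch)
    return "".join(parts)


as_entities = to_named_entities(misread)
```

Under that assumption, `misread` corresponds to step 3 (a visible "Â" followed by an invisible non-breaking space) and `as_entities` corresponds to the `&Acirc;&nbsp;` form in step 2.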

I tried various combinations of htmlentities(), html_entity_decode(), and iconv() to handle the movement of the data and maintain the original structure without displaying a bunch of unnecessary characters. I think this is a limitation of phpQuery's ability to consume HTML, so I need a workaround.

I've been successful removing the Â and other unneeded characters by using iconv("UTF-8", "BIG5//IGNORE"), but it is somewhat destructive to the original HTML, since BIG5 is intended for Traditional Chinese characters.
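A less destructive alternative to dropping bytes with `//IGNORE` might be to reverse the mis-decoding. If the stray "Â" comes from UTF-8 bytes that were read as ISO-8859-1 (an assumption, not verified against phpQuery), re-encoding the damaged text as ISO-8859-1 and decoding it as UTF-8 recovers the original characters. The sketch below is in Python for illustration only, and `repair_mojibake` is a hypothetical helper name. In PHP the same idea would be a UTF-8 to ISO-8859-1 conversion applied to the damaged string.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as ISO-8859-1.

    If the text does not fit that pattern, it is returned unchanged.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1, or the
        # resulting bytes are not valid UTF-8, so leave it untouched.
        return text


# Damaged form: "Â" (U+00C2) followed by a non-breaking space (U+00A0).
damaged = "<strong> Fast & Strong I\u00c2\u00a0Concrete</strong>"
repaired = repair_mojibake(damaged)
```

Here `repaired` contains a single non-breaking space between "I" and "Concrete" and no "Â". Text that was never damaged, such as plain ASCII or a correctly decoded "café", passes through unchanged because the round trip fails and the original string is returned.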

Question: What are &Acirc; and &nbsp; and how can I handle them so the consumed html #2 and #3 above display as originally intended #1 above without displaying extra characters to the browser?
