HTML Character Encoding

When outputting HTML content from a database, some encoded characters are being properly interpreted by the browser while others are not.

For example, %20 properly becomes a space, but %AE does not become the registered trademark symbol.

Am I missing some sort of content encoding specifier?

(Note: I cannot realistically change the content to, for example, &reg;, as I do not have control over the input editor's generated markup.)


%AE is not valid HTML-safe ASCII: ASCII only covers bytes 00 through 7F, and AE lies above that range. You can view the table here: http://www.ascii.cl/htmlcodes.htm

It looks like you are dealing with a Windows/Word encoding (windows-1252, or something like it). It will NOT convert to HTML-safe text unless you do some sort of translation in the middle.


The byte AE is the ISO-8859-1 representation of the registered trademark sign. If you don't see anything, then the URL decoder is apparently using a different charset to decode it. In UTF-8, for example, this byte on its own does not represent any valid character.

To fix this, you need to either URL-decode the data using ISO-8859-1, or convert the existing data so that it is URL-encoded using UTF-8.

That said, you should not confuse HTML (XML) entity encoding like &reg; with URL encoding like %AE.
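As a rough illustration of the two fixes above, here is a minimal Python sketch. The stored value `raw` is a made-up example, and Python's `urllib.parse` is only standing in for whatever URL decoder the asker's stack actually uses.

```python
from urllib.parse import quote, unquote

# Hypothetical value as it might sit in the database (not from the question).
raw = "Acme%AE%20Widgets"

# Decoding as UTF-8 (Python's default) cannot interpret the lone byte 0xAE,
# so it is replaced with U+FFFD, the replacement character.
as_utf8 = unquote(raw)

# Decoding as ISO-8859-1 maps byte 0xAE to the registered trademark sign.
as_latin1 = unquote(raw, encoding="iso-8859-1")

# Re-encoding the corrected text as UTF-8 yields the two-byte form %C2%AE.
reencoded = quote(as_latin1)

print(repr(as_utf8))
print(as_latin1)
print(reencoded)
```

The second call corresponds to "URL-decode it using ISO-8859-1"; the final `quote` call corresponds to converting the stored data to UTF-8 URL encoding.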


The '%20' encoding is URL encoding. It's only useful for URLs, not for displaying HTML.

If you want to display the reg character in an HTML page, you have two options: either use an HTML entity, or transmit your page as UTF-8 and include the character directly.

If you do decide to use the entity code, it's fairly simple to convert them en masse, since you can use numeric entities; you don't have to use the named entities. That is, use &#174; rather than &reg;.
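One way to do the en-masse conversion, sketched in Python: the helper name `to_numeric_entities` is hypothetical, and the sketch assumes the text has already been decoded into a proper Unicode string.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with its numeric HTML entity."""
    # The "xmlcharrefreplace" error handler emits &#NNN; for each character
    # that cannot be represented in ASCII, and leaves ASCII untouched.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("Acme\u00ae Widgets\u2122"))
```

Note that this leaves ASCII characters such as `<` and `&` as they are, so it does not replace ordinary HTML escaping where that is needed.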

If you need to know entity codes for every character, I find this cheat-sheet very helpful: http://www.evotech.net/blog/2007/04/named-html-entities-in-numeric-order/


What server side language are you using? Check for a URL Decode function.


If you are using PHP, you can use urldecode(), but be careful with + characters: urldecode() turns them into spaces, whereas rawurldecode() leaves them as they are.
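The same pitfall with + can be shown outside PHP. The sketch below uses Python's `urllib.parse` as a stand-in, on the assumption that `unquote_plus` plays the role of PHP's urldecode() and `unquote` the role of rawurldecode(); the sample string is invented.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical form-style value: literal plus signs were sent as %2B,
# while spaces were sent as "+" or %20.
value = "C%2B%2B+tips%20%26+tricks"

# Form-style decoding: "+" becomes a space, %2B becomes a literal "+".
form_decoded = unquote_plus(value)

# Plain percent-decoding: "+" is left untouched.
plain_decoded = unquote(value)

print(form_decoded)
print(plain_decoded)
```

Which of the two is right depends on how the data was encoded in the first place, so it is worth checking a few stored values before picking one.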
