charset-utf8 and character entities

2023-01-20 12:57

I am proposing to convert my windows-1252 XHTML web pages to UTF-8.

I have the following character entities in my coding:

&#39; (') - apostrophe,
&#9658; (►) - right pointer,
&#9668; (◄) - left pointer.

If I change the charset and save the pages as UTF-8 using my editor:

the apostrophe remains in as a character entity;
the pointers are converted to symbols within the code (presumably because the entities are not supported in UTF-8?).

Questions:

1) If I understand UTF-8 correctly, you don't need to use the entities and can type characters directly into the code. In that case, is it safe for me to replace &#39; with a typed-in apostrophe?
2) Is it correct that the editor has placed the pointer symbols directly into my code, and will these be displayed reliably in modern browsers? It seems to be OK. Presumably I can't revert to the entities anyway if I use UTF-8?

Thanks.

It's charset, not chartset.

1) It depends on where the apostrophe is used. It is a valid ASCII character as well, so depending on the character's purpose (whether it is for display only, inside a DOM text node, or used in code), you may or may not be able to use a literal apostrophe.

2) If your editor is a modern editor, it will store text as UTF-8 byte sequences rather than as one byte per character. Most of the characters used in code are plain ASCII (and ASCII is a subset of UTF-8), so those characters take up one byte each. Other characters may take up two, three or even four bytes. They are still displayed to you as one character, but the relation between character and byte is no longer one-to-one.

Anyway, since all valid ASCII characters are encoded exactly the same in ASCII, UTF-8 and even windows-1252, you should not see any problems using UTF-8. And you can still use numeric and named entities, because they are written with those ASCII characters. You just don't have to.
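The byte-level claims above can be checked with a short sketch. Python is used here purely for illustration and the variable names are made up for this example; `cp1252` is Python's codec name for windows-1252.

```python
# ASCII characters encode to the same single byte in all three encodings.
apostrophe = "'"
ascii_bytes = apostrophe.encode("ascii")
utf8_bytes = apostrophe.encode("utf-8")
cp1252_bytes = apostrophe.encode("cp1252")
print(ascii_bytes == utf8_bytes == cp1252_bytes)

# A non-ASCII character such as the right pointer (U+25BA) needs several
# bytes in UTF-8, yet it is still a single character.
pointer = "\u25ba"
pointer_utf8 = pointer.encode("utf-8")
print(len(pointer), len(pointer_utf8))

# A numeric entity is written entirely in ASCII, so its bytes are the same
# whether the file is saved as UTF-8 or as windows-1252.
entity = "&#9658;"
entity_same = entity.encode("utf-8") == entity.encode("cp1252")
print(entity_same)
```

This is why changing the declared charset does not invalidate entities that are already in the page.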

P.S. All modern browsers handle UTF-8 just fine, but our definitions of "modern" may vary.

Entities have three purposes: encoding characters that cannot be represented in the character encoding used (not relevant with UTF-8), encoding characters that are not convenient to type on a given keyboard, and encoding characters that are illegal unescaped.
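The first purpose can be illustrated with a small sketch, assuming Python's built-in codecs as a stand-in for what an editor does when it saves a file. The right pointer has no byte in windows-1252, so the only way to keep it in such a file is a character reference; in UTF-8 that fallback is never needed.

```python
pointer = "\u25ba"  # the right pointer character

# windows-1252 cannot represent this character, so a plain encode fails.
try:
    pointer.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

# The "xmlcharrefreplace" error handler substitutes a numeric character
# reference for anything the target encoding cannot hold.
as_entity = pointer.encode("cp1252", errors="xmlcharrefreplace")

print(representable)
print(as_entity)
```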

&#9658; should always produce ► no matter what the encoding. If it doesn't, it's a bug elsewhere.

► directly in the source is fine in UTF-8. You can do either that or the entity, and it makes no difference.
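As a rough check that the reference and the literal character are interchangeable, the sketch below uses `html.unescape` from Python's standard library to decode the decimal and hexadecimal references for U+25BA. It stands in for a browser's parser here, which is an assumption made for illustration only.

```python
import html

numeric = html.unescape("&#9658;")       # decimal character reference
hexadecimal = html.unescape("&#x25BA;")  # the same code point in hex
literal = "\u25ba"                       # the character typed directly

# All three are the same single character, whatever the file encoding was.
all_equal = numeric == hexadecimal == literal
print(all_equal)
```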

' is fine in most contexts, but not all. Both uses in the following are allowed:

<span title="Jon's example">This is Jon's example</span>

But it would have to be encoded in:

<span title='Jon&#x27;s example'>This is Jon's example</span>

because otherwise it would be taken as the ' that ends the attribute value.
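If attribute values are generated by a script rather than typed by hand, the escaping can be automated. A minimal sketch, assuming Python's `html.escape` is an acceptable stand-in for whatever tool produces the markup:

```python
import html

value = "Jon's example"

# quote=True escapes both " and ', so the result is safe inside an
# attribute delimited by either kind of quote.
escaped = html.escape(value, quote=True)

single_quoted = f"<span title='{escaped}'>This is Jon's example</span>"
print(single_quoted)
```

The apostrophe in the element's text content is left alone, since nothing there can mistake it for a delimiter.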

Use entities if you copy/paste content from a word processor or if the code is an XML dialect. Use a macro in your text-editor to find/replace the common ones in one shot. Here is a simple list:

Half: ½ => &frac12;
Acute Accent: é => &eacute;
Ampersand: & => &amp;
Apostrophe: ’ => &#39;
Backtick: ‘ => &#96;
Backslash: \ => &#92;
Bullet: • => &bull;
Dollar Sign: $ => &#36;
Cents Sign: ¢ => &cent;
Ellipsis: … => &hellip;
Emdash: — => &mdash;
Endash: – => &ndash;
Left Quote: “ => &ldquo;
Right Quote: ” => &rdquo;
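The find/replace macro idea can be sketched outside an editor as well. The mapping and the function name `replace_with_entities` below are hypothetical and cover only a few of the characters listed; the ampersand is deliberately left out, because replacing it blindly would double-escape entities already present in the text.

```python
import html

# Hypothetical mapping for illustration: character -> named entity.
ENTITY_MAP = {
    "\u00bd": "&frac12;",  # one half
    "\u00e9": "&eacute;",  # e with acute accent
    "\u2022": "&bull;",    # bullet
    "\u00a2": "&cent;",    # cent sign
    "\u2026": "&hellip;",  # ellipsis
    "\u2014": "&mdash;",   # em dash
    "\u2013": "&ndash;",   # en dash
    "\u201c": "&ldquo;",   # left double quote
    "\u201d": "&rdquo;",   # right double quote
}

def replace_with_entities(text: str) -> str:
    """Replace every mapped character in one pass."""
    return text.translate(str.maketrans(ENTITY_MAP))

sample = "\u201cCaf\u00e9\u201d \u2013 \u00bd price\u2026"
converted = replace_with_entities(sample)
print(converted)

# Decoding the entities again gives back the original text.
round_trip_ok = html.unescape(converted) == sample
print(round_trip_ok)
```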

References

XML Entity Names
