HTML and character encoding vs HTML Entity
When writing an HTML document, is it acceptable to type a special character directly as regular text, such as the capital letter C with a cedilla underneath (Ç), or should I use the HTML entity name of this character, &Ccedil;?
I have seen both used in practice, but surely there are rules governing the appropriate usage, as well as advantages to one way over the other. For instance, this website preserves the raw form of this character, but other websites may end up rendering it as a square block.
Real characters:
- Are easier to type if your system is set up for a language that uses those characters
- Produce more readable code
- Save bytes
HTML entities:
- Let you more or less forget about character encoding
Obviously, characters with special meaning in HTML (<, &, etc.) still need to be represented by entities (&lt;, &amp;, and so on).
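As a rough illustration of that last point, here is a minimal Python sketch using the standard library's html.escape. The sample string is made up for the example; the idea is simply that &, < and > (and quotes, inside attribute values) get replaced while everything else is left alone.

```python
import html

# Hypothetical text containing characters that are special in HTML.
raw = 'Fish & chips <cheap> "today"'

# html.escape replaces &, < and >; with quote=True it also replaces " and '.
escaped = html.escape(raw, quote=True)
print(escaped)  # Fish &amp; chips &lt;cheap&gt; &quot;today&quot;

# html.unescape reverses the process, as a browser would when rendering.
round_trip = html.unescape(escaped)
```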
If you're using UTF-8 character encoding, then most entities (other than those for &, > and <) become redundant.
If you're not using UTF-8, then you need entities for every character that your chosen encoding cannot represent.
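To make the dependence on encoding concrete, here is a small Python sketch (the sample text is an assumption for illustration). Python's "xmlcharrefreplace" error handler only substitutes a numeric character reference where the target encoding has no byte for the character, which is roughly the decision an author faces when not using UTF-8.

```python
import html

text = "\u00c7a va? \u00a9 2011"  # "Ça va? © 2011"

# ASCII cannot represent Ç or ©, so both become numeric character references.
ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_safe)  # &#199;a va? &#169; 2011

# ISO-8859-1 has single bytes for Ç and ©, so nothing is replaced here.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# Named and numeric references resolve to the same character.
same_char = html.unescape("&Ccedil;") == html.unescape("&#199;") == "\u00c7"
```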
It all depends on the character encoding of the document. If you're unsure whether to use the literal character or the entity, you could run your page through the W3C Validator.
Consider this code:
<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Stuff</title>
</head>
<body>
 <p>©</p>
 <p>&copy;</p>
</body>
</html>
The document declares UTF-8, but when it's validated, the validator returns an error for line 7, the one containing the raw © character:
Sorry, I am unable to validate this document because on line 7 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
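One plausible cause of a message like this is that the file was saved in a single-byte encoding while declaring UTF-8. The Python sketch below assumes ISO-8859-1 as the actual on-disk encoding (the source does not say which one was used) and shows why the raw character trips a UTF-8 decoder while the entity does not.

```python
copyright_sign = "\u00a9"  # the © character

# Saved as ISO-8859-1 (assumed here), © is the single byte 0xA9 ...
latin1_bytes = copyright_sign.encode("iso-8859-1")
# ... while saved as UTF-8 it is the two-byte sequence 0xC2 0xA9.
utf8_bytes = copyright_sign.encode("utf-8")
print(latin1_bytes, utf8_bytes)

# A lone 0xA9 is not a valid UTF-8 sequence, so strict decoding fails.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# The entity is plain ASCII, so its bytes are identical in both encodings.
entity = "&copy;"
same_bytes = entity.encode("iso-8859-1") == entity.encode("utf-8")
```

Under that assumption, either re-saving the file as UTF-8 or writing the character as &copy; would avoid the mismatch.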
 