Is the character "&" alone illegal in HTML 4.01 Strict documents?
I have always seen advice not to use & by itself in HTML documents, and to use &amp; instead.
So I tried putting a bare & in the title and in the content of the page, and both still validate:
http://topics2look.com/code-examples/HTML/ampersand-by-itself-can-validate.html
Is & by itself legal in HTML 4.01 Strict documents?
The W3C HTML 4.01 Strict Charset section says:
Four character entity references deserve special mention since they are frequently used to escape special characters:
* "&lt;" represents the < sign.
* "&gt;" represents the > sign.
* "&amp;" represents the & sign.
* "&quot;" represents the " mark.
Authors wishing to put the "<" character in text should use "&lt;" (ASCII decimal 60) to avoid possible confusion with the beginning of a tag (start tag open delimiter). Similarly, authors should use "&gt;" (ASCII decimal 62) in text instead of ">" to avoid problems with older user agents that incorrectly perceive this as the end of a tag (tag close delimiter) when it appears in quoted attribute values.
Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
As it uses the word "should" instead of "must", I guess you can skip it and still validate.
But don't do that, because it will sometimes render oddly.
I actually had to escape a couple of the ampersands in my cut-paste of this quote to get SO to render the character entity text... ;-)
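To make the "render oddly" warning above concrete, here is a minimal Python sketch. It uses the standard library's html module, which is my choice for illustration and not something the spec or this answer prescribes. html.unescape follows the HTML5 decoding rules, so it only approximates what a given browser does, and the sample strings are made up.

```python
import html

# html.escape replaces &, < and > (and, by default, quotes) with
# character entity references, so the text is safe to place in a page.
escaped = html.escape("Fish & chips < $5")
print(escaped)

# A bare ampersand can accidentally start an entity name. Under the HTML5
# rules that html.unescape implements, "&curren" is recognised even
# without a trailing semicolon, so "&current" loses its first six letters.
decoded = html.unescape("price&current=1")
print(decoded)
```

Writing the second string as "price&amp;current=1" avoids the problem, because the ampersand is then unambiguous.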
Whether it's an error or not in HTML 4 depends on whether it's an error or not in SGML. I can't check that since the spec is not publicly accessible. The HTML 4 spec does imply it's not an error («Authors should use "&amp;"» uses "should", not "must").
It's an error in HTML 4's XML serialisation (XHTML), because it is an error in XML («CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)»).
It's not an error in HTML 5's HTML serialisation («Not a character reference. No characters are consumed, and nothing is returned. (This is not an error, either.)»).
It's an error in HTML 5's XML serialisation, because it's invalid in XML («CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)»).
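The XML side of this is easy to check with any XML parser. Below is a small sketch using Python's xml.etree.ElementTree; the helper name parses_as_xml and the sample markup are my own, purely for illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if the string is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


bare = "<p>Fish & chips</p>"          # unescaped ampersand in character data
escaped = "<p>Fish &amp; chips</p>"   # ampersand written as an entity reference

print(parses_as_xml(bare))
print(parses_as_xml(escaped))
print(ET.fromstring(escaped).text)
```

The bare version is rejected because CharData cannot contain "&", while the escaped version parses and yields the text with a literal ampersand.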
Yes it is legal.
It's legal in HTML 4.01, and is also legal in the HTML 4 Strict doctype, because it is not one of the deprecated features such as the FONT tag.
It is not legal in any version of XHTML. By definition XHTML must be well-formed XML, and in XML the ampersand is reserved for starting entity and character references, so it cannot appear unescaped.
When possible it's preferable to use XHTML because it's a tighter, more modern spec (see http://en.wikipedia.org/wiki/XHTML for more information). Usually HTML is used for legacy support.
I understand HTML is still used for practical reasons in many places, and in this case it is considered a best practice to use the escaped version, even though it is in fact legal in your doctype.
An ampersand "by itself" is called an "unescaped ampersand" if you ever want to search more on the topic.
I’ve researched this thoroughly and wrote about my findings here: http://mathiasbynens.be/notes/ambiguous-ampersands
I’ve also created an online tool that you can use to check your markup for ambiguous ampersands or character references that don’t end with a semicolon, both of which are invalid. (No HTML validator currently does this correctly.)
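For readers who want a rough local check, here is a minimal Python sketch of the idea. It is not the tool mentioned above and makes several simplifying assumptions: the function name find_suspect_ampersands is hypothetical, it flags every ampersand that does not begin a complete, semicolon-terminated reference (which is stricter than HTML5 requires), and it does not skip comments, script or style content.

```python
import re
from html.entities import html5  # names of HTML5 character references

# An ampersand, optionally followed by a complete reference:
# decimal (&#38;), hexadecimal (&#x26;) or named (&amp;).
_CANDIDATE = re.compile(r"&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z0-9]+;)?")


def find_suspect_ampersands(markup):
    """Return (offset, snippet) pairs for ampersands worth a second look."""
    suspects = []
    for match in _CANDIDATE.finditer(markup):
        ref = match.group(1)
        if ref is None:
            # No complete reference follows: a bare "&" or a missing ";".
            suspects.append((match.start(), "&"))
        elif not ref.startswith("#") and ref not in html5:
            # Looks like a named reference, but the name is unknown.
            suspects.append((match.start(), match.group(0)))
    return suspects


sample = "Fish & chips &amp; peas &bogus; &#38;"
print(find_suspect_ampersands(sample))
```

In the sample, the bare "&" and the unknown "&bogus;" are reported, while "&amp;" and "&#38;" pass.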