Why have browsers always been liberal in syntax they accept? [closed]
As far back as I can remember (~1997), web browsers have been very liberal in the HTML syntax they accept: missing closing tags, mismatched encodings, absent doctypes, single vs. double quotes (even a single quote closed by a double quote works in some cases), case insensitivity, and so on. I appreciate that some of this is part of the spec, but from what I gather a lot of it is not.
Why, especially around 1997 when the average computer might have been a 486 DX2 (I appreciate the latest would have been a Pentium) and processing power and memory were scarce, did browser manufacturers burden themselves with the extra parsing work required to handle bad HTML?
Why didn't we just start off strict from the beginning? The web publisher would have seen the problem before publishing and it would never have become an issue.
What have been the advantages in accepting bad HTML?
As someone who previously worked for a screen-scraping company, I can tell you it was very annoying.
Many believe that if markup languages were treated more like compiled programming code, all the ills of today (insert your preferred list of "ills of today") would disappear. That is not correct; it is not even correct for programming languages. This kind of checking catches syntax errors, and syntax errors, though annoying when discovered, are the easiest ones to fix.
You can see this with the experience of XHTML, which is HTML with draconian error checking turned on. You can call it a limited success in the sense that if you take a random sample of XHTML documents and a random sample of HTML documents, the code quality of the former is higher. But if you take away the documents that claim to be XHTML yet aren't (that is, they are not valid XHTML and often not even well-formed XML), the sample sizes are very different. (This is a longer story, too long for this post.) Historically, liberal parsing was not the problem, though divergent parsing was part of the problem (this too belongs to the longer story).
Of course syntax matters: throw a stray tag into a random place in a document and the result is likely to change in ways that can be hard to predict. Even so, markup languages were intended to mark up text, not to serve as programming code. They are supposed to be easy, and they are supposed to fall back to the best possible interpretation. In the cases where a markup language is used for programming, as in web applications, the programming environment might want to put in checks.
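To make "fall back to the best possible interpretation" concrete, here is a minimal sketch using Python's built-in html.parser as a stand-in for a tolerant browser parser (it is not what any browser actually ships): fed deliberately sloppy markup, it raises no error and simply reports the structure it managed to recover.

    from html.parser import HTMLParser

    class EventLogger(HTMLParser):
        """Print the events a tolerant, best-effort parser recovers."""
        def handle_starttag(self, tag, attrs):
            print("start", tag, attrs)

        def handle_endtag(self, tag):
            print("end  ", tag)

        def handle_data(self, data):
            print("text ", repr(data))

    parser = EventLogger()
    # Uppercase tags, an unquoted attribute, a </b> that was never opened,
    # and no closing </p> at all; the parser just keeps going.
    parser.feed("<P CLASS=intro>Hello <I>world</b>")
    parser.close()

The stray </b> is reported as-is and the missing </p> simply never appears; nothing is rejected.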
Which returns us to the point that good syntax does not protect us from bad logic or bad habits. Take the deeply nested table. In the early days, before we invented yet more wasteful habits, the nested table was by far the most CPU-consuming thing to process. It also cost the browser vendors the most programming time, as tables were never satisfactorily defined and each vendor had to reverse-engineer what the others were doing, bugs and all. Yet nested tables are valid XHTML and well-formed XML (well, legacy tables weren't, but that had negligible consequences for processing time, and moderate consequences for browser development effort).
The final test is that an XHTML document is universally more time-consuming to process and display than the equivalent HTML document. The reason is that the small gains from simpler processing are overshadowed by the extra processing and constraints imposed by XHTML, including that draconian error checking. In the very early days, before the NS-IE war, this is pretty much what happened. There were attempts at making "proper" browsers, including one based on SGML, of which HTML officially was an application. Now, SGML was a beast, but in any case the SGML attempt was so slow as to be completely unusable on the web of that time.
Some folks experimented with strict XML parsing for the web a few years back. Web pages delivered with the application/xml content type would not render in IE if they had a single error anywhere in the document. It was a mess. People who used it turned it off.
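A minimal sketch of that all-or-nothing behaviour, using Python's xml.etree.ElementTree as a stand-in for a strict XML parser (it is of course not IE's parser, and the page below is made up): one bare ampersand anywhere in the document and the whole page is refused.

    import xml.etree.ElementTree as ET

    # An otherwise fine page with one bare ampersand in the body.
    xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Demo</title></head>
      <body><p>Fish & chips</p></body>
    </html>"""

    try:
        ET.fromstring(xhtml)
        print("rendered")
    except ET.ParseError as err:
        # The parser stops dead rather than guessing, e.g. "not well-formed (invalid token)"
        print("whole document rejected:", err)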
The purpose of a browser is to serve the user, not to enforce standards on web page developers. "Be liberal about what you accept and strict about what you generate" is a good maxim for programming in general (except when security is a concern).
Besides, who would use a browser that couldn't see half of the web?
This is the principle of being generous on input and strict on output, and it is hardly limited to web browsers. Mail readers do it with malformed, missing, redundant, or otherwise bogus message headers. Servers of all kinds do it with input they accept from users.
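As a rough sketch of the maxim in code (the Normalizer class below is hypothetical, built on Python's standard library, and not how any real browser or mail reader is written): accept sloppy markup liberally, but emit only lowercase tags with quoted, escaped attributes.

    from html import escape
    from html.parser import HTMLParser

    class Normalizer(HTMLParser):
        """Liberal on input, strict on output: accept sloppy markup,
        re-emit lowercase tags with quoted, escaped attribute values."""
        def __init__(self):
            super().__init__()
            self.out = []

        def handle_starttag(self, tag, attrs):
            rendered = "".join(f' {name}="{escape(value or "")}"' for name, value in attrs)
            self.out.append(f"<{tag}{rendered}>")

        def handle_endtag(self, tag):
            self.out.append(f"</{tag}>")

        def handle_data(self, data):
            self.out.append(escape(data))

    n = Normalizer()
    n.feed("<P CLASS=intro>Fish & chips")   # unquoted attribute, bare &, no </p>
    n.close()
    print("".join(n.out))                   # <p class="intro">Fish &amp; chips

Anything the parser can make sense of goes in; only well-formed, strictly quoted markup comes out.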
I might also point out that parsing HTML is far from the most CPU-intensive part of displaying a web page. Rendering is far more CPU-intensive, and that part is just as "difficult" with strictly written HTML. In 1997 and before, rendering was the part that suffered performance-wise (I can attest to this as someone who used to browse the web on a 386 back in the day). Of course, back then many people had slow (i.e. 14.4k) Internet connections, and in those cases neither HTML parsing nor rendering was the bottleneck. And it is still true today that most CPUs can parse and render most content far faster than networks can deliver it.