Why don't we send binary around instead of text on http?
It seems that binary would be more compact and could be deserialized in a standard way, so why is text used instead? It seems inefficient, and web frameworks are forced to do nothing more than screw around with strings. Why isn't there a binary standard? The web would be way faster and browsers would be able to load binary pages very fast.
If I were to start a binary protocol (HBP, hyper binary protocol), what sort of standards would I define?
The HTTP protocol itself is readable as text. This is useful because you can telnet into any server at all and communicate with it.
Being text also allows you to easily watch HTTP communication with a program like wireshark. You can then diagnose the source of problems easily.
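To see what that means in practice, here is a minimal sketch (Python, standard library only; example.com is just a placeholder host) of speaking HTTP "by hand" over a raw TCP socket, exactly the kind of thing you could also type into a telnet session:

    import socket

    # The whole request is readable text terminated by a blank line.
    request = (
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(request.encode("ascii"))

        # The status line and headers come back as plain text, too.
        response = b""
        while chunk := sock.recv(4096):
            response += chunk

    print(response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1"))  # headers only

Nothing about the exchange needs a special decoder to inspect; that is exactly what makes telnet and Wireshark so useful here.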
HTTP defines a way to work with resources. These resources do not need to be text; they can be images or anything else. A text resource can be sent as binary by specifying the Content-Encoding header. Your resource type is specified via the Content-Type header.
So your question really only applies to the HTTP protocol itself, and not the payload which is the resources.
> The web would be way faster and browsers would be able to load binary pages very fast.
I don't think this is true. The slowest parts are probably connection establishment and TCP slow start.
Here is an example of how an HTTP response would send a text resource with a binary representation:
HTTP/1.1 200 OK
Server: Apache/2.0
Content-Encoding: gzip
Content-Length: 1533
Content-Type: text/html; charset=ISO-8859-1
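To show both headers at work, here is a small sketch using Python's standard library (example.com is just a placeholder, and the server is free to ignore the compression request, hence the header check) that asks for a gzip-encoded resource and turns it back into text:

    import gzip
    import http.client

    conn = http.client.HTTPSConnection("example.com")
    # Ask the server to send the text resource in a compressed binary form.
    conn.request("GET", "/", headers={"Accept-Encoding": "gzip"})
    resp = conn.getresponse()
    body = resp.read()
    conn.close()

    # Content-Encoding says how to turn the bytes back into the text resource;
    # Content-Type says what kind of resource it is.
    if resp.getheader("Content-Encoding") == "gzip":
        body = gzip.decompress(body)

    print(resp.getheader("Content-Type"))
    print(body[:200].decode("utf-8", errors="replace"))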
Text-based protocols have many important advantages:
- Assuming you're using UTF-8 or another octet-oriented encoding, there are no byte order issues to contend with.
- Getting everybody to agree on text-based schemas (such as those done in XML) is difficult enough. Imagine trying to get everybody to agree how many bits a number should be in the binary protocol.
- Relatedly, imagine trying to get them to agree on a floating point representation. This isn't much of a hypothetical -- IBM threatened to derail the ECMAScript 5 standardization effort over floating point representation issues.
- The web is text-based, and I don't just mean at the protocol level. Much of the content is text (at one time, almost ALL of the content was text). As such, modern programming languages have grown up around the idea that they are working with text, and that parsing binary formats is less important.
- Not too long ago, I had to generate an obscure binary format in Python to interface with a legacy system. It turned out to be much more painful than I would have imagined (the sketch after this list gives a taste of why). Parsing it would have been far, far worse.
- A developer can't look at a stream of bytes and say "oh, my string length is missing" the way he can look at e.g. an XML document and say "oh, that element didn't get closed". This makes development and troubleshooting far easier.
- Performance is overrated, and XML parsers are "fast enough" these days. If you're doing things that really have to have every last bit of performance squeezed out of the hardware, you're almost certainly not doing anything web-based, and will probably be constructing your own binary protocol to communicate between two applications you already control.
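To make the byte-order, "how many bits is a number", and floating-point points concrete, here is a small illustrative sketch (standard library only; the values are made up) of what happens when you commit a number to a binary layout:

    import struct

    length = 1533
    price = 19.99

    # The "same" integer in different widths and byte orders:
    print(struct.pack(">H", length).hex())  # '05fd'      16-bit, big-endian (network order)
    print(struct.pack("<I", length).hex())  # 'fd050000'  32-bit, little-endian

    # Floating point: 32-bit and 64-bit IEEE 754 don't even share the same bytes.
    print(struct.pack(">f", price).hex())
    print(struct.pack(">d", price).hex())

    # A text protocol just says "Content-Length: 1533", and every machine,
    # regardless of architecture, reads it the same way.

Every one of those layout decisions is something a binary standard would have to get everybody to agree on up front.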
There are binary communication standards, many of which pre-date HTTP. I built/worked on a client/server database protocol that was binary, and it did work and was byte-efficient. So the question is: why did a text format WIN in the marketplace?
I think there are probably many factors, but I believe these are the most important:
- You may not remember from pre-XML days, but byte-ordering used to be a royal headache whenever trying to exchange data. Every bit was precious, so file formats were packed as tightly as possible. But as soon as you tried to exchange files between Macs and PCs and mainframes, you realized the binary version of an integer was far from standard. Programmers spent countless hours rectifying this.
- Debugging and developing are much easier with text streams. As someone pointed out, you can use a telnet terminal session to do some of the development. Lots of times you can ignore character encoding issues. Unix's simple metaphor of pipes and streams is probably a primary reason it has been successful. It's just easier.
Well, this looks like necro-posting, but... it seems you were predicting the future: HTTP/2 will be binary.
It's already been proposed: http://en.wikipedia.org/wiki/Binary_XML
You can always gzip your text.
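A rough sketch of what that buys you (the repetitive sample markup is arbitrary):

    import gzip

    text = ("<html><body>" + "<p>Hello, world!</p>" * 200 + "</body></html>").encode("utf-8")
    compressed = gzip.compress(text)

    # Repetitive markup shrinks dramatically while the source stays human-readable.
    print(len(text), len(compressed))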
In former times it was regarded as a waste of (bandwidth) resources to encode binary as text. Not only do you have to encode and decode, but you have to be very clear about the type, and perhaps the structure, of the binary object you would like to send around. XML sometimes gives you the illusion that this comes automatically. But it doesn't.
Text encodings have, as Brian mentioned, the big advantage that you as a human can easily generate and debug them.
An interesting, non-textual format is ASN.1 (Abstract Syntax Notation One). Together with encoding rules (BER - Basic Encoding Rules, DER - Distinguished Encoding Rules, etc.) you end up with a very, very compact encoding of binary structures, which is highly optimized for network transfer. Even the handling of different byte orders is defined here. ASN.1 was used and propagated in the X-series of protocols (X.25, X.400, X.500, etc.). It is still used in LDAP. There are also encoding rules defined to encode data in XML (XER - XML Encoding Rules). See http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One .
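As a taste of how compact such encodings are, here is a hand-rolled sketch of DER encoding for a non-negative INTEGER (tag 0x02, a length octet, then big-endian content octets; short-form lengths only, so this is nowhere near a complete implementation):

    def der_encode_uint(value: int) -> bytes:
        """DER-encode a non-negative INTEGER (short-form length only)."""
        # Big-endian content octets; an extra leading zero byte appears exactly
        # when needed so the value isn't misread as negative.
        content = value.to_bytes((value.bit_length() // 8) + 1, "big")
        return bytes([0x02, len(content)]) + content

    print(der_encode_uint(1533).hex())  # '020205fd' -> just 4 bytes on the wire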
Data can stay as text whenever we humans interact with it, that is, through UI views, whether editing or just reading.
If a software system decides to convert text to binary for compact storage, caching or transfer, it can do so, but that should happen behind the scenes. So it's just an optimization concern, and since premature optimization is the root of many problems, it can sit very late on the project road map.
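A minimal sketch of that separation (zlib standing in for whatever compact representation the system picks; the record is made up):

    import json
    import zlib

    record = {"title": "Why don't we send binary instead of text?", "votes": 42}

    # Humans (and the debugger) see text...
    text = json.dumps(record)

    # ...while the cache/transport layer quietly stores something compact.
    blob = zlib.compress(text.encode("utf-8"))

    # ...and it is turned back into text before anyone has to look at it again.
    restored = json.loads(zlib.decompress(blob).decode("utf-8"))
    assert restored == record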