What is the speed impact of gzip on files for HTTP transfer?
I know that gzipping files before sending them across the network saves bandwidth, and for static files that can be cached, it does not have a significant impact on server-side CPU usage.
But what about the client? They have to gunzip whatever files are sent, which will take CPU time. Additionally, I'm worried that the entire file must be received and gunzipped before any parsing can take place.
This strikes me as odd because I see two scenarios:
1) client has fast internet --> gzip is relevant
2) client has slow internet --> gzip prevents partial parsing
Clearly the exact speed-up (or slow-down?) will depend on the exact circumstances of the files being transferred and of the client. Still, I'm curious: what is the time cost on the client side, and how can I measure it?
They have to gunzip whatever files are sent, which will take CPU time.
Perhaps, but the CPU time spent on decompression is extremely small compared to all the other things going on when loading a page (parsing, styling, rendering, scripting).
I'm worried that the entire file must be received and gunzipped before any parsing can take place.
Don't worry: gzip is a stream format, so the complete file is not required before decompression (and therefore parsing) can begin.
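A minimal sketch of that streaming behavior, in Python (the random payload and 16 KiB chunk size are arbitrary choices for illustration, not anything the gzip format requires):

    import os
    import zlib
    import gzip

    # Compress a sample payload the way a server would. Random bytes are
    # used only so the compressed stream stays large enough to arrive in
    # several chunks; real HTML would compress far better.
    payload = os.urandom(200_000)
    compressed = gzip.compress(payload)

    # Decompress incrementally, as if the bytes were trickling in over
    # the network. wbits=47 (32 + 15) lets zlib auto-detect the gzip header.
    decompressor = zlib.decompressobj(wbits=47)
    received = 0
    CHUNK = 16 * 1024
    for i in range(0, len(compressed), CHUNK):
        plain = decompressor.decompress(compressed[i : i + CHUNK])
        received += len(plain)
        # A browser could hand `plain` to its parser right here, long
        # before the final chunk has arrived.
    print(received == len(payload))  # True: nothing was lost

Each call to decompress() returns whatever plaintext is available so far, which is exactly why browsers can parse gzipped HTML as it streams in.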
Specifically I want to know how I can gauge how much time is lost because of gzipping.
Here is an interesting article where the author performs the type of test you're describing. The tools are available for download so that you can perform the same tests in your own environment.
The author concludes:
I guess there are very few cases where you shouldn't gzip your content. If your typical page is less than 100 bytes then gzipping it could hurt the client's and the server's performance. But no website (except maybe a few web services) serves pages with a typical size of 100 bytes or less. So there's no excuse for serving uncompressed HTML.
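If you'd rather get a quick number for your own files without downloading the article's tools, here is a minimal sketch in Python (the file name is a placeholder; substitute something representative of what you actually serve):

    import gzip
    import time

    # Placeholder: use a file that is typical for your site.
    with open("page.html", "rb") as f:
        data = f.read()

    t0 = time.perf_counter()
    compressed = gzip.compress(data)
    t1 = time.perf_counter()
    gzip.decompress(compressed)
    t2 = time.perf_counter()

    print(f"original:   {len(data)} bytes")
    print(f"compressed: {len(compressed)} bytes "
          f"({len(compressed) / len(data):.0%} of original)")
    print(f"compress:   {(t1 - t0) * 1000:.2f} ms")
    print(f"decompress: {(t2 - t1) * 1000:.2f} ms")

Compare the decompression time against the transfer time you save (bytes saved divided by your link speed) to see whether gzip pays off for your audience.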
In the meantime (the question is a bit old already), most people use TLS for every connection anyway, so questions about performance have become a bit superfluous. But it's still worthwhile to look at this:
1) client has fast internet --> gzip is relevant
2) client has slow internet --> gzip prevents partial parsing
The opposite is the case. The slower the client's internet connection (or route to the server) the more advantage you get out of gzip compression (or compression in general).
Compression is helpful if the time it takes to compress/decompress plus the time it takes to transmit the compressed data is less than the time it takes to transmit the uncompressed data right away.
Gzip will typically reduce your data to somewhere between 1/3 and 1/2 of its original size (depending on what it is), and compression runs at about 50 MB/s (give or take 5). Decompression is about three times as fast.
100 Mbit Ethernet has a throughput of about 12.5 MB/s, and most people do not yet have 100 Mbit internet access (which, since it normally stacks on top of ATM, is slower than plain Ethernet anyway). Also, most people, most of the time, cannot completely saturate a high-bandwidth internet connection with a single download.
So, realistically, for an average user and a server that is not in your local network at home but "somewhere else", let's say you get 5 MB/s (which, incidentally, is about twice the theoretical maximum I get here).
To transmit a 50 kB file, you thus need 0.01 seconds. Gzip adds about 0.001 seconds to compress and 0.0003 seconds to decompress (let's round up and call it 0.002 total), but you only have to transmit 16 kB, which takes 0.0032 seconds.
Add them together, and the transfer with gzip compression is about twice as fast.
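The same back-of-the-envelope arithmetic as a Python sketch (all figures are the rough estimates from above, not measurements):

    # Rough figures from the discussion above.
    LINK = 5e6          # 5 MB/s effective download speed
    COMPRESS = 50e6     # gzip compresses at ~50 MB/s...
    DECOMPRESS = 150e6  # ...and decompresses about 3x as fast
    RATIO = 16 / 50     # a 50 kB file shrinks to ~16 kB

    size = 50e3  # 50 kB file

    plain_time = size / LINK
    gzip_time = (size / COMPRESS                # compress on the server
                 + size * RATIO / LINK          # transmit 16 kB, not 50 kB
                 + size * RATIO / DECOMPRESS)   # decompress on the client

    print(f"uncompressed: {plain_time * 1000:.1f} ms")  # ~10.0 ms
    print(f"gzipped:      {gzip_time * 1000:.1f} ms")   # ~4.3 ms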
Now of course, eventually (when the average user has something like 200 Mbit/s internet and servers have 100 Gbit/s uplinks), this figure will turn around.
Update (a couple of years later):
Today, I stumbled across the Squash Compression Benchmark site, which shows nice graphs of a variety of compression algorithms (gzip among them) measured on different computers. It also includes a transfer+processing calculator, which demonstrates my claim: The slower the link, the more it's worthwhile to compress.
You can see immediately that at slower transfer speeds (I chose 250 kB/s as an example), compression is a big win. Compression/decompression time is almost insignificant compared to what you save by moving fewer bytes over the wire.
However, the advantage diminishes as the transfer speed approaches the same order of magnitude as the compression/decompression speed. For a "typical, not awesome" desktop computer, depending on the compression algorithm used, the break-even point lies somewhere between 10 MB/s and 100 MB/s, which roughly corresponds to the theoretical maximum throughput of a DSL-100 or a Gigabit-fiber internet connection, respectively. For any link below 100 Mbit/s, it's a pretty safe bet that compression is a "win"; above that it's more like "it depends", and upwards of Gigabit speeds it is most certainly a "fail".
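Where that break-even point falls can be estimated from the model stated earlier: compression pays off as long as compress time plus compressed transfer time plus decompress time stays below the plain transfer time. The file size cancels out of that condition, so the break-even link speed depends only on the compression ratio and the compressor's throughput. A sketch using the rough zlib figures from above (estimates, not benchmark results):

    # Break-even condition (file size S cancels out):
    #   S/link = S/comp + S*ratio/link + S*ratio/decomp
    #   =>  link = (1 - ratio) / (1/comp + ratio/decomp)
    comp = 50e6     # ~50 MB/s compression throughput
    decomp = 150e6  # ~150 MB/s decompression throughput
    ratio = 1 / 3   # compressed size / original size

    break_even = (1 - ratio) / (1 / comp + ratio / decomp)
    print(f"break-even link speed: {break_even / 1e6:.0f} MB/s")  # ~30 MB/s

With these figures the break-even lands around 30 MB/s, comfortably inside that 10-100 MB/s band; a slower machine pushes it down, a faster one pushes it up.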
Note how much the speed of your computer influences whether "compression is worth it" is true or a lie. What's true for your desktop gaming rig may not be true for your low-cost mobile phone.
To elaborate: all screenshots but the very last are for the "E-Desktop Intel Core i3" machine (a typical "nothing special, not awesome" desktop computer), whereas the very last is for the "Raspberry Pi 2" (a not-quite-terrible, but still low-power ARM mini-computer). While you can say without any doubt that at 10 MB/s (i.e. a 100 Mbit link) compressing on the Core i3 is always a big win (cutting overall time to less than half), on the Raspberry Pi there exist some fast compressors that are arguably worth using, but zlib compression is definitely worse than not compressing at all.
On the Raspberry Pi, the picture at 100 Mbit/s isn't nearly as clear-cut.