
HTML compression

Most web pages contain significant amounts of whitespace and other unnecessary characters, which wastes bandwidth for both the client and the server. This is especially true of large pages with complex table structures and CSS styles defined at the page level. It seems like good practice to preprocess all your HTML files before publishing, as this will save a lot of bandwidth, and where I live, bandwidth ain't cheap.

It goes without saying that the optimisation should not affect the appearance of the page in any way (according to the HTML standard), or break any embedded JavaScript or backend ASP code, etc.

Some of the functions I'd like to perform are:

  • Removal of all whitespace and carriage returns. The parser needs to be smart enough not to strip whitespace from inside string literals. Removing space between HTML elements or attributes is mostly safe, but IIRC browsers will render a single space between div or span tags, so those shouldn't be stripped.
  • Remove all comments from HTML and client side scripts
  • Remove redundant attribute values. e.g. <option selected="selected"> can be replaced with <option selected>
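To make the three points above concrete, here is a minimal, naive sketch in Python of what such a preprocessor might do. It is purely illustrative — real tools (HTML Tidy, htmlcompressor, etc.) handle far more edge cases, and the tag list and helper names here are my own invention:

```python
import re

# Tags whose inner whitespace is significant and must be left alone.
PRESERVE = re.compile(r'(<(pre|textarea|script|style)\b.*?</\2>)',
                      re.IGNORECASE | re.DOTALL)

def minify(html: str) -> str:
    """Squeeze whitespace and comments, skipping whitespace-sensitive blocks."""
    out = []
    pos = 0
    for m in PRESERVE.finditer(html):
        out.append(_squeeze(html[pos:m.start()]))
        out.append(m.group(1))          # keep preserved blocks verbatim
        pos = m.end()
    out.append(_squeeze(html[pos:]))
    return ''.join(out)

def _squeeze(chunk: str) -> str:
    chunk = re.sub(r'<!--.*?-->', '', chunk, flags=re.DOTALL)  # strip comments
    chunk = re.sub(r'\s+', ' ', chunk)                          # collapse runs of whitespace
    # Keep a single space between tags, since it can be significant
    # between inline elements (the div/span caveat above).
    chunk = re.sub(r'>\s+<', '> <', chunk)
    # Redundant boolean attribute values: selected="selected" -> selected
    chunk = re.sub(r'\b(selected|checked|disabled|readonly)="\1"', r'\1', chunk)
    return chunk
```

For example, `minify('<option selected="selected">')` gives `'<option selected>'`, while anything inside `<pre>` or `<script>` passes through untouched. A regex approach like this will break on pathological markup, which is exactly why a battle-tested tool is preferable.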

As if this wasn't enough, I'd like to take it even further and compress the CSS styles too. Pages with large tables often contain huge amounts of code like the following: <td style="TdInnerStyleBlaBlaBla">. The page would be smaller if the style label were short, e.g. <td style="x">. To this end, it would be great to have a tool that could rename all your styles to identifiers comprising the fewest characters possible. If there are too many styles to represent with the set of allowable single-character identifiers, then it would be necessary to move to longer identifiers, prioritising the shorter identifiers for the styles which are used the most.
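The renaming scheme described above can be sketched in a few lines: generate candidate identifiers in order of increasing length and hand the shortest ones to the most frequently used names. This is a hypothetical illustration (the function names and example counts are made up); note that CSS identifiers may not start with a digit, hence the letters-only alphabet:

```python
from collections import Counter
from itertools import count, product
import string

def short_names():
    """Yield 'a'..'z', then 'aa', 'ab', ..., in increasing length."""
    alphabet = string.ascii_lowercase
    for n in count(1):
        for combo in product(alphabet, repeat=n):
            yield ''.join(combo)

def rename_map(usage: Counter) -> dict:
    """Map each style name to a short identifier, most-used names first."""
    gen = short_names()
    return {cls: next(gen) for cls, _ in usage.most_common()}

usage = Counter({'TdInnerStyleBlaBlaBla': 500, 'HeaderStyle': 20, 'FooterStyle': 3})
mapping = rename_map(usage)
# The most-used style gets 'a', the next 'b', and so on;
# the 27th distinct style would get 'aa'.
```

A real tool would also have to rewrite every reference in the stylesheets, markup, and any JavaScript that touches `className`, which is where this gets risky.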

In theory it should be quite easy to build a piece of software to do all this, as there are many XML parsers available to do the heavy lifting. Surely someone's already created a tool which can do all these things and is reliable enough to use on real-life projects. Does anyone here have experience with doing this?


The term you're probably after is 'minify' or 'minification'.

This is very similar to an existing conversation which you may find helpful:

https://stackoverflow.com/questions/728260/html-minification

Also, depending on the web server you use and the browser used to look at your site, it is likely that your server is already compressing data without you having to do anything:

http://en.wikipedia.org/wiki/HTTP_compression


Your three points are actually called "minifying HTML/JS/CSS".

You can have a look at these:

  • HTML online minimizer/compressor?
  • http://tidy.sourceforge.net/

I have done some compression of HTML/JS/CSS too, in my personal distributed crawler, which uses gzip, bzip2, or 7zip:

  • gzip = fastest, ~12-25% of original filesize
  • bzip2 = normal, ~10-20% of original filesize
  • 7zip = slow, ~7-15% of original filesize
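You can get a feel for these trade-offs with Python's standard library alone (`lzma` implements the algorithm behind 7zip's default format). This is a rough sketch on a deliberately repetitive, made-up payload; actual ratios depend heavily on the input:

```python
import bz2
import gzip
import lzma

# Repetitive HTML-like payload, ~100 KB.
html = ('<tr><td style="TdInnerStyleBlaBlaBla">cell</td></tr>\n' * 2000).encode()

for name, compress in [('gzip', gzip.compress),
                       ('bzip2', bz2.compress),
                       ('lzma', lzma.compress)]:
    ratio = 100 * len(compress(html)) / len(html)
    print(f'{name:5s}: {ratio:.1f}% of original size')
```

On markup this repetitive all three shrink the payload dramatically, which is also why enabling HTTP compression on the server often buys you more than hand-minifying ever will.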
