fastest way to remove whitespace from a rendered PHP file
I tried the performance check tool "DOM Monster" to analyze my PHP site. One of its findings says "50% of nodes are whitespace-only text nodes". OK, I understand the problem, but what is the fastest way to clean up whitespace in PHP?
I think a good start is to use the "Output Control" functions: call ob_start(), then replace the whitespace before releasing the output with ob_end_flush(). At the moment I do everything with echo, echo, ... I have never read much about these ob_* functions. Are they useful?
I guess using preg_replace() is a performance killer for this job, isn't it? So what is the best practice here?
The fastest way to remove whitespace-only nodes is to not create them in the first place. Just remove all the whitespace immediately before and after each HTML tag.
You certainly could remove the spaces from your code after the fact using an output handler (look at the callback bit in ob_start), but if your goal is performance, then that kind of defeats the purpose.
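For reference, the callback variant of ob_start could look roughly like the sketch below. The function name stripInterTagWhitespace and the sample markup are made up for illustration, and the regular expression is deliberately naive: it removes every run of whitespace between a closing ">" and the next "<", including inside <pre> or <textarea>.

```php
<?php
// Hypothetical output callback: PHP hands it the buffered output,
// and whatever it returns is what gets sent to the browser.
function stripInterTagWhitespace(string $buffer): string
{
    // Drop whitespace-only runs between two tags.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('/>\s+</', '><', $buffer) ?? $buffer;
}

// Register the callback before any output is produced.
ob_start('stripInterTagWhitespace');

echo "<ul>\n";
echo "    <li>one</li>\n";
echo "    <li>two</li>\n";
echo "</ul>\n";

// Runs the callback over the buffer and sends the result.
ob_end_flush();
```

The callback runs on every request, so this trades server CPU time for a smaller DOM in the browser.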
A whitespace-only node is a text node in the DOM tree that the browser builds when it parses your HTML. It appears wherever there's an HTML tag, then nothing but whitespace, then another HTML tag. It's a waste of browser resources, but not a huge deal.
The trim() function should solve your problem, shouldn't it?
http://www.php.net/manual/en/function.trim.php
Well, I guess you are talking about HTML, and HTML is by nature a markup language full of whitespace (attributes, text). Besides, you probably use newlines for readability.
I would rather advise you to compress your pages with deflate/gzip through web server rules, e.g. an .htaccess rule:
<FilesMatch "\\.(js|css|html|htm|php|xml)$">
SetOutputFilter DEFLATE
</FilesMatch>
You can also take a look at Tidy, a library that helps you check and clean up your HTML code.
preg_replace will of course slow things down a little, but it is probably the fastest way anyway. The bigger problem is that preg_replace may be unreliable, because it is very hard to write a regular expression that works in all possible cases. If you are creating XML/XHTML output, you could parse all your data with a fast stream parser in the SAX or StAX style (PHP usually ships both kinds built in, such as the xml_parser_* functions and XMLReader) and then write the data back to the output without the whitespace. That's simple, effective, reliable and at least moderately fast. It's still not going to blow you away with speed.
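A minimal sketch of the stream-parser idea, assuming the output is well-formed XML/XHTML and that the XMLReader and XMLWriter extensions are available. The function name stripWhitespaceNodes is made up for illustration. The sketch copies elements, attributes, text and CDATA, and silently drops everything else (whitespace-only nodes, but also comments, processing instructions and the doctype), so it is not a complete round trip.

```php
<?php
// Hypothetical helper: re-serialize well-formed XML without whitespace-only text nodes.
function stripWhitespaceNodes(string $xml): string
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $writer = new XMLWriter();
    $writer->openMemory();

    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Must be read before moving onto the attributes.
                $isEmpty = $reader->isEmptyElement;
                $writer->startElement($reader->name);
                while ($reader->moveToNextAttribute()) {
                    $writer->writeAttribute($reader->name, $reader->value);
                }
                if ($isEmpty) {
                    // Self-closing elements never produce an END_ELEMENT event.
                    $writer->endElement();
                }
                break;
            case XMLReader::END_ELEMENT:
                $writer->endElement();
                break;
            case XMLReader::TEXT:
                $writer->text($reader->value);
                break;
            case XMLReader::CDATA:
                $writer->writeCdata($reader->value);
                break;
            // XMLReader::WHITESPACE and XMLReader::SIGNIFICANT_WHITESPACE
            // are whitespace-only nodes: they are intentionally not copied.
        }
    }
    $reader->close();

    return $writer->outputMemory();
}

echo stripWhitespaceNodes("<a>\n  <b id=\"1\">x</b>\n  <c/>\n</a>"), "\n";
```

Because the parser knows which nodes are text and which are markup, this avoids the corner cases a regular expression can trip over, at the cost of parsing the whole document on every request.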
Another option would be to just use gzip (ob_start('ob_gzhandler') is the call in PHP). This compresses your data, and compression works extremely well on data that repeats a lot within a document. It comes with a little performance penalty as well, but the reduced size of the output document may make up for it. Beware, though, that the output will not be sent to the browser before all output is available. This makes partial loading of web pages much harder ;-).
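To get a feeling for how well gzip handles repetitive, whitespace-heavy markup, here is a small sketch that compresses a made-up sample page with gzencode() and prints both sizes. It assumes the zlib extension is loaded; the sample markup and the compression level 6 are arbitrary choices for illustration. In a real page, the built-in output callback ob_gzhandler does the compression for you.

```php
<?php
// In a real page, compression is enabled with one call before any output:
//     ob_start('ob_gzhandler');
// The rest of this sketch only measures the effect on a sample string.

// Hypothetical sample: 200 indented list items, lots of repeated whitespace.
$html = "<ul>\n" . str_repeat("    <li>item</li>\n", 200) . "</ul>\n";

// gzencode() produces the same gzip format a browser receives.
$compressed = gzencode($html, 6);

printf("raw: %d bytes, gzip: %d bytes\n", strlen($html), strlen($compressed));
```

This only reduces the bytes sent over the wire. The browser still sees the whitespace-only text nodes after decompression.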
The problem with using ob_* and then trimming whitespace is that you'll have to make sure not to remove displayed whitespace, such as in <pre> tags or <textarea>s. You'll need a syntactical parser which understands where it should not trim.
With a parser that is expensive in terms of performance, you should also cache the output where possible.
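As a rough illustration of "knowing where not to trim", the sketch below is not a real parser but a regex-based approximation: one pattern matches either a whole <pre> or <textarea> block, which is passed through untouched, or a run of whitespace between two tags, which is removed. The function name stripOutsideProtectedBlocks is made up, and the sketch ignores other whitespace-sensitive cases such as <script> contents, CSS white-space rules, or nested <pre> elements.

```php
<?php
// Hypothetical helper: strip inter-tag whitespace, but leave <pre> and
// <textarea> blocks exactly as they are.
function stripOutsideProtectedBlocks(string $html): string
{
    $pattern = '#<pre\b.*?</pre\s*>'            // a whole <pre> block
             . '|<textarea\b.*?</textarea\s*>'  // a whole <textarea> block
             . '|(?<=>)\s+(?=<)#is';            // whitespace between two tags

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // A match starting with "<" is a protected block: keep it as is.
            // Anything else is inter-tag whitespace: drop it.
            return $m[0][0] === '<' ? $m[0] : '';
        },
        $html
    );

    // preg_replace_callback() returns null on error, so fall back to the input.
    return $result ?? $html;
}

echo stripOutsideProtectedBlocks("<div>\n  <p>a</p>\n  <pre>\n  x\n</pre>\n</div>"), "\n";
```

Since this runs a regular expression over the whole page, caching the stripped output (for example in a file keyed by the request) would avoid repeating the work on every request.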
The following code removes all space characters except the first in a sequence of spaces. So 1 space is kept, 3 spaces are pruned to 1, etc.
At the top of your PHP file do
ob_start();
At the end do
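function StripExtraSpace($s)
{
    $newstr = "";
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        // Always copy the current character.
        $newstr .= $s[$i];
        // After a space, skip all spaces that directly follow it.
        if ($s[$i] === ' ') {
            while ($i + 1 < $len && $s[$i + 1] === ' ') {
                $i++;
            }
        }
    }
    return $newstr;
}
// Fetch the buffered page, discard the buffer, and print the cleaned result.
$content = ob_get_clean();
echo StripExtraSpace($content);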
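For comparison, the same collapsing of space runs can be sketched as a single preg_replace() call. The function name collapseSpaces is made up for illustration. Like the loop above, it only touches the space character and leaves tabs and newlines alone. Which of the two is faster on a given page would have to be measured.

```php
<?php
// Hypothetical alternative: collapse every run of two or more spaces into one.
function collapseSpaces(string $s): string
{
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('/ {2,}/', ' ', $s) ?? $s;
}

echo collapseSpaces("a   b  c d"), "\n"; // prints "a b c d"
```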