isset() vs strlen() - a fast/clear string length calculation
I came across this code...
if(isset($string[255])) {
// too long
}
isset() is between 6 and 40 times faster than
if(strlen($string) > 255) {
// too long
}
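The two checks give the same answer for any string, which can be confirmed with a minimal sketch like the one below. The variable names and the range of test lengths are my own assumptions, and it only holds when $string really is a string (for an array, isset() would test for key 255 instead).

```php
<?php
// Hypothetical limit, matching the 255 used in the question.
$limit = 255;

$mismatches = 0;
for ($len = 0; $len <= 300; $len++) {
    $string = str_repeat('a', $len);
    // Offset $limit exists only when the string has at least $limit + 1 bytes.
    $viaIsset  = isset($string[$limit]);
    $viaStrlen = strlen($string) > $limit;
    if ($viaIsset !== $viaStrlen) {
        $mismatches++;
    }
}

echo "mismatches: $mismatches\n"; // prints "mismatches: 0"
```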
The only drawback to the isset() is that the code is unclear - we cannot tell right away what is being done (see pekka's answer). We can wrap isset() within a function, i.e. strlt($string,255), but we then lose the speed benefits of isset().
How can we use the faster isset() function while retaining readability of the code?
EDIT : test to show the speed http://codepad.org/ztYF0bE3
strlen() over 1000000 iterations 7.5193998813629
isset() over 1000000 iterations 0.29940009117126
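The codepad script is not reproduced here, so the following is only a sketch of what such a comparison might look like. The test string, the $hits counter and the variable names are assumptions of mine, and the timings will vary a lot with PHP version and machine.

```php
<?php
// Assumed test data: a string longer than the 255-byte limit.
$string     = str_repeat('x', 300);
$iterations = 1000000;
$hits       = 0; // counted so that the loop bodies are not empty

// Time the strlen() variant.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if (strlen($string) > 255) {
        $hits++;
    }
}
$strlenSeconds = microtime(true) - $start;

// Time the isset() variant.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if (isset($string[255])) {
        $hits++;
    }
}
$issetSeconds = microtime(true) - $start;

printf("strlen() over %d iterations: %.4f s\n", $iterations, $strlenSeconds);
printf("isset()  over %d iterations: %.4f s\n", $iterations, $issetSeconds);
```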
EDIT2 : here's why isset() is faster
$string = 'abcdefg';
var_dump($string[2]);
Output: string(1) “c”
$string = 'abcdefg';
if (isset($string[7])){
echo $string[7].' found!';
}else{
echo 'No character found at position 7!';
}
This is faster than using strlen() because, “… calling a function is more expensive than using a language construct.” http://www.phpreferencebook.com/tips/use-isset-instead-of-strlen/
EDIT3 : I was always taught to be interested in micro-optimisation. Probably because I was taught at a time when resources on computers were tiny. I'm open to the idea that it may not be important; there are some good arguments against it in the answers. I've started a new question exploring this... https://stackoverflow.com/questions/6983208/is-micro-optimisation-important-when-coding
OK so I ran the tests since I could hardly believe that the isset() method is faster, but yes it is, and considerably so. The isset() method is consistently about 6 times faster.
I have tried strings of various sizes and varying numbers of iterations; the ratios remain the same, and so does the total running time (for strings of varying sizes), because both isset() and strlen() are O(1) (which makes sense - isset() only needs to do a lookup in a C array, and strlen() only returns the length that is stored with the string).
I looked it up in the php source, and I think I roughly understand why. isset(), because it is not a function but a language construct, has its own opcode in the Zend VM. Therefore, it doesn't need to be looked up in the function table and it can do more specialized parameter parsing. Code is in zend_builtin_functions.c for strlen() and zend_compile.c for isset(), for those interested.
To tie this back to the original question, I don't see any issues with the isset() method from a technical point of view; but imo it is harder to read for people who are not used to the idiom. Furthermore, the isset() method will be constant in time, while the strlen() method will be O(n) when varying the number of functions that are built into PHP. Meaning, if you build PHP and statically compile in many functions, all function calls (including strlen()) will be slower; but isset() will be constant. However, this difference will in practice be negligible; I also don't know how many function pointer tables are maintained, or whether user-defined functions also have an influence. I seem to remember they are in a different table and therefore are irrelevant for this case, but it's been a while since I last really worked with this.
For the rest I don't see any drawbacks to the isset() method. I don't know of other ways to get the length of a string, when not considering purposefully convoluted ones like explode+count and things like that.
Finally, I also tested your suggestion above of wrapping isset() into a function. This is slower than even the strlen() method because you need another function call, and therefore another hash table lookup. The overhead of the extra parameter (for the size to check against) is negligible; as is the copying of the string when not passed by reference.
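For reference, a wrapper of the kind discussed might look like the sketch below. The name str_longer_than() and its exact semantics are hypothetical; the question only mentions the idea of a strlt($string,255) helper without showing an implementation.

```php
<?php
/**
 * Hypothetical wrapper (not from the original posts): returns true when
 * $string has more than $limit bytes.
 */
function str_longer_than($string, $limit)
{
    // Offset $limit exists only if the string holds at least $limit + 1 bytes.
    return isset($string[$limit]);
}

$short = str_repeat('a', 255);
$long  = str_repeat('a', 256);

var_dump(str_longer_than($short, 255)); // bool(false)
var_dump(str_longer_than($long, 255));  // bool(true)
```

The call site reads clearly, but every check now pays for a userland function call, which is the overhead described above.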
Any speed difference in this is of absolutely no consequence. It will be a few milliseconds at best.
Use whatever style is most readable to you and anybody else working on the code - I personally would strongly vote for the second example because, unlike the first one, it makes the intention (checking the length of a string) absolutely clear.
Your code is incomplete.
Here, I fixed it for you:
if(isset($string[255])) {
// something taking 1 millisecond
}
vs
if(strlen($string) > 255) {
// something taking 1 millisecond
}
Now you don't have an empty loop, but a realistic one. Let's assume it takes 1 millisecond to do something.
A modern CPU can do a lot of things in 1 millisecond - that is a given. But things like a random hard drive access or a database request take multiple milliseconds - also a realistic scenario.
Now let's calculate the timings again:
realistic routine + strlen() over 1000000 iterations 1007.5193998813629
realistic routine + isset() over 1000000 iterations 1000.29940009117126
See the difference?
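To put numbers on it, here is a small sketch that works out the ratios from the timings quoted in this thread. The variable names are mine; the figures are the ones given above.

```php
<?php
// Timings quoted above, in seconds per 1,000,000 iterations.
$strlenOnly = 7.5193998813629;
$issetOnly  = 0.29940009117126;
$withStrlen = 1007.5193998813629;  // 1 ms of work per iteration + strlen()
$withIsset  = 1000.29940009117126; // 1 ms of work per iteration + isset()

// Ratio of the bare checks versus ratio once real work is included.
$bareRatio      = $strlenOnly / $issetOnly;
$realisticRatio = $withStrlen / $withIsset;

// Share of the total run time that switching to isset() would save.
$savedPercent = ($withStrlen - $withIsset) / $withStrlen * 100;

printf("bare ratio:      %.1fx\n", $bareRatio);
printf("realistic ratio: %.4fx\n", $realisticRatio);
printf("time saved:      %.2f%%\n", $savedPercent);
```

The bare checks differ by a factor of roughly 25, but once each iteration does 1 millisecond of real work the saving is well under 1% of the total.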
Firstly, I want to point towards an answer by Artefacto explaining why function calls carry an overhead over language constructs.
Secondly, I want to make you aware of the fact that XDebug greatly decreases the performance of function calls, so if you are running XDebug you may get distorted numbers. Reference (Second section of question). So, in production (where you hopefully do not have XDebug installed) the difference is even smaller. It goes down from 6x to 2x.
Thirdly, you should know that, even though there is a measurable difference, it only shows up if this code runs in a tight loop with millions of iterations. In a normal web application the difference will not be measurable; it will be lost in the noise.
Fourthly, please note that nowadays development time is much more expensive than server load. A developer spending even half a second longer working out what the isset() code does costs far more than the CPU time saved. Furthermore, server load can be reduced far more effectively by applying optimizations that actually make a difference (like caching).
This is the latest test:
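function benchmark_function($fn, array $args = array())
{
    if (!function_exists($fn))
    {
        trigger_error("Call to undefined function $fn()", E_USER_ERROR);
    }
    $t = microtime(true);
    // call_user_func_array() expects the arguments as an array
    $r = call_user_func_array($fn, $args);
    return array("time" => (microtime(true) - $t), "returned" => $r, "fn" => $fn);
}
function get_len_loop($s)
{
    $i = 0; // the counter must be initialised before use
    // isset() stops at the end of the string without raising a notice
    while (isset($s[$i])) { $i++; }
    return $i;
}
// var_dump() prints its output itself and returns nothing, so it is not echoed
var_dump(benchmark_function("strlen", array("kejhkhfkewkfhkwjfjrw")));
echo "<br>";
var_dump(benchmark_function("get_len_loop", array("kejhkhfkewkfhkwjfjrw")));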
Returned results:
RUN 1:
array(3) { ["time"]=> float(2.1457672119141E-6) ["returned"]=> int(20) ["fn"]=> string(6) "strlen" } array(3) { ["time"]=> float(1.1920928955078E-5) ["returned"]=> int(20) ["fn"]=> string(12) "get_len_loop" }
RUN 2:
array(3) { ["time"]=> float(4.0531158447266E-6) ["returned"]=> int(20) ["fn"]=> string(6) "strlen" } array(3) { ["time"]=> float(1.5020370483398E-5) ["returned"]=> int(20) ["fn"]=> string(12) "get_len_loop" }
RUN 3:
array(3) { ["time"]=> float(4.0531158447266E-6) ["returned"]=> int(20) ["fn"]=> string(6) "strlen" } array(3) { ["time"]=> float(1.2874603271484E-5) ["returned"]=> int(20) ["fn"]=> string(12) "get_len_loop" }
RUN 4:
array(3) { ["time"]=> float(3.0994415283203E-6) ["returned"]=> int(20) ["fn"]=> string(6) "strlen" } array(3) { ["time"]=> float(1.3828277587891E-5) ["returned"]=> int(20) ["fn"]=> string(12) "get_len_loop" }
RUN 5:
array(3) { ["time"]=> float(5.0067901611328E-6) ["returned"]=> int(20) ["fn"]=> string(6) "strlen" } array(3) { ["time"]=> float(1.4066696166992E-5) ["returned"]=> int(20) ["fn"]=> string(12) "get_len_loop" }
The drawback is that isset is not explicit at all, while strlen is really clear about what your intention is. If someone reads your code and has to understand what you're doing, it might trip them up and not be really clear.
Unless you are running Facebook, I doubt that strlen will be where your server spends most of its resources, and you should keep using strlen.
I just tested it: isset is far faster than strlen.
0.01 seconds for 100000 iterations with isset
0.04 seconds for 100000 iterations with strlen
But that doesn't change what I said just now.
The script, as some people just asked for it:
$string = 'xdfksdjhfsdljkfhsdjklfhsdlkjfhsdjklfhsdkljfhsdkljfhsdljkfsdhlkfjshfljkhfsdljkfhsdkljfhsdkljfhsdklfhlkjfhkljfsdhfkljsdhfkljsdhfkljhsdfjklhsdjklfhsdkljfhklsdhfkljsdfhdjkshfjlhdskljfhsdkljfhsdjkfhsjkldhfklsdjhfkjlsfhdjkflsdhfjklfsdljfsdlkdlfkjflfkjsdfkl';
for ($i = 0; $i < 100000; $i++) {
if (strlen($string) == 255) {
// if (isset($string[255])) {
// do nothing
}
}
In modern object-oriented web applications, a single line that you write within a small class can easily be run several hundred times to build a single web page.
You might want to profile your web site with XDebug; you might be surprised how many times each method of a class is executed.
Also, in real-world scenarios you might work not only with short strings but also with really big documents of 3MB or more.
You might also come across text with non-Latin characters.
So what was initially just a small performance loss might eventually add up to several hundred milliseconds when rendering a web page.
So I am very interested in this issue and wrote a little test of 4 different methods to check whether a string is really empty ("") or actually contains something like "0".
function stringCheckNonEmpty0($string)
{
return (!empty($string)); // negated so that true means "non-empty", like the other checks
}
function stringCheckNonEmpty1($string)
{
return (strlen($string) > 0);
}
function stringCheckNonEmpty1_2($string)
{
return (mb_strlen($string) > 0);
}
function stringCheckNonEmpty2($string)
{
return ($string !== "");
}
function stringCheckNonEmpty3($string)
{
return (isset($string[0]));
}
I found that PHP has a hard time working with non-Latin characters, so I copied a Russian text from a web page to compare the results between the tiny string "0" and the bigger Russian text.
$steststring = "0";
$steststring2 = "Hotel Majestic в городе Касабланка располагается всего в нескольких минутах от "
    . "следующих достопримечательностей и объектов: "
    . "Playas Ain Diab y La Corniche и Центральный рынок Касабланки. "
    . "Этот отель находится вблизи следующих достопримечательностей и объектов: "
    . "Площадь Мухаммеда V и Культурный комплекс Сиди-Бельот.";
To really see a difference, I called each test function several million times.
$iruncount = 10000000;
echo "test: empty(\"0\"): starting ...\n";
$tmtest = 0;
$tmteststart = microtime(true);
$tmtestend = 0;
for($irun = 0; $irun < $iruncount; $irun++)
stringCheckNonEmpty0($steststring);
$tmtestend = microtime(true);
$tmtest = $tmtestend - $tmteststart;
echo "test: empty(\"0\"): '$tmtest' s\n";
Test Results
$ php test_string_check.php
test0.1: empty("0"): starting ...
test0.1: empty("0"): '7.0262970924377' s
test0.2: empty(russian): starting ...
test0.2: empty(russian): '7.2237210273743' s
test1.1.1: strlen("0"): starting ...
test1.1.1: strlen("0"): '11.045154094696' s
test1.1.2: strlen(russian): starting ...
test1.1.2: strlen(russian): '11.106546878815' s
test1.2.1: mb_strlen("0"): starting ...
test1.2.1: mb_strlen("0"): '11.320801019669' s
test1.2.2: mb_strlen(russian): starting ...
test1.2.2: mb_strlen(russian): '23.082058906555' s
test2.1: ("0" !== ""): starting ...
test2.1: ("0" !== ""): '7.0292129516602' s
test2.2: (russian !== ""): starting ...
test2.2: (russian !== ""): '7.1041729450226' s
test3.1: isset(): starting ...
test3.1: isset(): '6.9401099681854' s
test3.2: isset(russian): starting ...
test3.2: isset(russian): '6.927631855011' s
$ php test_string_check.php
test0.1: empty("0"): starting ...
test0.1: empty("0"): '7.0895299911499' s
test0.2: empty(russian): starting ...
test0.2: empty(russian): '7.3135821819305' s
test1.1.1: strlen("0"): starting ...
test1.1.1: strlen("0"): '11.265664100647' s
test1.1.2: strlen(russian): starting ...
test1.1.2: strlen(russian): '11.282053947449' s
test1.2.1: mb_strlen("0"): starting ...
test1.2.1: mb_strlen("0"): '11.702164888382' s
test1.2.2: mb_strlen(russian): starting ...
test1.2.2: mb_strlen(russian): '23.758249998093' s
test2.1: ("0" !== ""): starting ...
test2.1: ("0" !== ""): '7.2174110412598' s
test2.2: (russian !== ""): starting ...
test2.2: (russian !== ""): '7.240779876709' s
test3.1: isset("0"): starting ...
test3.1: isset("0"): '7.2104151248932' s
test3.2: isset(russian): starting ...
test3.2: isset(russian): '7.2232971191406' s
Conclusion
- The conventional empty() function performs well but fails on strings like "0".
- The mb_strlen() function, which is needed for texts with non-Latin characters, performs worse on larger texts.
- The check $string !== "" performs very well, even better than the empty() function.
- But the best performance comes from the isset($string[0]) check.
I will definitely have to rework my whole object library.
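The behaviour behind the conclusions above can be checked with a short sketch. It assumes PHP 7 or later (for the \u{...} escapes) and that the mbstring extension is loaded; the sample word is my own and is not taken from the test above.

```php
<?php
// Illustrative sample: "привет" written with escapes, 6 Cyrillic characters.
$word = "\u{043F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}";
$zero = "0";

// empty() treats "0" as empty, while the other two checks do not.
var_dump(empty($zero));    // bool(true)
var_dump($zero !== "");    // bool(true)
var_dump(isset($zero[0])); // bool(true)

// strlen() counts bytes, mb_strlen() counts characters.
var_dump(strlen($word));             // int(12)
var_dump(mb_strlen($word, 'UTF-8')); // int(6)

// String offsets are byte offsets, so isset() measures bytes as well:
// offset 6 exists even though the word has only 6 characters.
var_dump(isset($word[6])); // bool(true)
```

So isset($string[0]) is a safe emptiness check for any encoding, but isset($string[N]) with a larger N is a limit in bytes, not in characters.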