Is sequential strpos() faster than a function with one preg_match()?

I need to test whether any of the strings 'hello', 'i am', or 'dumb' exists in a longer string called $ohreally. If even one of them exists, my test is over, and I know that if one of them occurs, neither of the others will.

Under these conditions, I am asking for your help with the most efficient way to write this search:

strpos() 3 times like this?

if (strpos ($ohreally, 'hello')){return false;}  
   else if (strpos ($ohreally, 'i am')){return false;}  
   else if (strpos ($ohreally, 'dumb')){return false;}  
   else {return true;}

or one preg_match?

if (preg_match('hello'||'i am'||'dumb', $ohreally)) {return false}   
   else {return true};

I know the preg_match code is wrong; I would really appreciate it if someone could offer the correct version.

Thank You!


Answer

Please read what cletus said and the test middaparka did below. I also ran a microtime test on various strings, long and short, with these results.

If you know the probability of each string occurring, order the checks from most probable to least. (I did not notice a measurable difference from reordering the regex itself, i.e. between /hello|i am|dumb/ and /i am|dumb|hello/.)

With sequential strpos(), on the other hand, the probability makes all the difference. For example, if 'hello' occurs 90% of the time, 'i am' 7%, and 'dumb' 3%, you would want to check for 'hello' first and exit the function as soon as possible.
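The ordering advice above can be sketched roughly as follows. This is only an illustration: containsNone() is a hypothetical helper name, not something from the question, the needle order is assumed to reflect the probabilities in the example, and the scalar type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns true when none of the needles occur in $haystack.
// $needles is assumed to be ordered from most to least likely to occur,
// so the common case exits after the first strpos() call.
function containsNone(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // strpos() returns an int position or false, so compare strictly.
        if (strpos($haystack, $needle) !== false) {
            return false; // found one; no need to check the rest
        }
    }
    return true;
}

// Most probable needle first (assumed 90% / 7% / 3% as in the example above).
$needles = ['hello', 'i am', 'dumb'];

var_dump(containsNone('hello world', $needles));
var_dump(containsNone('nothing to see', $needles));
```

The first call returns after a single strpos(); the second has to run all three before it can return true.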

My microtime tests show this.

For haystacks A, B, and C, in which the needle is found on the first, second, and third strpos() call respectively, the times are as follows:

strpos:

A: 0.00450 seconds // 1 strpos()

B: 0.00911 seconds // 2 strpos()

C: 0.00833 seconds // 3 strpos()

C: 0.01180 seconds // 4 strpos() added one extra

and for preg_match:

A: 0.01919 seconds // 1 preg_match()

B: 0.02252 seconds // 1 preg_match()

C: 0.01060 seconds // 1 preg_match()

As the numbers show, strpos is faster up to the 4th execution, so I will be using it instead, since I have only 3 substrings to check for : )


The correct syntax is:

preg_match('/hello|i am|dumb/', $ohreally);

I doubt there's much in it either way, but it wouldn't surprise me if the strpos() method is faster, depending on the number of strings you're searching for. The performance of strpos() will degrade as the number of search terms increases. The regex probably will too, but not as fast.
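If the list of search terms grows, the alternation can be built from an array instead of being typed by hand. A minimal sketch, assuming a hypothetical helper named buildAlternation(); preg_quote() is used so that any regex metacharacters in a term are matched literally:

```php
<?php
// Hypothetical helper: turns a list of literal search terms into one
// alternation pattern such as /hello|i am|dumb/.
function buildAlternation(array $needles): string
{
    $quoted = array_map(function ($needle) {
        // Escape regex metacharacters and the '/' delimiter.
        return preg_quote($needle, '/');
    }, $needles);

    return '/' . implode('|', $quoted) . '/';
}

$pattern = buildAlternation(['hello', 'i am', 'dumb']);
var_dump($pattern);
var_dump(preg_match($pattern, 'well, i am here'));
```

Whether this beats a loop of strpos() calls for a given number of terms is something to measure rather than assume.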

Obviously regular expressions are more powerful. For example if you wanted to match the word "dumb" but not "dumber" then that's easily done with:

preg_match('/\b(hello|i am|dumb)\b/', $ohreally);

which is a lot harder to do with strpos().

Note: technically \b is a zero-width word boundary. "Zero-width" means it doesn't consume any part of the input string, and "word boundary" means it matches at a transition from word characters (digits, letters or underscore) to non-word characters, or from non-word to word characters. The start and end of the string also count as boundaries when the adjacent character is a word character. Very useful.
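A quick way to see the effect of \b is to run both patterns against a string containing "dumber". The sample strings here are made up for illustration:

```php
<?php
$bounded   = '/\b(hello|i am|dumb)\b/';
$unbounded = '/hello|i am|dumb/';

// "dumb" as a whole word: the bounded pattern matches.
var_dump(preg_match($bounded, 'that is dumb'));

// "dumb" only as a prefix of "dumber": there is no boundary between
// "b" and "e", so the bounded pattern does not match...
var_dump(preg_match($bounded, 'even dumber'));

// ...while the unbounded pattern still does.
var_dump(preg_match($unbounded, 'even dumber'));
```

preg_match() returns 1 for a match and 0 for no match, so the three calls give 1, 0, and 1 respectively.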

Edit: it's also worth noting that your usage of strpos() is incorrect (but lots of people make this same mistake). Namely:

if (strpos ($ohreally, 'hello')) {
  ...
}

will not enter the condition block if the needle is at position 0 in the string. The correct usage is:

if (strpos ($ohreally, 'hello') !== false) {
  ...
}

because of type juggling. Otherwise 0 is converted to false.
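The pitfall is easy to reproduce with a haystack that starts with the needle. The sample string below is an assumption chosen for the demonstration:

```php
<?php
$ohreally = 'hello there';

// The needle is at the very start, so strpos() returns int 0, not false.
$pos = strpos($ohreally, 'hello');

var_dump($pos);            // int(0)
var_dump((bool) $pos);     // bool(false): what a bare if ($pos) sees
var_dump($pos !== false);  // bool(true): the strict check gets it right
```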


Crazy idea, but why not test both 'n' thousand times in two separate loops, each surrounded by microtime() calls and the associated debug output?

Based on the above code (with a few corrections) for 1,000 iterations, I get something like:

strpos test:     0.003315
preg_match test: 0.014241

As such, in this instance (with the limitations outlined by others) strpos indeed seems faster, albeit by a largely meaningless amount. (The joy of pointless micro-optimisation, etc.)

Never estimate what you can measure.
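For anyone who wants to repeat the measurement, a harness along these lines should do. It is a rough sketch, not the code that produced the figures above: timeIt() is a hypothetical helper, the haystack is made up, and the absolute numbers will vary by machine and PHP version.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true); // true => float seconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Made-up haystack in which only the last needle ('dumb') occurs,
// so all three strpos() calls have to run.
$ohreally = str_repeat('lorem ipsum ', 50) . 'dumb';

$strposTime = timeIt(function () use ($ohreally) {
    return strpos($ohreally, 'hello') !== false
        || strpos($ohreally, 'i am') !== false
        || strpos($ohreally, 'dumb') !== false;
}, 1000);

$pregTime = timeIt(function () use ($ohreally) {
    return preg_match('/hello|i am|dumb/', $ohreally) === 1;
}, 1000);

printf("strpos test:     %.6f\n", $strposTime);
printf("preg_match test: %.6f\n", $pregTime);
```

Swapping in a representative haystack, and one where the first needle matches, shows how much the result depends on the data.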


It depends on the number of strings you want to look for and the length of the string you are searching.

You'd need to experiment with a representative data set to find out which is faster (repeat the operation, say, 1000 times and measure the elapsed time).

BTW - I think the regex you are looking for is '(hello|i am|dumb)'

Also, your code is more verbose than it needs to be:

return strpos($ohreally, 'hello') === false && strpos($ohreally, 'i am') === false && strpos($ohreally, 'dumb') === false;

or

return !preg_match('(hello|i am|dumb)', $ohreally);

Also, by all the usual coding standards, there should not be a space between the function name and the bracket.

C.
