What is the reason for the performance difference between substring search using "index()" and a regex in Perl?
I am assuming there might be an efficiency difference between:
if (index($string, "abc") == -1) {}
and
if ($string !~ /abc/) {}
Could someone confirm that this is the case based on how both are implemented in Perl (as opposed to pure benchmarking)?
I can obviously guess at how both are implemented (based on how I would write them in C), but I would like a more informed answer, ideally based on the actual Perl source code.
Here's my own sample benchmark:
                        Rate regex.FIND_AT_END index.FIND_AT_END
regex.FIND_AT_END   639345/s                --              -88%
index.FIND_AT_END  5291005/s              728%                --

                   Rate regex.NOFIND index.NOFIND
regex.NOFIND   685260/s           --         -88%
index.NOFIND  5515720/s         705%           --

                          Rate regex.FIND_AT_START index.FIND_AT_START
regex.FIND_AT_START   672269/s                  --                -90%
index.FIND_AT_START  7032349/s                946%                  --
##############################
use Benchmark qw(:all);

my $count = 10000000;
my $re = qr/abc/o;

my %tests = (
    "NOFIND " => "cvxcvidgds.sdfpkisd[s"
    ,"FIND_AT_END " => "cvxcvidgds.sdfpabcd[s"
    ,"FIND_AT_START " => "abccvidgds.sdfpkisd[s"
);

foreach my $type (keys %tests) {
    my $str = $tests{$type};
    cmpthese($count, {
        "index.$type" => sub { my $idx = index($str, "abc"); },
        "regex.$type" => sub { my $idx = ($str =~ $re); }
    });
}
Take a look at the function Perl_instr:
char *
Perl_instr(register const char *big, register const char *little)
{
    register I32 first;

    PERL_ARGS_ASSERT_INSTR;

    if (!little)
        return (char*)big;
    first = *little++;
    if (!first)
        return (char*)big;
    while (*big) {
        register const char *s, *x;
        if (*big++ != first)
            continue;
        for (x=big,s=little; *s; /**/ ) {
            if (!*x)
                return NULL;
            if (*s != *x)
                break;
            else {
                s++;
                x++;
            }
        }
        if (!*s)
            return (char*)(big-1);
    }
    return NULL;
}
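To make the scan above easier to experiment with outside the Perl tree, here is a minimal standalone sketch of the same idea: look for the first byte of the needle, then compare the remainder byte by byte. The names naive_instr and naive_index are hypothetical and are not part of the Perl source; the Perl-specific types and the PERL_ARGS_ASSERT_INSTR macro are dropped, and the index()-style wrapper that returns an offset or -1 is my own addition for illustration.

```c
#include <stddef.h>

/* Hypothetical standalone rendering of the scan shown in Perl_instr.
 * Returns a pointer to the first occurrence of `little` in `big`,
 * or NULL if there is none. An empty or NULL needle matches at the start. */
const char *naive_instr(const char *big, const char *little)
{
    if (!little || !*little)
        return big;

    for (; *big; big++) {
        const char *x = big;
        const char *s = little;

        if (*big != *little)
            continue;               /* first byte differs: move on */

        while (*s && *x == *s) {    /* compare the rest byte by byte */
            x++;
            s++;
        }
        if (!*s)
            return big;             /* whole needle matched */
        if (!*x)
            return NULL;            /* haystack ran out mid-match */
    }
    return NULL;
}

/* index()-like wrapper (an assumption for illustration):
 * byte offset of the match, or -1 when the needle is absent. */
long naive_index(const char *big, const char *little)
{
    const char *hit = naive_instr(big, little);
    return hit ? (long)(hit - big) : -1;
}
```

Called on the three benchmark strings with the needle "abc", naive_index gives -1 for the NOFIND string, 15 for FIND_AT_END and 0 for FIND_AT_START. The whole search is one tight loop with no compiled program to interpret.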
Compare with S_regmatch. It seems to me that there is some overhead in regmatch compared to index ;-)