
What is the reason for the performance difference between substring search using "index()" and a regex in Perl?

I am assuming there might be an efficiency difference between:

if (index($string, "abc") == -1) {}

and

if ($string !~ /abc/) {}

Could someone confirm that this is the case based on how both are implemented in Perl (as opposed to pure benchmarking)?

I can obviously guess how both are implemented (based on how I would write them in C), but I would like a more informed answer, ideally based on the actual Perl source code.


Here's my own sample benchmark:

                          Rate regex.FIND_AT_END    index.FIND_AT_END
regex.FIND_AT_END     639345/s                   --                 -88%
index.FIND_AT_END    5291005/s                 728%                   --
                          Rate regex.NOFIND         index.NOFIND
regex.NOFIND          685260/s                   --                 -88%
index.NOFIND         5515720/s                 705%                   --
                          Rate regex.FIND_AT_START  index.FIND_AT_START
regex.FIND_AT_START   672269/s                   --                 -90%
index.FIND_AT_START  7032349/s                 946%                   --
##############################
use Benchmark qw(:all);

my $count = 10000000;
my $re = qr/abc/o;
my %tests = (
    "NOFIND        " => "cvxcvidgds.sdfpkisd[s"
   ,"FIND_AT_END   " => "cvxcvidgds.sdfpabcd[s"
   ,"FIND_AT_START " => "abccvidgds.sdfpkisd[s"
);

foreach my $type (keys %tests) {
    my $str = $tests{$type};
    cmpthese($count, {
        "index.$type" => sub { my $idx = index($str, "abc"); },
        "regex.$type" => sub { my $idx = ($str =~ $re); }
    });
}


Take a look at the function Perl_instr:

char *
Perl_instr(register const char *big, register const char *little)
{
    register I32 first;

    PERL_ARGS_ASSERT_INSTR;

    if (!little)
        return (char*)big;
    first = *little++;
    if (!first)
        return (char*)big;
    while (*big) {
        register const char *s, *x;
        if (*big++ != first)
            continue;
        for (x=big,s=little; *s; /**/ ) {
            if (!*x)
                return NULL;
            if (*s != *x)
                break;
            else {
                s++;
                x++;
            }
        }
        if (!*s)
            return (char*)(big-1);
    }
    return NULL;
}

Compare with S_regmatch. It seems to me that there is some overhead in regmatch compared to index ;-)
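
If you want to step through that loop outside the Perl source tree, here is a minimal standalone sketch of the same first-character scan. The names naive_instr and naive_index are hypothetical, and the Perl-specific pieces (PERL_ARGS_ASSERT_INSTR, I32, register) are dropped. It is not necessarily the exact code path index() takes in every Perl version. It only illustrates how little work a plain substring scan does per call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone rewrite of the scan shown in Perl_instr:
 * returns a pointer to the first occurrence of `little` in `big`,
 * or NULL when there is none. */
const char *naive_instr(const char *big, const char *little)
{
    assert(big != NULL);

    /* A missing or empty needle matches at the very start. */
    if (little == NULL || *little == '\0')
        return big;

    for (; *big != '\0'; big++) {
        const char *s = little;
        const char *x = big;

        /* Cheap rejection: most positions fail on the first character. */
        if (*big != *little)
            continue;

        /* Compare the rest of the needle at this position. */
        while (*s != '\0' && *x == *s) {
            s++;
            x++;
        }
        if (*s == '\0')
            return big;          /* the whole needle matched */
        if (*x == '\0')
            return NULL;         /* haystack ran out; no later match is possible */
    }
    return NULL;
}

/* Mirrors the return convention of Perl's index():
 * a 0-based offset, or -1 when the needle is not found. */
long naive_index(const char *big, const char *little)
{
    const char *hit = naive_instr(big, little);
    return hit ? (long)(hit - big) : -1;
}
```

With the strings from the benchmark above, naive_index("cvxcvidgds.sdfpabcd[s", "abc") returns 15 and naive_index("cvxcvidgds.sdfpkisd[s", "abc") returns -1.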
