Why was this regex calling substcont an excessive number of times?
This is more out of curiosity than anything else, as I'm failing to find any useful info on Google about this function (CORE::substcont)
In profiling and optimising some old, slow, XML parsing code I've found that the following regex is calling substcont 31 times for each time the line is executed, and taking a huge amount of time:
Calls: 10000 Time: 2.65s Sub calls: 320000 Time in subs: 1.15s`
$handle =~s/(>)\s*(<)/$1\n$2/g;
# spe开发者_运维知识库nt 1.09s making 310000 calls to main::CORE:substcont, avg 4µs/call
# spent 58.8ms making 10000 calls to main::CORE:subst, avg 6µs/call
Compared to the immediately preceding line:
Calls: 10000 Time: 371ms Sub calls: 30000 Time in subs: 221ms
$handle =~s/(.*)\s*(<\?)/$1\n$2/g;
# spent 136ms making 10000 calls to main::CORE:subst, avg 14µs/call
# spent 84.6ms making 20000 calls to main::CORE:substcont, avg 4µs/call
The number of substcont calls is quite surprising, especially seeing as I would've thought that the second regex would be more expensive. This is, obviously, why profiling is a Good Thing ;-)
I've subsequently changed both these line to remove the unneccessary backrefs, with dramatic results for the badly-behaving line:
Calls:10000 Time: 393ms Sub calls: 10000 Time in subs: 341ms
$handle =~s/>\s*</>\n</g;
# spent 341ms making 10000 calls to main::CORE:subst, avg 34µs/call
- So, my question is - why should the original have been making SO many calls to substcont, and what does substcont even do in the regex engine that takes so long?
substcont
is Perl's internal name for the "substitution iterator". Something to do with s///
. Based on what little information I have, it seems substcont
is triggered when doing a backref. That is, when $1
is present. You can play with it a bit using B::Concise.
Here's the opcodes of a simple regex without a backref.
$ perl -MO=Concise,-exec -we'$foo = "foo"; $foo =~ s/(foo)/bar/ig'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const[PV "foo"] s
4 <#> gvsv[*foo] s
5 <2> sassign vKS/2
6 <;> nextstate(main 1 -e:1) v:{
7 <#> gvsv[*foo] s
8 <$> const[PV "bar"] s
9 </> subst(/"(foo)"/) vKS
a <@> leave[1 ref] vKP/REFC
-e syntax OK
And one with.
$ perl -MO=Concise,-exec -we'$foo = "foo"; $foo =~ s/(foo)/$1/ig'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const[PV "foo"] s
4 <#> gvsv[*foo] s
5 <2> sassign vKS/2
6 <;> nextstate(main 1 -e:1) v:{
7 <#> gvsv[*foo] s
8 </> subst(/"(foo)"/ replstart->9) vKS
9 <#> gvsv[*1] s
a <|> substcont(other->8) sK/1
b <@> leave[1 ref] vKP/REFC
-e syntax OK
That's all I can offer. You may want to try Rx, mjd's old regex debugger.
精彩评论