Regular expression /(ab)?use/: Is a more complex expression worth it?
I'm writing a simple Perl script that translates assembly instruction strings to 32-bit binary code.
I decided to group the translation by instruction type (ADD and SUB are R-type instructions, and so on), so in my code I'm doing something like this:
my $bin;
$bin = r_type($instruction) if $instruction =~ /^(?:add|s(?:ub|lt|gt))\s/;
because I want to handle add, sub, slt and sgt in the same way.
I realized, however, that this regular expression might be overkill for what I need. Would the pattern
/^(?:add|sub|slt|sgt)\s/
be a better use of regular expressions in this case?
Thanks a lot.
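One quick way to confirm that the two patterns are interchangeable is to run both against the same inputs. This is a minimal sketch; the sample instruction strings (including the `subi` and `lw` lines that should not match) are made up for illustration.

```perl
use strict;
use warnings;

# The two patterns from the question
my $nested = qr/^(?:add|s(?:ub|lt|gt))\s/;
my $flat   = qr/^(?:add|sub|slt|sgt)\s/;

# Hypothetical sample input: four R-type lines plus two that must not match
my @samples = (
    'add $t0, $t1, $t2',
    'sub $t0, $t1, $t2',
    'slt $t0, $t1, $t2',
    'sgt $t0, $t1, $t2',
    'subi $t0, $t1, 4',    # starts with "sub", but no whitespace right after it
    'lw $t0, 0($t1)',
);

# Report whether each pattern accepts each line
for my $line (@samples) {
    my $n = $line =~ $nested ? 1 : 0;
    my $f = $line =~ $flat   ? 1 : 0;
    printf "%-20s nested=%d flat=%d\n", $line, $n, $f;
}
```

Both columns should agree on every line, so the choice between them comes down to readability and speed.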
Unless you are using a perl older than 5.10, the simple alternation will perform better anyway (see here): since 5.10, perl compiles an alternation of literal strings like this into a trie itself, so there is no reason to hand-optimize it.
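If you want to check the performance claim on your own perl, the core Benchmark module can compare the two patterns directly. This is only a sketch: the workload is hypothetical, and the numbers will vary with the machine, the perl version and the real input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $nested = qr/^(?:add|s(?:ub|lt|gt))\s/;
my $flat   = qr/^(?:add|sub|slt|sgt)\s/;

# Hypothetical workload: a mix of matching and non-matching instructions
my @lines = map { ('add r1, r2, r3', 'sgt r4, r5, r6', 'lw r1, 0(r2)', 'beq r1, r2, loop') } 1 .. 250;

# Count how many lines a pattern accepts
sub count_matches {
    my ($re) = @_;
    return scalar grep { $_ =~ $re } @lines;
}

# Run each candidate for at least 1 CPU second and print a comparison table
cmpthese(-1, {
    nested => sub { count_matches($nested) },
    flat   => sub { count_matches($flat) },
});
```

Small differences in the table are usually noise; only a consistent, large gap is worth acting on.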
Instead of burying the mnemonics inside regular expressions, build a dispatch table using a hash. It will be at least as fast, and your code will be far easier to follow:
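my %emitter = (add => \&r_type,
               sub => \&r_type,
               slt => \&r_type,
               sgt => \&r_type,
               # ... other mnemonics
              );
if ($instruction =~ /^(\S+)/) {
    my $mnemonic = $1;
    # No entry in the table means an unknown mnemonic (// needs perl 5.10+)
    my $emitter = $emitter{$mnemonic} // die "bad instruction $instruction";
    my $bin = $emitter->($mnemonic, $instruction);
}
else {
    # error?...
}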
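For reference, here is a self-contained version of the dispatch table that runs as-is. It is a sketch: `r_type` is a hypothetical stub that only reports which mnemonic it received (a real one would build the 32-bit word), and `emit` is a made-up wrapper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real R-type encoder: it only reports
# which mnemonic it received instead of building a 32-bit word
sub r_type {
    my ($mnemonic, $instruction) = @_;
    return "R-type($mnemonic)";
}

# One table maps each mnemonic to the sub that encodes it
my %emitter = (
    add => \&r_type,
    sub => \&r_type,
    slt => \&r_type,
    sgt => \&r_type,
);

# Look up the handler for the first word of the instruction and call it
sub emit {
    my ($instruction) = @_;
    my ($mnemonic) = $instruction =~ /^(\S+)/
        or die "empty instruction\n";
    my $handler = $emitter{$mnemonic}
        or die "bad instruction: $instruction\n";
    return $handler->($mnemonic, $instruction);
}

print emit('slt $t0, $t1, $t2'), "\n";
```

Adding an I-type or J-type instruction later means adding one table entry that points at a different sub, with no regex to edit.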
I like salva's dispatch table (I show a lot of that in Mastering Perl), but I'll answer another aspect of the question in case you need that answer for a different problem someday.
When you want to build alternations, some of which might be nested, you can use something like Regexp::Trie to build them for you so you don't have to write the ugly regex syntax yourself:
use Regexp::Trie;

my $rt = Regexp::Trie->new;
foreach ( qw/add sub slt sgt/ ) {
    $rt->add($_);
}
print $rt->regexp, "\n";
That gives you the following. On perl 5.14 and later, the leading (?-xism: is printed as (?^: instead.
(?-xism:(?:add|s(?:gt|lt|ub)))
This way, you list the opcodes as Jonathan suggested but still get the alternation. As ysth noted, perl 5.10 and later may do this optimization for you anyway.
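If you still want a single pattern (for example, to validate lines before dispatching), one option in the same spirit is to generate the flat alternation from one list of opcodes using only core Perl, and let perl 5.10+ optimize it. This is a sketch; `@r_type_ops` is a hypothetical name, and with a dispatch table `keys %emitter` could serve as the list instead.

```perl
use strict;
use warnings;

# Hypothetical list of R-type mnemonics, kept in one place
my @r_type_ops = qw(add sub slt sgt);

# quotemeta escapes any regex metacharacters in the names
my $alternation = join '|', map { quotemeta($_) } @r_type_ops;
my $r_type_re   = qr/^(?:$alternation)\s/;

print "$r_type_re\n";

# Classify a couple of sample lines
for my $line ('add $t0, $t1, $t2', 'lw $t0, 0($t1)') {
    print "$line: ", ($line =~ $r_type_re ? 'R-type' : 'other'), "\n";
}
```

The opcodes then live in exactly one place, and the generated pattern is the simple form the accepted answers recommend.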
Your second version is simpler, more readable, and more maintainable. The performance difference will depend on the regex implementation, but I suspect the nested version will run slower due to its increased complexity.
Yes, it's overkill.