How do I create something like a negated character class with a string instead of characters?
I am trying to write a tokenizer for Mustache in Perl. I can easily handle most of the tokens like this:
#!/usr/bin/perl
use strict;
use warnings;
my $comment = qr/ \G \{\{ ! (?<comment> .+? ) }} /xs;
my $variable = qr/ \G \{\{ (?<variable> .+? ) }} /xs;
my $text = qr/ \G (?<text> .+? ) (?= \{\{ | \z ) /xs;
my $tokens = qr/ $comment | $variable | $text /x;
my $s = do { local $/; <DATA> };
while ($s =~ /$tokens/g) {
    my ($type) = keys %+;
    # /g so that every newline in the token is escaped, not just the first
    (my $contents = $+{$type}) =~ s/\n/\\n/g;
    print "type [$type] contents [$contents]\n";
}
__DATA__
{{!this is a comment}}
Hi {{name}}, I like {{thing}}.
But I am running into trouble with the Set Delimiters directive:
#!/usr/bin/perl
use strict;
use warnings;
my $delimiters = qr/ \G \{\{ (?<start> .+? ) = [ ] = (?<end> .+?) }} /xs;
my $comment = qr/ \G \{\{ ! (?<comment> .+? ) }} /xs;
my $variable = qr/ \G \{\{ (?<variable> .+? ) }} /xs;
my $text = qr/ \G (?<text> .+? ) (?= \{\{ | \z ) /xs;
my $tokens = qr/ $comment | $delimiters | $variable | $text /x;
my $s = do { local $/; <DATA> };
while ($s =~ /$tokens/g) {
    for my $type (keys %+) {
        # /g so that every newline in the token is escaped, not just the first
        (my $contents = $+{$type}) =~ s/\n/\\n/g;
        print "type [$type] contents [$contents]\n";
    }
}
__DATA__
{{!this is a comment}}
Hi {{name}}, I like {{thing}}.
{{(= =)}}
If I change it to
my $delimiters = qr/ \G \{\{ (?<start> [^{]+? ) = [ ] = (?<end> .+?) }} /xs;
It works fine, but the point of the Set Delimiters directive is to change the delimiters, so the code will wind up looking like
my $variable = qr/ \G $start (?<variable> .+? ) $end /xs;
And it is perfectly valid to say {{{== ==}}} (i.e. change the delimiters to {= and =}). What I want, but maybe not what I need, is the ability to say something like (?:not starting string)+?. I figure I am just going to have to give up being clean about it and drop code into the regex to force it to match only what I want. I am trying to avoid that for four reasons:
- I don't think it is very clean.
- It is marked as experimental.
- I am not very familiar with it (I think it comes down to (?{CODE}) and returning special values).
- I am hoping someone knows some other exotic feature that I am not familiar with that fits the situation better (e.g. (?(condition)yes-pattern|no-pattern)).
Just to make things clear (I hope), I am trying to match a constant-length starting delimiter, followed by the shortest string that allows a match and does not contain the starting delimiter, followed by a space, followed by an equals sign, followed by the shortest string that allows a match and ends with the ending delimiter.
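Use a negative look-ahead assertion, checked before each character is consumed so that the capture stops exactly where the closing delimiter begins. (If the look-ahead comes after the dot instead, the last character before $end can never match.) Something like this:
my $variable = qr/ \G $start (?<variable> (?: (?! $end ) . )+ ) $end /xs;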
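A minimal sketch of the look-ahead approach with arbitrary delimiters follows. The helper names make_variable_re and find_variables and the sample strings are made up for illustration. It assumes the delimiters are plain strings, so they are passed through quotemeta first. The look-ahead is written before the dot here, so each character is consumed only if the closing delimiter does not start at that position:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a variable-matching regex for arbitrary delimiters.
sub make_variable_re {
    my ($s, $e) = @_;
    my ($start, $end) = map { quotemeta($_) } ($s, $e);
    # (?: (?! $end ) . )+ consumes one character at a time, but only
    # while the closing delimiter does not begin at the current position.
    return qr/ $start (?<variable> (?: (?! $end ) . )+ ) $end /xs;
}

# Hypothetical helper: return every variable name found in $text.
sub find_variables {
    my ($text, $s, $e) = @_;
    my $re = make_variable_re($s, $e);
    my @names;
    while ($text =~ /$re/g) {
        push @names, $+{variable};
    }
    return @names;
}

print join(",", find_variables("Hi {{name}}, I like {{thing}}.", "{{", "}}")), "\n";
print join(",", find_variables("Hi (name), I like (thing).", "(", ")")), "\n";
```

Both calls should print name,thing. Because the delimiters are escaped, the same pattern works whether they are {{ }}, ( ), or something like <% %>.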
For those who are curious, what follows is the full tokenizer for Mustache written in Perl 5.10 style. Now I just need to write the parser and the renderer.
#!/usr/bin/perl
use 5.010_000;
use strict;
use warnings;
sub gen_tokenizer {
    my ($s, $e) = @_;
    my ($start, $end) = map { quotemeta } $s, $e;
    my $unescaped = "$s $e" eq "{{ }}" ?
        qr/ \G \{{3} (?<unescaped> .+?) }{3} /xs :
        qr{ \G $start & (?<unescaped> .+? ) $end }xs;
    return qr{
        $unescaped |
        \G $start (?:
            ! (?<comment> .+? ) |
            > (?<partial> .+? ) |
            \# (?<enum_start> .+? ) |
            / (?<enum_stop> .+? ) |
            (?<start> (?: . (?! $end ) )+? ) = [ ] = (?<end> .+? ) |
            (?<variable> .+? )
        ) $end |
        (?<text> .+? ) (?= $start | \z )
    }xs;
}
my $template = do { local $/; <DATA> };
my $tokenizer = gen_tokenizer "{{", "}}";
while ($template =~ /$tokenizer/g) {
    my @types = keys %+;
    if (@types == 1) {
        my $type = $types[0];
        (my $contents = $+{$type}) =~ s/\n/\\n/g;
        say "$type: [$contents]";
    } else {
        $tokenizer = gen_tokenizer $+{start}, $+{end};
        say "set_delim: [$+{start} $+{end}]";
    }
}
__DATA__
{{!this is a comment}}
{{{html header}}}
Hi {{name}}, I like {{thing}}.
{{(= =)}}
(#optional)
This will only print if optional is set
(/optional)
(&html footer)
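To look at the Set Delimiters step on its own, here is a rough sketch that pulls just that tag out of a template. The helper name parse_set_delimiters is hypothetical, and the tag syntax is assumed to be the same "start, equals, space, equals, end" form used in the tokenizer above. Both captures are kept from running across the current closing delimiter, so a match cannot begin inside an earlier tag such as {{name}}:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: find the first Set Delimiters tag in $text, given the
# delimiters currently in force, and return the new (start, end) pair.
# Returns an empty list if no such tag is present.
sub parse_set_delimiters {
    my ($text, $s, $e) = @_;
    my ($start, $end) = map { quotemeta($_) } ($s, $e);
    # Neither capture may contain the current closing delimiter, so the
    # match has to stay within a single tag.
    my $re = qr{
        $start
        (?<start> (?: (?! $end ) . )+? )
        = [ ] =
        (?<end>   (?: (?! $end ) . )+? )
        $end
    }xs;
    return $text =~ $re ? ($+{start}, $+{end}) : ();
}

my @pair = parse_set_delimiters("Hi {{name}}, I like {{thing}}.\n{{(= =)}}", "{{", "}}");
print "new delimiters: [@pair]\n";
```

The returned pair could then be handed to something like gen_tokenizer to build the next tokenizer, which is what the main loop above does with $+{start} and $+{end}.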