Perl regular expression to match perl regular expression literals
Is there a specification in the form of a perl regular expression that will match all perl regular expression literals?
Failing that, is there a specification in any language for all perl regular expression literals?
Ideally, it should include regular expression modifiers like /x and regular expression operators like s/, but I could tack those on later.
Specifications that match after variable interpolation are ideal, but before is fine too.
Context: I am writing a metalanguage in Perl (actually, using Parse::RecDescent) that compiles into Perl, and I want to identify regular expression literals and pass them on to Perl.
Those operators can contain arbitrary Perl code, and there's no specification for that.
For example, in /$x{ EXPR }/ and s// EXPR /e, EXPR can be almost any valid Perl expression.
However, I don't think you actually need to know how to parse it. You just need to know where it ends. And that's rather easy. Perl also needs to be able to do that before it can parse the operator, so it disallows certain code patterns. (Thus the "almost" above.)
- Any occurrences of the delimiter must be preceded by an odd number of "\".
- As an exception to the above, when the delimiter is (), [] or {}, the delimiters may appear unescaped as long as they are balanced.
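The two rules above can be sketched in plain Perl as a scanner that only looks for the end of the literal. This is a minimal sketch, not how perl itself does it: find_literal_end and %CLOSER are hypothetical names introduced here, and the code assumes the caller already knows the offset of the opening delimiter.

```perl
use strict;
use warnings;

# Closing delimiter for each bracketing opener (assumption: only the three
# pairs named above); any other delimiter closes itself.
my %CLOSER = ( '(' => ')', '[' => ']', '{' => '}' );

# Hypothetical helper: given $text and the offset $start of the opening
# delimiter, return the offset just past the matching closing delimiter,
# or nothing (undef in scalar context) if the literal is unterminated.
sub find_literal_end {
    my ($text, $start) = @_;
    my $open  = substr($text, $start, 1);
    my $close = exists $CLOSER{$open} ? $CLOSER{$open} : $open;
    my $depth = 1;
    my $pos   = $start + 1;
    my $len   = length $text;

    while ($pos < $len) {
        my $ch = substr($text, $pos, 1);
        if ($ch eq '\\') {
            $pos += 2;               # skip the backslash and the character it escapes
            next;
        }
        if ($ch eq $close) {
            $depth--;
            return $pos + 1 if $depth == 0;
        }
        elsif ($ch eq $open) {       # only reachable for bracket pairs
            $depth++;
        }
        $pos++;
    }
    return;
}

# Balanced, unescaped braces are allowed inside a brace-delimited literal.
my $src = 'm{a{1,2}\}b}x; next';
my $end = find_literal_end($src, 1);
print substr($src, 1, $end - 1), "\n";    # prints {a{1,2}\}b}

# With a non-bracket delimiter, only an escaped delimiter may appear inside.
my $src2 = '/a\/b/i';
print substr($src2, 0, find_literal_end($src2, 0)), "\n";    # prints /a\/b/
```

Skipping two characters on every backslash is what implements the "odd number of backslashes" rule: a pair of backslashes is consumed together, so a delimiter that follows it is seen as unescaped.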
# Contents of a bracket-delimited operator: escaped characters, ordinary
# characters, or nested balanced brackets of the same kind.
balanced_paren_guts : ( /(?:[^\\\(\)]|\\.)+/ | '(' balanced_paren_guts ')' )(s?)
balanced_square_guts : ( /(?:[^\\\[\]]|\\.)+/ | '[' balanced_square_guts ']' )(s?)
balanced_curly_guts : ( /(?:[^\\\{\}]|\\.)+/ | '{' balanced_curly_guts '}' )(s?)
match_op : <skip:> 'm' /\s*/ match_op_1 match_modifiers
match_op_1 : '(' <commit> balanced_paren_guts ')'
    | '[' <commit> balanced_square_guts ']'
    | '{' <commit> balanced_curly_guts '}'
    # Any other delimiter: try the escape alternative first so that an
    # escaped delimiter is not mistaken for the closing one.
    | /(?x: ([^\\]) (?:\\.|(?!\1)[^\\])* \1 )/
# Modifiers are optional, so allow an empty match.
match_modifiers : /\w*/
subst_op : <skip:> 's' /\s*/ subst_op_1 subst_modifiers
subst_op_1 : '(' <commit> balanced_paren_guts ')' /\s*/ subst_op_2
    | '[' <commit> balanced_square_guts ']' /\s*/ subst_op_2
    | '{' <commit> balanced_curly_guts '}' /\s*/ subst_op_2
    | /(?x: ([^\\]) (?:\\.|(?!\1)[^\\])* \1 (?:\\.|(?!\1)[^\\])* \1 )/
subst_op_2 : '(' <commit> balanced_paren_guts ')'
    | '[' <commit> balanced_square_guts ']'
    | '{' <commit> balanced_curly_guts '}'
    | /(?x: ([^\\]) (?:\\.|(?!\1)[^\\])* \1 )/
subst_modifiers : /\w*/
Notes:
- The rules may not correctly handle «'» as a delimiter.
- A rule needs to be added to allow «\» as a delimiter, but I don't think you should support that.
You may want to look at the source code for YAPE::Regex, which is used to parse Perl regular expressions. One big caveat is that it has not been updated since perl version 5.6, which means that it does not understand any regular expression syntax introduced since then (especially 5.10).
See also perlreref