Perl regular expression to match Perl regular expression literals

Is there a specification, in the form of a Perl regular expression, that will match all Perl regular expression literals?

Failing that, is there a specification in any language for all Perl regular expression literals?

Ideally, it should include regular expression modifiers like /x and regular expression operators like s/, but I could tack those on later.

Specifications that match after variable interpolation are ideal, but before is fine too.

Context: I am writing a metalanguage that compiles into Perl. It is itself written in Perl (actually, using Parse::RecDescent), and I want to identify regular expression literals and pass them on to Perl.


Those operators can contain arbitrary Perl code, and there's no specification for that.

For example, in

/$x{ EXPR }/

and

s// EXPR /e

EXPR can be almost any valid Perl expression.
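To make the two cases concrete, here is a minimal sketch of embedded code running inside both operators. The hash `%x`, the sample text and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: the hash value is itself a pattern fragment.
my %x    = ( key => 'b+' );
my $text = 'aabbb 21';

# The hash subscript is an ordinary Perl expression, evaluated while the
# pattern is interpolated: lc 'KEY' yields 'key', so the pattern is (b+).
my ($m) = $text =~ /($x{ lc 'KEY' })/;
print "$m\n";          # prints "bbb"

# With /e the replacement part is Perl code, not a string.
( my $doubled = $text ) =~ s/(\d+)/ $1 * 2 /e;
print "$doubled\n";    # prints "aabbb 42"
```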

However, I don't think you actually need to know how to parse it. You just need to know where it ends. And that's rather easy. Perl also needs to be able to do that before it can parse the operator, so it disallows certain code patterns. (Thus the "almost" above.)

  • Any occurrence of the delimiter inside the operator must be preceded by an odd number of "\" (that is, it must be escaped).

  • As an exception to the above, when the delimiter is (), [] or {}, the delimiters may appear unescaped as long as they are balanced. (Perl treats <> the same way; the rules below do not cover it.)

balanced_paren  : ( /(?:[^\\\(\)]|\\.)+/ | '(' balanced_paren  ')' )(s?)
balanced_square : ( /(?:[^\\\[\]]|\\.)+/ | '[' balanced_square ']' )(s?)
balanced_curly  : ( /(?:[^\\\{\}]|\\.)+/ | '{' balanced_curly  '}' )(s?)


match_op        : <skip:> 'm' /\s*/ match_op_1 match_modifiers

# Bracketing delimiters nest; any other delimiter closes itself.
# "\\." is tried first so that a backslash always consumes the next character.
match_op_1      : '(' <commit> balanced_paren  ')'
                | '[' <commit> balanced_square ']'
                | '{' <commit> balanced_curly  '}'
                | /(?xs: ([^\\]) (?: \\. | (?!\1)[^\\] )* \1 )/

# Modifiers are optional, so this must be allowed to match nothing.
match_modifiers : /\w*/


subst_op        : <skip:> 's' /\s*/ subst_op_1 subst_modifiers

subst_op_1      : '(' <commit> balanced_paren  ')' /\s*/ subst_op_2
                | '[' <commit> balanced_square ']' /\s*/ subst_op_2
                | '{' <commit> balanced_curly  '}' /\s*/ subst_op_2
                | /(?xs: ([^\\]) (?: \\. | (?!\1)[^\\] )* \1 (?: \\. | (?!\1)[^\\] )* \1 )/

subst_op_2      : '(' <commit> balanced_paren  ')'
                | '[' <commit> balanced_square ']'
                | '{' <commit> balanced_curly  '}'
                | /(?xs: ([^\\]) (?: \\. | (?!\1)[^\\] )* \1 )/

subst_modifiers : /\w*/

Notes:

  • The rules may not correctly handle «'» as a delimiter.
  • A rule needs to be added to allow «\» as a delimiter, but I don't think you should support that.
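
Independently of Parse::RecDescent, the two delimiter rules above can be sketched as a small scanner in plain Perl. This is only an illustration under simplifying assumptions: the name `find_body_end` is made up, and the sketch ignores «\» as a delimiter, the special treatment of «'», and comments under /x.

```perl
use strict;
use warnings;

# Perl's bracketing delimiter pairs; every other delimiter closes itself.
my %CLOSER = ( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );

# Given source text and the offset of an opening delimiter, return the
# offset just past the matching closing delimiter, or nothing (undef in
# scalar context) if the body is unterminated.
sub find_body_end {
    my ( $src, $pos ) = @_;
    my $open  = substr( $src, $pos, 1 );
    my $close = $CLOSER{$open} // $open;
    my $depth = 1;
    for ( my $i = $pos + 1 ; $i < length $src ; $i++ ) {
        my $c = substr( $src, $i, 1 );
        if ( $c eq '\\' ) {      # a backslash escapes the next character
            $i++;
            next;
        }
        if ( $c eq $close ) {    # tested first, so same-char delimiters end here
            return $i + 1 if --$depth == 0;
        }
        elsif ( $c eq $open ) {  # only reachable for bracket pairs
            $depth++;
        }
    }
    return;
}

my $code = 'm{a{1,2}\}b}x; rest';
my $end  = find_body_end( $code, 1 );
print substr( $code, 1, $end - 1 ), "\n";    # prints "{a{1,2}\}b}"
```

For s///, the same routine could be called a second time from the returned offset (after skipping whitespace when the first part used bracketing delimiters) to find the end of the replacement part.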


You may want to look at the source code for YAPE::Regex, which is used to parse Perl regular expressions. One big caveat is that it has not been updated since perl version 5.6, which means that it does not understand any regular expression syntax introduced since then (especially 5.10).

See also perlreref
