Generalizing the pumping lemma for UNIX-style regular expressions

2022-12-27 04:39 Q&A author:

Most UNIX regular expressions have, besides the usual `*`, `+` and `?` operators, a backslash (backreference) operator, where `\1`, `\2`, ... match whatever text the first, second, ... parenthesized group last matched. For example, L = `(a*)b\1` matches the (non-regular) language *a^n b a^n*.
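A quick way to watch the backreference at work is Ruby, whose regex engine supports `\1`. This is a minimal sketch assuming Ruby 2.4 or later (for `Regexp#match?`); the `\A`/`\z` anchors are added so the whole string must match, and the constant name is purely illustrative.

```ruby
# Backreference regex for { a^n b a^n : n >= 0 }.
# \A and \z anchor the match to the whole string; \1 must repeat
# exactly the text that group 1, (a*), captured.
A_N_B_A_N = /\A(a*)b\1\z/

(0..3).each do |n|
  member     = "a" * n + "b" + "a" * n        # a^n b a^n: in the language
  non_member = "a" * n + "b" + "a" * (n + 1)  # one extra trailing a: not in it
  puts "#{member}: #{A_N_B_A_N.match?(member)}  #{non_member}: #{A_N_B_A_N.match?(non_member)}"
end
```

Every `member` should report `true` and every `non_member` `false`, because `\1` cannot match a longer run of a's than the group captured.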

On one hand, this seems pretty powerful, since you can write `(a*)b\1b\1` to match the language *a^n b a^n b a^n*, which can't even be recognized by a pushdown (stack) automaton. On the other hand, I'm pretty sure *a^n b^n* cannot be expressed this way.
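The three-block language can be checked the same way. Again a sketch assuming Ruby 2.4+ and an anchored pattern; the sample strings are arbitrary picks, not taken from the question.

```ruby
# Two copies of the backreference give { a^n b a^n b a^n : n >= 0 },
# a language that is not context-free.
three_blocks = /\A(a*)b\1b\1\z/

# Sample strings paired with whether they belong to the language.
samples = {
  "bb"       => true,   # n = 0
  "ababa"    => true,   # n = 1
  "aabaabaa" => true,   # n = 2
  "aabaaba"  => false,  # last block too short
  "abaabaa"  => false,  # middle block too long
}

samples.each do |s, expected|
  puts "#{s}: matched=#{three_blocks.match?(s)} expected=#{expected}"
end
```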

I have two questions:

1. Is there any literature on this family of languages ("UNIX-y regular")? In particular, is there a version of the pumping lemma for them?
2. Can someone prove, or disprove, that *a^n b^n* cannot be expressed this way?

You're probably looking for

Benjamin Carle and Paliath Narendran, "On Extended Regular Expressions", LNCS 5457
- DOI: 10.1007/978-3-642-00982-2_24
- PDF (extended abstract): http://hal.archives-ouvertes.fr/docs/00/17/60/43/PDF/notes_on_extended_regexp.pdf

C. Campeanu, K. Salomaa and S. Yu, "A formal study of practical regular expressions", International Journal of Foundations of Computer Science, Vol. 14 (2003), pp. 1007-1018
- DOI: 10.1142/S012905410300214X

and of course follow their citations forward and backward to find more literature on this subject.

*a^n b^n* is a context-free language (CFL). A grammar for it is

A -> aAb | ε

and you can use the pumping lemma for regular languages to prove that the language generated by A is not regular.
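For reference, the usual pumping-lemma argument goes roughly like this (a sketch; p stands for the pumping length the lemma provides):

```latex
\begin{align*}
&\text{Assume } L = \{a^n b^n : n \ge 0\} \text{ is regular, with pumping length } p.\\
&\text{Take } s = a^p b^p \in L \text{, so } |s| \ge p. \text{ Write } s = xyz \text{ with } |xy| \le p,\ |y| \ge 1.\\
&\text{Since } |xy| \le p \text{, both } x \text{ and } y \text{ lie inside } a^p:\ x = a^i,\ y = a^k\ (k \ge 1),\ z = a^{p-i-k} b^p.\\
&\text{Pumping once more gives } xy^2z = a^{p+k} b^p \notin L \text{, because } p + k \ne p.\\
&\text{This contradicts the lemma, so } L \text{ is not regular.}
\end{align*}
```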

Ruby 1.9.1 supports the following regex:

```ruby
regex = %r{ (?<foo> a\g<foo>a | b\g<foo>b | c) }x

p regex.match("aaacbbb")
# the result is #<MatchData "c" foo:"c">
```

The article "Fun with Ruby 1.9 Regular Expressions" has an example in which the author arranges all the parts of a regex so that it reads like a context-free grammar:

```ruby
sentence = %r{
    (?<subject>   cat   | dog   | gerbil    ){0}
    (?<verb>      eats  | drinks| generates ){0}
    (?<object>    water | bones | PDFs      ){0}
    (?<adjective> big   | small | smelly    ){0}

    (?<opt_adj>   (\g<adjective>\s)?     ){0}

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x
```
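To try the grammar-style regex, the sketch below (assuming Ruby 2.4+ for `Regexp#match?`; the test sentences are made up) runs it against a few inputs. The `{0}` quantifier makes each named group match zero times where it is written, so those groups act only as definitions for the later `\g<name>` calls. The pattern is not anchored, so it would also find such a sentence inside longer text.

```ruby
# The grammar-style regex, reproduced so this snippet runs on its own.
# Groups quantified with {0} only serve as definitions for \g<name>.
sentence = %r{
  (?<subject>   cat   | dog   | gerbil    ){0}
  (?<verb>      eats  | drinks| generates ){0}
  (?<object>    water | bones | PDFs      ){0}
  (?<adjective> big   | small | smelly    ){0}

  (?<opt_adj>   (\g<adjective>\s)?     ){0}

  The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x

[
  "The big dog eats bones",           # optional adjective before the subject
  "The gerbil generates smelly PDFs", # optional adjective before the object
  "The cat drinks",                   # no object: should not match
  "The water eats cat",               # words in the wrong slots: should not match
].each do |s|
  puts "#{s.inspect} -> #{sentence.match?(s)}"
end
```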

I think this means that at least Ruby 1.9.1's regex engine (the Oniguruma engine) is actually equivalent in power to a context-free grammar, though its capturing groups aren't as useful as the output of a real parser generator.

If so, the pumping lemma for context-free languages should describe the class of languages recognizable by Ruby 1.9.1's regex engine.
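As a small illustration of the grammar-like behaviour, the grammar A -> aAb | ε for *a^n b^n* can be transcribed with a subexpression call. This is a sketch assuming a modern Ruby (Onigmo, the Oniguruma fork Ruby has used since 2.0, plus `Regexp#match?` from 2.4). It relies on `\g<...>` recursion, not on backreferences, so it says nothing about whether classic UNIX backreferences can express *a^n b^n*.

```ruby
# The grammar A -> aAb | ε written with a subexpression call:
# \g<A> re-enters group A recursively, and the empty second
# alternative plays the role of ε. \A and \z anchor the whole string.
a_n_b_n = /\A(?<A>a\g<A>b|)\z/

["", "ab", "aabb", "aaabbb", "aab", "abb", "ba"].each do |s|
  puts "#{s.inspect}: #{a_n_b_n.match?(s)}"
end
```

Because each recursive call must first consume an `a`, the recursion always makes progress, and the matching `b` after `\g<A>` keeps the two counts equal.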

EDIT: Whoops! I messed up: I skipped an important test, and running it seems to make my answer above totally wrong. I won't delete the answer, because it's useful information nonetheless.

```ruby
regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x
# I added anchors for the beginning and end of the string
regex.match("aaacbbb")
# returns nil, indicating that no match is possible with recursive capturing groups.
```

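EDIT: Coming back to this many months later, I discovered that my test in the last edit was incorrect. The regex describes odd-length palindromes: each recursion wraps the inner match in a matching pair of a's or b's around a single central c. "aaacbbb" is not a palindrome, so it shouldn't be expected to match the regex even if the regex does operate like a context-free grammar.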

The correct test should be on a string like "aabcbaa", and that does match the regex:

```ruby
regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x
regex.match("aaacaaa")
# => #<MatchData "aaacaaa" foo:"aaacaaa">
regex.match("aacaa")
# => #<MatchData "aacaa" foo:"aacaa">
regex.match("aabcbaa")
# => #<MatchData "aabcbaa" foo:"aabcbaa">
```
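To double-check that the anchored regex accepts exactly what the grammar foo -> a foo a | b foo b | c generates, the sketch below compares it with a hand-written recursive recognizer on every string over {a, b, c} of length 0 to 7. The helper `foo?` is hypothetical, written only for this comparison; the sketch uses `\z` (strict end of string) instead of `\Z` and assumes Ruby 2.4+.

```ruby
# Hypothetical helper: a direct recursive recognizer for the grammar
#   foo -> a foo a | b foo b | c
# i.e. palindromes over {a, b} with a single c in the middle.
def foo?(s)
  return true if s == "c"
  return false if s.length < 3
  outer = s[0]
  (outer == "a" || outer == "b") && s[-1] == outer && foo?(s[1..-2])
end

regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\z}x

# Compare the regex with the recognizer on every string over {a, b, c}
# of length 0..7 (3280 strings in total).
mismatches = []
(0..7).each do |len|
  %w[a b c].repeated_permutation(len) do |chars|
    s = chars.join
    mismatches << s unless regex.match?(s) == foo?(s)
  end
end
puts "mismatches: #{mismatches.length}"
```

This should print `mismatches: 0`; `foo?("aaacbbb")` is false while `foo?("aabcbaa")` is true, in line with the regex results above.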
