开发者

Can recursive regexps exist?

Im trying to get a javascript regex that matches x opening braces, then x closing braces, while allowing them to be nested in-between each other.

For example, it would match: "{ a { q } }" but not "{ a { q } { 开发者_StackOverflow社区}" or "{ } } { } {"

That being said, I have no idea how to do it with regexpes, or if it's even possible.


The short answer to this is no. Regular expressions are a non-context-free grammar, so it cannot be done with true regex. You can, however, look for specific (non-arbitrary) nesting patterns.

http://blogs.msdn.com/b/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx

The recursion problem here is, at its heart, the same reason you can't correctly parse HTML with regex. Like XML, the construct you've described is a context-free grammar; note its close similarity with the first example from the Wikipedia article.

I've heard there are engines out there that extend regex to offer support for arbitrarily nested elements, but this would make them something other than true regex. Anyway, I don't know of any such libraries for JavaScript. I think what you want is some kind of string-manipulation-based parser.


AFAIK, uou can’t really do this with regular expressions only.

However, Javascript’s String.replace method does have a nice feature that could allow you some level of recursion. If you pass a function as the second parameter, that function will be called for each match encountered. You could then perform the same replace on that match, passing along the same function, which would be called for each match inside that match, etc.

I’m too tired right now to write up an example that fits what you’re asking for — or even if it’s actually possible, so I’ll leave it at this possible hint, and further working out as an exercise to the reader.


That is not possible to do with real regular expression, and even with full-blown PCRE the "counting problem" that you're describing is an example of something that you just can't do.

An old textbook I had in school said, "regular expressions can't count." That's not true of modern "supercharged" regular expression implementations with the "{n,m}" qualifiers, but note that the values in curly braces there are constants.

To do that, you need a more complicated automaton. Context-free grammars can represent languages like you describe, as can parse expression grammars.


Yes, it's probably possible with Regexes. No, it isn't possible in Javascript Regexes. Yes, it's probably possible in .NET Regexes for example (Balancing Groups http://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.71).aspx ). No, I don't know how to do them. They give me migraine (and I'm not kidding here). They are quite extreme voodoo.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜