Converting PCRE recursive regex pattern to .NET balancing groups definition

2023-01-09 08:50 问答作者：

PCRE has a feature called recursive pattern, which can be used to match nested subgroups. For example, consider the "grammar"

Q -> \w | '[' A ';' Q* ','? Q* ']' | '<' A '>'
A -> (Q | ',')*
// to match ^A$.

It can be done in PCRE with the pattern

^((?:,|(\w|\[(?1);(?2)*,?(?2)*\]|<(?1)>))*)$

(Example test case: http://www.ideone.com/L4lHE)

Should match:

abcdefg abc,def,ghi abc,,,def ,,,,,, [abc;] [a,bc;] sss[abc;d] as[abc;d,e] [abc;d,e][fgh;j,k] <abc> [<a>b;<c,d>,<e,f>] <a,b,c> <a,bb,c> <,,,> <> <><> <>,<> a<<<<>>><a>> <<<<<>>>><><<<>>>> <z>[a;b] <z[a;b]> [[;];] [,;,] [;[;]] [<[;]>;<[;][;,<[;,]>]>]

Should not match:

<a bc> <abc<de> [a<b;c>;d,e] [a] <<<<<>>>><><<<>>>>> <<<<<>>>><><<<>>> [abc;def;] [[;],] [;,,] [abc;d,e,f] [<[;]>;<[;][;,<[;,]>]]> <z[a;b>]

There is no recursive pattern in .NET. Instead, it provides balancing groups for stack-based manipulation for matching simple nested patterns.

Is it possible to convert the above PCRE pattern into .NET Regex style?

_{(Yes I know it's better not to use regex in for this. It's just a theoretical question.)}

References

pcre.org - PCRE man page - Recursive Patterns
MSDN - Regu开发者_开发百科lar Expression Language Elements - Balancing Group Definitions

The .Net alternative to recursive pattern is a stack. The challenge here is that we need to express the grammar it terms of stacks.
Here's one way of doing that:

A slightly different notation for grammars

First, we need grammar rules (like A and Q in the question).
We have one stack. The stack can only contain rules.
At each step we pop the current state from the stack, match what we need to match, and push further rules into the stack. When we're done with a state we don't push anything and get back to the previous state.

This notation is somewhere between CFG and Pushdown automaton, where we push rules to the stack.

Example:

Let's start with a simple example: aⁿbⁿ. The usual grammar for this language is:

S -> aSb | ε

We can rephrase that to fit the notation:

# Start with <push S>
<pop S> -> "a" <push B> <push S> | ε
<pop B> -> "b"

In words:

We start with S in the stack.
When we pop S from the stack we can either:
- Match nothing, or...
- match "a", but then we have to push the state B to the stack. This is a promise we will match "b". Next we also push S so we can keep matching "a"s if we want to.
When we've matched enough "a"s we start popping Bs from the stack, and match a "b" for each one.
When this is done, we've matched an even number of "a"'s and "b"s.

or, more loosely:

When we're in case S, match "a" and push B and then S, or match nothing.
When we're in case B, match "b".

In all cases, we pop the current state from the stack.

Writing the pattern in a .Net regular expression

We need to represent the different states somehow. We can't pick '1' '2' '3' or 'a' 'b' 'c', because those may not be available in our input string - we can only match what is present in the string.
One option is to number our states (In the example above, S would state number 0, and B is state 1).
For state S

继续阅读：.net balancing-groups pcre recursive-regex regex

Converting PCRE recursive regex pattern to .NET balancing groups definition

Should match:

Should not match:

References

A slightly different notation for grammars

Example:

Writing the pattern in a .Net regular expression

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Should match:

Should not match:

References

A slightly different notation for grammars

Example:

Writing the pattern in a .Net regular expression

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？