How can a recursive regexp be implemented in Python?

I'm interested in how recursive regexp matching can be implemented in Python (I haven't found any examples). For example, how would one write an expression that matches a "bracket-balanced" string like "foo(bar(bar(foo)))(foo1)bar1"?


You could use pyparsing

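#!/usr/bin/env python
import sys

from pyparsing import nestedExpr

astring = sys.argv[1]
# nestedExpr expects the input to start with the opener, so wrap bare text.
if not astring.startswith('('):
    astring = '(' + astring + ')'

expr = nestedExpr('(', ')')
# asList() converts the parse results to plain lists; [0] is the outermost group.
result = expr.parseString(astring).asList()[0]
print(result)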

Running it yields:

% test.py "foo(bar(bar(foo)))(foo1)bar1"
['foo', ['bar', ['bar', ['foo']]], ['foo1'], 'bar1']
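
If installing pyparsing is not an option, a similar nested list can be built with a small stack-based loop. This is only a minimal sketch, not how pyparsing works internally: the name parse_nested is made up for illustration, and unlike nestedExpr it does not split tokens on whitespace.

```python
def parse_nested(text):
    """Parse text into a nested list, one sublist per parenthesized group."""
    stack = [[]]   # stack[-1] is the list currently being filled
    token = []     # characters of the token currently being read

    def flush():
        # Append the pending token (if any) to the current list.
        if token:
            stack[-1].append(''.join(token))
            token.clear()

    for ch in text:
        if ch == '(':
            flush()
            stack.append([])          # open a new nested group
        elif ch == ')':
            flush()
            if len(stack) == 1:
                raise ValueError('unmatched )')
            inner = stack.pop()       # close the group and attach it to its parent
            stack[-1].append(inner)
        else:
            token.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError('unmatched (')
    return stack[0]


print(parse_nested('foo(bar(bar(foo)))(foo1)bar1'))
# ['foo', ['bar', ['bar', ['foo']]], ['foo1'], 'bar1']
```

Unbalanced input raises ValueError instead of returning a partial result.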


This is an old question, but for the people who come here through searches:

There's an alternative regex module for Python that does support recursive patterns: https://pypi.python.org/pypi/regex

It also offers many other improvements over re.


You can't do it with Python's built-in re module: it doesn't support recursive patterns.
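
If the goal is only to check that the brackets balance, no regexp is needed at all. A minimal sketch with a depth counter (the function name is_balanced is made up for illustration):

```python
def is_balanced(text):
    """Return True if every '(' in text has a matching ')' and vice versa."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' appeared before its '('
                return False
    return depth == 0          # every '(' must have been closed


print(is_balanced('foo(bar(bar(foo)))(foo1)bar1'))  # True
print(is_balanced('foo(bar'))                       # False
```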


Unfortunately I don't think Python's regexps support recursive patterns.

You can probably parse it with something like pyparsing: https://github.com/pyparsing/pyparsing
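
If adding a dependency is not an option, the standard re module can still be applied repeatedly: strip the innermost "(...)" pairs until nothing changes, then check that no bracket is left. This is a rough sketch of that workaround, not a recursive pattern; the names INNERMOST and is_balanced_re are made up for illustration.

```python
import re

# A bracket pair that contains no other brackets, i.e. an innermost group.
INNERMOST = re.compile(r'\([^()]*\)')


def is_balanced_re(text):
    """Return True if the parentheses in text are balanced."""
    count = 1
    while count:
        # subn returns the new string and the number of replacements made.
        text, count = INNERMOST.subn('', text)
    # Any bracket that survived the stripping has no partner.
    return '(' not in text and ')' not in text


print(is_balanced_re('foo(bar(bar(foo)))(foo1)bar1'))  # True
print(is_balanced_re('foo(bar'))                       # False
```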


With the PyPI regex module, which you can install with pip install regex, you can use:

import regex

pattern = r'[^()]*+(\((?>[^()]|(?1))*+\)[^()]*+)++'
text = 'foo(bar(bar(foo)))(foo1)bar1'
print(bool(regex.fullmatch(pattern, text)))
# => True

The pattern needs no explicit \A and \z anchors here because regex.fullmatch already requires the whole string to match; the regex pattern demo adds them for that reason.

Pattern details

  • \A - implicit in regex.fullmatch - start of string
  • [^()]*+ - 0 or more chars other than ( and ) (possessive match, no backtracking into the pattern allowed)
  • (\((?>[^()]|(?1))*+\)[^()]*+)++ - 1+ occurrences of Group 1 pattern:
    • \( - a ( char
    • (?>[^()]|(?1))*+ - 0 or more repetitions (possessive match) of
      • [^()] - any char but ( and )
      • | - or
      • (?1) - a regex subroutine that recurses Group 1 pattern
    • \) - a ) char
    • [^()]*+ - 0 or more chars other than ( and ) (possessive match)
  • \z - implicit in regex.fullmatch - end of string.

See the pattern and more information on regex subroutines at regular-expressions.info.
