Traceback for regular expression

2023-04-09 15:59

Let's say I have a regular expression:

match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback')  # I want to report the regex matching process here.

If the regular expression fails to match, I want the exception to describe how the matching went: how far it got, where it stopped matching the pattern, at what stage, etc. Is it even possible to achieve the desired functionality?

I have something that helps me debug complex regex patterns in my code.
Does this help you?

import re

li = ('ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n',

      '2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798',


      'ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n',

      '25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798')


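# '\\d' is written with a doubled backslash so the regex engine receives \d;
# '\t' is left as a literal tab character.
tupleRE = ('^\\d',
           ' ',
           '\\d{5}',
           ' ',
           '[abcdefghi]+',
           ' ',
           '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)',
           '[a-z]+',
           '__',
           '[\\d]+',
           ' +',
           '[^\t]+',
           '\t\t',
           ' ',
           'ght',
           '(r[5-9]+|u[0-4]+)',
           '$')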



def REtest(ch, tupleRE, flags=re.MULTILINE):
    # Try ever longer prefixes of tupleRE; the first prefix that fails
    # to match pinpoints the offending element.
    for n in range(len(tupleRE)):
        regx = re.compile(''.join(tupleRE[:n+1]), flags)
        testmatch = regx.search(ch)
        if not testmatch:
            print('\n  -*- tupleRE :\n')
            print('\n'.join(str(i).zfill(2)+' '+repr(u)
                            for i,u in enumerate(tupleRE[:n])))
            print('   --------------------------------')
            # tupleRE doesn't work because of element n
            print(str(n).zfill(2)+' '+repr(tupleRE[n])
                  +"   doesn't match anymore from this ligne "
                  +str(n)+' of tupleRE')
            print('\n'.join(str(n+1+j).zfill(2)+' '+repr(u)
                            for j,u in enumerate(tupleRE[n+1:
                                                         min(n+2,len(tupleRE))])))

            # Find the longest prefix of tupleRE that still matches.
            match = None
            for i in range(n):
                match = re.search(''.join(tupleRE[:n-i]), ch, flags)
                if match:
                    break

            # If even element 0 fails, nothing matched at all.
            matching_portion = match.group() if match else ''
            matching_li = '\n'.join(map(repr,
                                        matching_portion.splitlines(True)[-5:]))
            fin_matching_portion = match.end() if match else 0
            print('\n\n  -*- Part of the tested string which is concerned :\n\n'
                  '######### matching_portion ########\n'+matching_li + '\n'
                  '##### end of matching_portion #####\n'
                  '-----------------------------------\n'
                  '######## unmatching_portion #######')
            print('\n'.join(map(repr,
                                ch[fin_matching_portion:
                                   fin_matching_portion+300].splitlines(True))))
            break
    else:
        print('\n  SUCCESS . The regex integrally matches.')



for x in li:
    print('  -*- Analyzed string :\n%r' % x)
    REtest(x, tupleRE)
    print('\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm')

Result:

  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\nqfgqrgqrg'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
   --------------------------------
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'   doesn't match anymore from this ligne 6 of tupleRE
07 '[a-z]+'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'5 12478 abdefgcd '
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'ocean__1247101247887 ty--\t\t ghtr789\n'
'qfgqrgqrg'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
07 '[a-z]+'
08 '__'
09 '[\\d]+'
10 ' +'
11 '[^\t]+'
12 '\t\t'
13 ' '
14 'ght'
15 '(r[5-9]+|u[0-4]+)'
   --------------------------------
16 '$'   doesn't match anymore from this ligne 16 of tupleRE



  -*- Part of the tested string which is concerned :

######### matching_portion ########
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'940\n'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  -*- tupleRE :

00 '^\\d'
   --------------------------------
01 ' '   doesn't match anymore from this ligne 1 of tupleRE
02 '\\d{5}'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'2'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'5 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
   --------------------------------
05 ' '   doesn't match anymore from this ligne 5 of tupleRE
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'9 54879 bbde'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'Yddf antarctic__13  18:13pomodoro\t\t ghtr6798'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
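
To get this information into an exception, as the question asks, the same prefix-by-prefix idea could be wrapped roughly like this. It is only a sketch in Python 3 syntax: RegexMismatch and search_or_explain are made-up names, not part of the re module, and it assumes every prefix of the pieces is a valid regex on its own (no group split across two pieces).

```python
import re


class RegexMismatch(Exception):
    """Hypothetical exception recording where a piecewise regex stopped matching."""

    def __init__(self, index, part, matched, rest):
        self.index = index      # index of the first failing piece
        self.part = part        # the failing piece itself
        self.matched = matched  # text matched by the pieces before it
        self.rest = rest        # text following the matched portion
        super().__init__(
            'piece %d (%r) failed after matching %r; remaining text: %r'
            % (index, part, matched, rest[:40]))


def search_or_explain(parts, text, flags=0):
    """Return the match of ''.join(parts), or raise RegexMismatch."""
    previous = None  # match of the longest prefix that still succeeds
    for n in range(len(parts)):
        current = re.search(''.join(parts[:n + 1]), text, flags)
        if current is None:
            matched = previous.group() if previous else ''
            end = previous.end() if previous else 0
            raise RegexMismatch(n, parts[n], matched, text[end:])
        previous = current
    return previous


# Usage: the second digit of '25' breaks the pattern at piece 1 (the space).
try:
    search_or_explain((r'^\d', ' ', r'\d{5}'), '25 47890', re.MULTILINE)
except RegexMismatch as exc:
    print(exc)
```

The caller can then catch RegexMismatch and read index, part, matched and rest instead of parsing printed output.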

I've used Kodos (http://kodos.sourceforge.net/about.html) in the past to perform RegEx debugging. It's not the ideal solution since you want something for run-time, but it may be helpful to you.

If you need to test the regex, you can probably wrap each piece in a group followed by *, as in ( sometext)*. Use this alongside your desired regex, and you should then be able to pluck out the failure locations.

Then leverage the following match object attributes, as documented on python.org:

pos The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.

lastindex The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.

lastgroup The name of the last matched capturing group, or None if the group didn’t have a name, or if no group was matched at all.

re The regular expression object whose match() or search() method produced this MatchObject instance.

string The string passed to match() or search().
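
A quick way to see these attributes in action (Python 3 syntax; the sample text and the group names first and second are made up for illustration):

```python
import re

text = 'xx ab yy'
pattern = re.compile(r'(?P<first>a)(?P<second>b)')
m = pattern.search(text, 2, 6)  # pos=2, endpos=6

print(m.pos, m.endpos)   # the bounds that were passed to search()
print(m.lastindex)       # number of the last matched capturing group
print(m.lastgroup)       # name of that group
print(m.re is pattern)   # the pattern object that produced the match
print(m.string is text)  # the whole original string, not the searched slice

# The lastindex cases quoted from the documentation:
for regex in ('(a)b', '((a)(b))', '((ab))', '(a)(b)'):
    print(regex, re.match(regex, 'ab').lastindex)
```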

So, for a very simple example:

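>>> m1 = re.compile(r'the real thing')
>>> m2 = re.compile(r'(the)* (real)* (thing)*')
>>> if not m1.search(mytextvar):
...     res = m2.search(mytextvar)
...     # the groups are unnamed, so lastgroup would be None; lastindex gives the group number
...     print(res.lastindex if res else None)
...     # raise my exception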
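
A variation on this idea that reports a name rather than a number might look like the sketch below (Python 3 syntax). PIECES, build_probe and last_matched_piece are hypothetical names. Instead of (x)* side by side, the pieces are nested as optional named groups, so a later piece can only match if every earlier one did. Note that search() stops at the first place where the first piece matches, which is not necessarily the place with the longest partial match.

```python
import re

# Hypothetical split of r'the real thing' into named pieces.
PIECES = [('the', r'the'), ('real', r' real'), ('thing', r' thing')]


def build_probe(pieces):
    """Nest the pieces as optional named groups, innermost piece last."""
    probe = ''
    for name, regex in reversed(pieces):
        tail = '(?:%s)?' % probe if probe else ''
        probe = '(?P<%s>%s)%s' % (name, regex, tail)
    return re.compile(probe)


def last_matched_piece(pieces, text):
    """Name of the last piece that matched, or None if even the first failed."""
    m = build_probe(pieces).search(text)
    return m.lastgroup if m else None


mytextvar = 'the real thin'  # sample input: the final piece cannot match
if not re.search(''.join(regex for _, regex in PIECES), mytextvar):
    print('matching stopped after piece %r'
          % last_matched_piece(PIECES, mytextvar))
```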
