
Traceback for regular expression

Let's say I have a regular expression:

import re

match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback')  # here I want to report how far the regex matching got

If the regular expression fails to match, I want to raise an exception that describes how the matching went: where it failed to match the pattern, at what stage, etc. Is it even possible to achieve this?


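I have something that helps me debug complex regex patterns in my code. The pattern is split into a tuple of pieces, and successively longer prefixes are compiled and searched; the first prefix that stops matching identifies the piece responsible.
Does this help you?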

import re

li = ('ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n',

      '2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798',


      'ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n',

      '25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798')


# Raw strings keep \d as a regex escape; '\t' is deliberately a literal tab.
tupleRE = (r'^\d',
           ' ',
           r'\d{5}',
           ' ',
           '[abcdefghi]+',
           ' ',
           '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)',
           '[a-z]+',
           '__',
           r'[\d]+',
           ' +',
           '[^\t]+',
           '\t\t',
           ' ',
           'ght',
           '(r[5-9]+|u[0-4]+)',
           '$')



def REtest(ch, tupleRE, flags=re.MULTILINE):
    # Search with successively longer prefixes of tupleRE; the first
    # prefix that no longer matches points to the culprit element n.
    for n in range(len(tupleRE)):
        regx = re.compile(''.join(tupleRE[:n+1]), flags)
        testmatch = regx.search(ch)
        if not testmatch:
            print('\n  -*- tupleRE :\n')
            print('\n'.join(str(i).zfill(2)+' '+repr(u)
                            for i,u in enumerate(tupleRE[:n])))
            print('   --------------------------------')
            # tupleRE doesn't work because of element n
            print(str(n).zfill(2)+' '+repr(tupleRE[n])
                  +"   doesn't match anymore from this ligne "
                  +str(n)+' of tupleRE')
            # also show the element right after the culprit, if any
            print('\n'.join(str(n+1+j).zfill(2)+' '+repr(u)
                            for j,u in enumerate(tupleRE[n+1:n+2])))

            # Find the longest prefix that still matches. When n == 0
            # nothing matched at all, so the matching portion is empty.
            match = None
            for i in range(n):
                match = re.search(''.join(tupleRE[:n-i]), ch, flags)
                if match:
                    break
            matching_portion = match.group() if match else ''
            fin_matching_portion = match.end() if match else 0
            matching_li = '\n'.join(map(repr,
                                        matching_portion.splitlines(True)[-5:]))
            print('\n\n  -*- Part of the tested string which is concerned :\n\n'
                  '######### matching_portion ########\n'+matching_li + '\n'
                  '##### end of matching_portion #####\n'
                  '-----------------------------------\n'
                  '######## unmatching_portion #######')
            # show (at most 300 characters of) what follows the match
            print('\n'.join(map(repr,
                                ch[fin_matching_portion:
                                   fin_matching_portion+300].splitlines(True))))
            break
    else:
        print('\n  SUCCESS . The regex integrally matches.')



for x in li:
    print('  -*- Analyzed string :\n%r' % x)
    REtest(x, tupleRE)
    print('\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm')

Result:

  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\nqfgqrgqrg'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
   --------------------------------
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'   doesn't match anymore from this ligne 6 of tupleRE
07 '[a-z]+'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'5 12478 abdefgcd '
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'ocean__1247101247887 ty--\t\t ghtr789\n'
'qfgqrgqrg'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
07 '[a-z]+'
08 '__'
09 '[\\d]+'
10 ' +'
11 '[^\t]+'
12 '\t\t'
13 ' '
14 'ght'
15 '(r[5-9]+|u[0-4]+)'
   --------------------------------
16 '$'   doesn't match anymore from this ligne 16 of tupleRE



  -*- Part of the tested string which is concerned :

######### matching_portion ########
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'940\n'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  -*- tupleRE :

00 '^\\d'
   --------------------------------
01 ' '   doesn't match anymore from this ligne 1 of tupleRE
02 '\\d{5}'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'2'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'5 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
   --------------------------------
05 ' '   doesn't match anymore from this ligne 5 of tupleRE
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'9 54879 bbde'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'Yddf antarctic__13  18:13pomodoro\t\t ghtr6798'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
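To get closer to what the question asks (raising an exception rather than printing a report), the same prefix idea could be wrapped in a small helper. This is a minimal sketch, not part of the answer above: RegexTraceError and search_or_raise are hypothetical names, and it assumes each piece is a self-contained regex fragment (balanced parentheses), as in tupleRE. Like REtest, it reports the first prefix that matches nowhere in the string, which is a heuristic rather than a trace of the engine's backtracking.

```python
import re


class RegexTraceError(Exception):
    """Hypothetical exception recording where a piecewise regex stopped matching."""

    def __init__(self, index, piece, matched):
        self.index = index      # position of the failing piece in the tuple
        self.piece = piece      # the failing piece itself
        self.matched = matched  # text matched by the pieces before it
        super().__init__('piece %d %r stopped matching after %r'
                         % (index, piece, matched))


def search_or_raise(pieces, text, flags=re.MULTILINE):
    """Search text for ''.join(pieces); if that fails, raise RegexTraceError
    for the first piece whose prefix pattern matches nowhere in text."""
    full = re.search(''.join(pieces), text, flags)
    if full:
        return full
    matched = ''
    for n in range(len(pieces)):
        m = re.search(''.join(pieces[:n + 1]), text, flags)
        if not m:
            raise RegexTraceError(n, pieces[n], matched)
        matched = m.group()
    # Not reached: the last prefix is the full pattern, which already failed.
    raise AssertionError('unreachable')


# Usage, with the first few pieces of tupleRE:
pieces = (r'^\d', ' ', r'\d{5}', ' ', '[abcdefghi]+')
try:
    search_or_raise(pieces, '25 47890 bbcedefg')
except RegexTraceError as exc:
    print(exc)  # piece 1 ' ' stopped matching after '2'
```

On success the helper simply returns the match object, so it can replace the plain re.search call from the question.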


I've used Kodos (http://kodos.sourceforge.net/about.html) in the past to perform RegEx debugging. It's not the ideal solution since you want something for run-time, but it may be helpful to you.


If you need to test the regex, you can probably wrap each part in a group followed by *, as in ( sometext)*. Use this alongside your desired regex, and you should then be able to pick out where the match fails.

Then use the following match-object attributes, as documented on python.org:

pos: The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos: The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.

lastindex: The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.

lastgroup: The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.

re: The regular expression object whose match() or search() method produced this MatchObject instance.

string: The string passed to match() or search().
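A quick way to see these attributes on a real match object; the pattern, the text, and the pos/endpos values below are made-up examples:

```python
import re

# Made-up pattern and text: two numbered groups plus one named group.
pattern = re.compile(r'(the) (real) (?P<word>\w+)')
text = 'this is the real deal'

# pos=5 and endpos=20 restrict the search to text[5:20] ('is the real dea').
m = pattern.search(text, 5, 20)

print(m.pos)            # 5
print(m.endpos)         # 20
print(m.group('word'))  # dea  (endpos cut off the final 'l')
print(m.lastindex)      # 3
print(m.lastgroup)      # word
print(m.re is pattern)  # True
print(m.string)         # this is the real deal  (the whole string, not the slice)

# Unnamed groups: lastindex still works, but lastgroup is None.
print(re.search(r'(a)b', 'ab').lastindex)    # 1
print(re.search(r'(a)(b)', 'ab').lastindex)  # 2
print(re.search(r'(a)b', 'ab').lastgroup)    # None
```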

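So, for a very simple example (lastgroup would be None here because the groups are unnamed, so print lastindex instead):

>>> import re
>>> mytextvar = 'the real deal'  # hypothetical sample input
>>> m1 = re.compile(r'the real thing')
>>> m2 = re.compile(r'(the)* (real)* (thing)*')
>>> if not m1.search(mytextvar):
...     res = m2.search(mytextvar)
...     print(res.lastindex)  # group 2, (real), was the last one matched
...     # raise my exception
...
2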
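If you prefer lastgroup, give the optional groups names; it then reports the name of the last part that matched. A minimal sketch of that variant, raising an exception as the question wanted; check, strict, and probe are hypothetical names and 'the real deal' is a made-up input. It only reports the last group that matched, so treat it as a rough hint rather than a full trace.

```python
import re

strict = re.compile(r'the real thing')
# Same probe as above, but with named groups so that lastgroup is useful.
probe = re.compile(r'(?P<the>the)* (?P<real>real)* (?P<thing>thing)*')


def check(text):
    """Return the strict match, or raise ValueError naming the last part
    of the probe pattern that still matched (None if the probe failed too)."""
    m = strict.search(text)
    if m:
        return m
    res = probe.search(text)
    last = res.lastgroup if res else None
    raise ValueError('no match; last matched part: %r' % last)


try:
    check('the real deal')
except ValueError as exc:
    print(exc)  # no match; last matched part: 'real'
```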
