Python Regex to match a file in a list of files (getting error)

I'm trying to use a regex in Python to match a file path (stored as a string, e.g. "/volumes/footage/foo/bar.mov") against a log file I create that contains a list of files. But when I run the script, it gives me this error: sre_constants.error: unbalanced parenthesis. The code I'm using is this:

To read the file:

import os

theLogFile = The_Root_Path + ".processedlog"
if os.path.isfile(theLogFile):
    the_file = open(theLogFile, "r")
else:
    # Create an empty log file if none exists yet
    open(theLogFile, 'w').close()
    the_file = open(theLogFile, "r")
the_log = the_file.read()
the_file.close()

Then, inside a for loop, I reassign the the_file variable (I didn't realize I was doing this until I posted this question) to a string from a list of files (obtained by running through a folder and its subfolders and grabbing all the filenames), then try to use a regex to see if that filename is present in the log file:

for the_file in filenamelist:
    p = re.compile(the_file, re.IGNORECASE)
    m = p.search(the_log)

Every time it hits the re.compile() part of the code it spits out that error. And if I try to cut that out, and use re.search(the_file, the_log) it still spits out that error. I don't understand how I could be getting unbalanced parenthesis from this.


Where is the regular expression pattern? Are you trying to use filenames contained in one file as patterns to search the other file? If so, you will want to step through the_file with something like

for the_pattern in the_file:
    p = re.compile(the_pattern, re.IGNORECASE)
    m = p.search(the_log)
    ...

According to the Python re.compile documentation, the first argument to re.compile() should be the regular expression pattern as a string.

But the return value of open() is a file object, which you assign to the_file and pass to re.compile()....


Gordon,

it would seem to me that the issue is in the data. You are compiling uninspected strings from the filelist into regexps, not heeding that they might contain metacharacters that are meaningful to the regexp engine.

In your for loop, add a print the_file before the call to re.compile (it is no problem that the loop variable reuses a name that previously referred to a file object), so you can see which strings are actually coming from the filelist. Or, better still, run every value of the_file through re.escape before passing it to re.compile. This backslash-escapes all metacharacters so that they match literally.
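
A minimal sketch of the re.escape suggestion, assuming the log is plain text containing one path per line. The sample filenames and log contents are made up for illustration; the stray ")" in the first name is only a guess at the kind of character that could trigger the error in the question.

```python
import re

# Hypothetical data: the first filename contains an unmatched ")"
filenamelist = ["/volumes/footage/foo/take 1).mov", "/volumes/footage/foo/baz.mov"]
the_log = "/Volumes/Footage/foo/take 1).mov\n"

# Compiling the raw filename fails: ")" is treated as closing a group
try:
    re.compile(filenamelist[0], re.IGNORECASE)
    raw_compile_failed = False
except re.error:
    raw_compile_failed = True

# Escaping first makes every character in the filename match literally
matches = {}
for the_filename in filenamelist:
    p = re.compile(re.escape(the_filename), re.IGNORECASE)
    matches[the_filename] = p.search(the_log) is not None
```

Here matches maps each filename to whether it was found in the log. The loop variable is called the_filename rather than the_file only to avoid reusing the name of the file object from the question.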


What you're binding to the name the_file in your first snippet is a file object, even though you say it's "saved as a string". The filename (i.e. the string) is actually named theLogFile, but what you're trying to turn into an RE object is not theLogFile (the string), it's the_file (the now-closed file object). Given this, the error is somewhat quirky (one would expect a TypeError), but it's clear that you will get an error at re.compile.
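
For comparison, a small sketch of what happens in Python 3 when something other than a string is passed to re.compile. The question appears to use Python 2, so this does not reproduce the asker's setup; io.StringIO is used here only as a stand-in for an open file object.

```python
import io
import re

# Stand-in for a file object (hypothetical contents)
fake_file = io.StringIO("/volumes/footage/foo/bar.mov\n")

try:
    re.compile(fake_file, re.IGNORECASE)
    error_type = None
except TypeError as exc:
    # Python 3 rejects a pattern that is neither a string nor a compiled pattern
    error_type = type(exc).__name__
```

If the failing call raises a regex syntax error rather than a TypeError, that suggests the argument really was a string at that point.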


the_file should be a string. In the above code the_file is the return value of open, which is a file object.
