Small problem with regexps in Python
So I have one variable that holds all the code from some file. I need to remove all comments from this file. One of my regexp lines is this:
x=re.sub('\/\*.*\*\/','',x,re.M,re.S);
What I want this to do is remove all multi-line comments. For some odd reason, though, it's skipping two instances of */ and removing everything up to the third instance of */.
I'm pretty sure the reason is that this third instance of */ has code after it, while the first two are by themselves on the line. I'm not sure why this matters, but I'm pretty sure that's why.
Any ideas?
.* will always match as many characters as possible. Try (.*?) instead: the ? makes the quantifier lazy, so it matches as few characters as possible (it works without the parentheses as well). So your whole pattern should look like this: \/\*.*?\*\/
or \/\*(.*?)\*\/
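To make the difference concrete, here is a minimal sketch comparing the greedy and lazy patterns. The sample string is a made-up input, not the asker's file. The flag is passed as flags=re.S because, in current Python versions, the fourth positional argument of re.sub is count, not flags. The forward slashes are left unescaped since Python does not require escaping them.

```python
import re

# Hypothetical sample input: two block comments with code between them.
source = """int a; /* first
comment */
int b;
/* second */ int c;
"""

# Greedy: .* runs to the LAST */ in the text, so "int b;" is removed too.
greedy = re.sub(r'/\*.*\*/', '', source, flags=re.S)

# Lazy: .*? stops at the FIRST */ after each /*, so only the comments go.
lazy = re.sub(r'/\*.*?\*/', '', source, flags=re.S)

print(repr(greedy))
print(repr(lazy))
```

re.S (DOTALL) is what lets . cross newlines, which multi-line comments need. re.M only changes how ^ and $ behave, so it has no effect on this pattern.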
The expression .* is greedy, meaning that it will attempt to match as many characters as possible. Instead, use (.*?), which will stop matching characters as soon as possible.
The regular expression is "greedy" and, when presented with several stopping points, will take the farthest one. Regex has some patterns to help control this, in particular the (?<!...) construct, which matches the following expression only if it is not preceded by a match of the pattern in parens.
(?*...) was not in Python 2.4 but is a good choice if you are using a later version.
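For reference, a small sketch of what a negative lookbehind, written (?<!...) in Python's re module, does. The strings here are made up for illustration. Note that a lookbehind only checks the text before a match position; it does not by itself make .* any less greedy.

```python
import re

# Hypothetical sample text: "bar" appears once after "foo" and once after "baz".
text = "foobar bazbar"

# (?<!foo)bar matches "bar" only where it is NOT immediately preceded by "foo".
result = re.sub(r'(?<!foo)bar', 'X', text)

print(result)
```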