
Small problem with regexps in Python

So I have one variable that has all the code from some file. I need to remove all comments from this file. One of my regexp lines is this

x=re.sub('\/\*.*\*\/','',x,re.M,re.S);

What I want this to do is remove all multi-line comments. For some odd reason though, it's skipping two instances of */ and removing everything up to the third instance of */.

I'm pretty sure the reason is that this third instance of */ has code after it, while the first two are by themselves on the line. I'm not sure why this matters, but I'm pretty sure that's why.

Any ideas?


.* will always match as many characters as possible. Try .*? instead, which matches as few characters as possible. The parentheses in (.*?) are optional; they only add a capture group. So your whole pattern should look like this: \/\*.*?\*\/ or \/\*(.*?)\*\/


The expression .* is greedy, meaning that it will attempt to match as many characters as possible. Instead, use .*? (or (.*?) if you want a capture group), which stops matching as soon as the rest of the pattern can match.
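A minimal sketch of the difference between the greedy and non-greedy forms, using a made-up sample string (the variable names here are illustrative, not from the question). Note also that the signature is re.sub(pattern, repl, string, count=0, flags=0), so the fourth positional argument is count; flags are safest passed by keyword, as assumed below.

```python
import re

# Hypothetical sample input: three block comments, the last one followed by code.
source = (
    "int a; /* first\n"
    "comment */\n"
    "int b; /* second */\n"
    "int c; /* third */ int d;\n"
)

# Greedy: .* runs from the first /* to the LAST */ in the text,
# so the code between the comments is removed too.
greedy = re.sub(r'/\*.*\*/', '', source, flags=re.S)

# Non-greedy: .*? stops at the FIRST */ after each /*,
# so each comment is removed separately.
lazy = re.sub(r'/\*.*?\*/', '', source, flags=re.S)

print(repr(greedy))
print(repr(lazy))
```

re.S (DOTALL) is what lets the dot cross newlines for multi-line comments; re.M only changes how ^ and $ behave and is not needed for this pattern.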


The regular expression is "greedy": when presented with several stopping points, it takes the farthest one. Regex has some patterns to help control this, in particular the

(?<!...)

which matches the following expression only if it is not preceded by a match of the pattern in parens.

(?*...) was not in Python 2.4 but is a good choice if you are using a later version.
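As a rough illustration of what a negative lookbehind does on its own, here is a small sketch with an invented sample string. It only shows the construct's behaviour and is not a full comment-stripping pattern.

```python
import re

# Hypothetical sample text: "bar" appears three times.
text = "foobar bazbar bar"

# (?<!foo)bar matches "bar" only where it is NOT immediately preceded by "foo".
starts = [m.start() for m in re.finditer(r'(?<!foo)bar', text)]

print(starts)
```

The first "bar" is skipped because "foo" comes right before it; the other two occurrences match.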
