Regular expressions in Python

I find regular expressions in Python pretty tough to understand, and the documentation is too cryptic. For instance, what would be the regex for removing all instances of #if DEBUG, together with everything enclosed between it and its corresponding #endif, in a C file? The following is not working:

 buf = file.read()
 p_macro = re.compile("#if.DEBUG?#endif", re.MULTILINE + re.DOTALL)
 string1 = re.sub(p_macro, '', buf)


If you want to remove all instances of #if DEBUG, all you have to do is define DEBUG as 0 and run the preprocessor on the file. No need for nasty regular expressions.

Also, it's generally not a good idea to process text defined by a context-free grammar (C source, for example, or more notoriously, HTML) with regular expressions. Use a parsing library instead. Check out the Eclipse SDK, for example: http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.isv/reference/api/overview-summary.html


Python's regular expressions use most of the PCRE syntax. You can learn the basics from http://www.regular-expressions.info/tutorial.html.

Your code does not work because

  #if.DEBUG?#endif
//        ^^

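The G? actually means "zero or one G character", so the pattern only matches when #endif comes immediately after DEBUG (or DEBU). Nothing in it matches the text in between.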
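
A quick check in the interpreter makes this concrete. This is only an illustrative sketch: the pattern is the one from the question, while the sample strings and variable names are made up for the demonstration.

```python
import re

# The pattern from the question: '.' is any single character, 'G?' is an optional 'G'.
broken = re.compile("#if.DEBUG?#endif", re.MULTILINE + re.DOTALL)

# Hypothetical sample input, not taken from the question.
sample = "#if DEBUG\nprintf(\"x\");\n#endif\n"

# No match: after '#if DEBU' and an optional 'G', the pattern demands '#endif' right away.
no_match = broken.search(sample)

# It only matches when '#endif' directly follows 'DEBUG' or 'DEBU'.
with_g = broken.search("#if DEBUG#endif")
without_g = broken.search("#if DEBU#endif")

print(no_match, with_g is not None, without_g is not None)
```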

If you want to remove the whole #if DEBUG block, try

import re

p_macro = re.compile(
    r'^\s*#if\s+DEBUG'    # match the '#if DEBUG' preprocessor directive
    r'.*?'                # lazily match all content in between until...
    r'^\s*#endif',        # ... reaching the first '#endif', and match it too
    re.S | re.M
)

but it will not work with nested #if blocks, and it won't check whether the directive sits inside a /* ... */ comment. For correctness, it's better to use a proper C preprocessor (CPP) parser.
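
To see the nesting limitation, and one possible way around it short of a real parser, here is a rough sketch. It swaps the single whole-buffer regex for a line-by-line depth counter. The helper names (IF_DEBUG, ANY_IF, ENDIF, strip_debug_blocks) and the sample inputs are hypothetical. It assumes the directive is written exactly as "#if DEBUG", drops any #else branch along with the rest of the block, and still ignores comments and string literals.

```python
import re

# Same pattern as suggested above, written on one line.
p_macro = re.compile(r'^\s*#if\s+DEBUG.*?^\s*#endif', re.S | re.M)

# Hypothetical sample inputs.
flat = "int a;\n#if DEBUG\nlog(a);\n#endif\nint b;\n"
nested = "#if DEBUG\n#if VERBOSE\nlog(a);\n#endif\nlog(b);\n#endif\nint c;\n"

flat_result = p_macro.sub('', flat)      # the flat block is removed
nested_result = p_macro.sub('', nested)  # stops at the inner '#endif', leaving a stray one

# Line-based alternative (an assumption, not part of the original answer).
IF_DEBUG = re.compile(r'^\s*#\s*if\s+DEBUG\b')
ANY_IF = re.compile(r'^\s*#\s*if(def|ndef)?\b')
ENDIF = re.compile(r'^\s*#\s*endif\b')

def strip_debug_blocks(source):
    """Drop each '#if DEBUG' ... matching '#endif' span, tracking nesting depth."""
    kept = []
    depth = 0  # 0 means we are outside any '#if DEBUG' block
    for line in source.splitlines(keepends=True):
        if depth == 0:
            if IF_DEBUG.match(line):
                depth = 1          # entering a block to be removed
            else:
                kept.append(line)
        elif ANY_IF.match(line):
            depth += 1             # nested conditional inside the removed block
        elif ENDIF.match(line):
            depth -= 1             # closes the innermost open conditional
    return ''.join(kept)

print(repr(nested_result))
print(repr(strip_debug_blocks(nested)))
```

The regex leaves the outer block's tail behind on the nested input, while the depth counter removes the whole span. Neither approach understands C, so the advice to use a real preprocessor or parser still stands.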


If Kodos, the Python Regular Expression Debugger, is available on your development platform, you'll have an easier time crafting and testing regular expressions.
