Python 3 regular expression to find multiline comment
I'm trying to find comment blocks in PHP source code using regular expressions in Python 3. The PHP comments are in this format:
/**
* This is a very short block comment
*/
Now I came up with the following regular expression:
'/\*\*[.]+?\*/'
I figured that this, in combination with the DOTALL flag, should do it, but no: it doesn't find anything. The strange thing is that when I remove the trailing slash, like this:
'/\*\*[.]+?\*'
then it finds the following string:
/**\n\t*
I have no idea why the regex can't find an asterisk followed by a slash... I checked the file that I'm searching to make sure I didn't have a typo in the comment (I didn't). Also, a slash is not a special character in regex, so I shouldn't have to escape it. (I tried, but it didn't help.)
Can anyone tell me what's wrong with my regex? :)
By the way, I also came across this thread where someone tried to do the same in Java. The winning answer there ends its regular expression the same way I do now, so I'm clueless :( Could this be a bug in Python regex or am I completely missing something?
Any help is much appreciated! :D
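Inside a character class, . loses its special meaning and matches only a literal dot, so [.]+? can never match the comment body, with or without DOTALL. Drop the brackets and use the re.DOTALL flag to make the . character match newlines: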
re.compile(r'/\*\*.+?\*/', re.DOTALL)
(As a side note, PHP block comments can start with /*, not just /**.)
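To illustrate the difference, here is a minimal sketch comparing the bracketed pattern from the question with the unbracketed one. The sample string `php_source` is made up for the example; it is not the asker's actual file.

```python
import re

# Hypothetical PHP snippet, used only for illustration.
php_source = (
    "<?php\n"
    "/**\n"
    " * This is a very short block comment\n"
    " */\n"
    "function f() {}\n"
)

# [.] is a character class holding only a literal dot,
# so re.DOTALL has no effect on it.
broken = re.compile(r'/\*\*[.]+?\*/', re.DOTALL)

# A bare . combined with re.DOTALL matches any character, newlines included.
fixed = re.compile(r'/\*\*.+?\*/', re.DOTALL)

broken_matches = broken.findall(php_source)
fixed_matches = fixed.findall(php_source)

print(broken_matches)  # no match: the comment body contains no run of dots
print(fixed_matches)   # the whole docblock, from /** to */
```

The lazy quantifier `+?` matters here: it stops at the first `*/`, so two separate comments in one file are not merged into a single match.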
Try this:
r'\/\*\*[^*]*\*+([^/*][^*]*\*+)*\/'
(This is the regex used by some CSS parsers for /* CSS comments */, so I believe it is pretty solid.)
It doesn't enforce the exact docblock layout (the line breaks and the leading asterisks on the inner lines), but you can work around that. It will match:
/**
* This is a very short block comment
*/
But also:
/** This is a very short block comment */
And even:
/** This is a very short block comment
*/
To match the exact format of docblocks, you'd need a real parser, not regular expressions.
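As a rough check of the three cases above, the following sketch runs a CSS-style pattern over a made-up sample string. The pattern is written with `[^/*]` as the inner class, as in the CSS 2.1 tokenizer grammar, and without the optional backslashes before `/`. The names `docblock` and `sample` are assumptions for the example only.

```python
import re

# CSS-style comment pattern, adapted to require the /** opener.
# Negated character classes match newlines, so re.DOTALL is not needed.
docblock = re.compile(r'/\*\*[^*]*\*+([^/*][^*]*\*+)*/')

# Hypothetical input covering the three layouts plus a plain /* comment.
sample = (
    "/**\n"
    " * This is a very short block comment\n"
    " */\n"
    "/** This is a very short block comment */\n"
    "/** This is a very short block comment\n"
    " */\n"
    "/* plain comment */\n"
)

# The pattern contains a capturing group, so re.findall would return the
# group's contents rather than the full match. finditer with group(0)
# gives the complete comment text instead.
comments = [m.group(0) for m in docblock.finditer(sample)]

for c in comments:
    print(repr(c))
```

The plain `/* plain comment */` is skipped because the pattern insists on two asterisks after the opening slash.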