Regex and PHP to get Java/PHP class content from a source file
I need to parse some text files, searching for PHP classes. For example, if I have a text file with this source:
... some text ...
... some other text ...
class Foo{
function Bar($param){ ... do stuff ... }
}
... some other text ...
class Bar{
function Foo(){ ... do something .... }
}
... some else ...
In this case, my regular expression must match the two classes and their contents, returning these results:
first result:
class Foo{
function Bar($param){ ... do stuff ... }
}
second result:
class Bar{
function Foo(){ ... do something .... }
}
I've tried many times without success. My last attempt was
/^[\n\r\t ](?:abstract|class|interface){1}(.)[^(?:class|interface)]*$/im
but it only matches
class Foo{
and
class Bar{
without the content of the class.
Thanks for your help :)
This cannot be done with "classic" regular expressions because you'd need to handle arbitrarily nested braces, and nested structures like these do not form a regular language. Some regex flavors (.NET, PCRE, Perl 5.6 and up) have extended regular expressions to support recursive or balanced matching, but many implementations still can't handle it.
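Since PHP's preg_* functions are built on PCRE, a recursive pattern is at least available to you. Here is a minimal sketch, assuming class names are plain `\w+` identifiers and that no braces appear inside strings or comments (either assumption breaking will throw the match off). The sample text is the one from the question; the variable names are mine.

```php
<?php
// Sample input taken from the question.
$source = <<<'TXT'
... some text ...
... some other text ...
class Foo{
function Bar($param){ ... do stuff ... }
}
... some other text ...
class Bar{
function Foo(){ ... do something .... }
}
... some else ...
TXT;

// Group 1 matches one balanced {...} block.
// Inside it, [^{}]++ consumes anything that is not a brace and (?1)
// recurses into group 1 for every nested pair of braces.
$pattern = '/^[\t ]*(?:(?:abstract|final)\s+)*(?:class|interface)\s+\w+[^{}]*'
         . '(\{(?:[^{}]++|(?1))*\})/m';

preg_match_all($pattern, $source, $matches);
$classes = $matches[0]; // one entry per class, header and body included

foreach ($classes as $i => $class) {
    echo "result ", $i + 1, ":\n", $class, "\n\n";
}
```

The `[^{}]*` before the group is there to skip over things like `extends Foo implements Bar` between the class name and the opening brace.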
I'd also wager that even if your favorite language's regex engine can handle recursion, it's usually not the best way to go. Most of the time, you'd rather use a parser for this.
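To give an idea of what the non-regex route looks like, here is a rough sketch of a brace-counting scanner. It is far from a real parser: `extractClasses` is a hypothetical helper of mine, it only uses a regex to find the class header, and it assumes braces never show up inside strings or comments.

```php
<?php
/**
 * Hypothetical helper: returns every class/interface block found in $text.
 * It only counts braces, so a brace inside a string or comment will
 * confuse it. A real parser would tokenize the source first.
 */
function extractClasses(string $text): array
{
    $classes = [];
    $offset  = 0;
    $length  = strlen($text);
    // Matches the header up to and including the opening brace.
    $header  = '/^[\t ]*(?:(?:abstract|final)\s+)*(?:class|interface)\s+\w+[^{}]*\{/m';

    while (preg_match($header, $text, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $start = $m[0][1];                  // byte offset of the header
        $pos   = $start + strlen($m[0][0]); // first character after the {
        $depth = 1;

        // Walk forward until the opening brace is balanced again.
        while ($pos < $length && $depth > 0) {
            if ($text[$pos] === '{') {
                $depth++;
            } elseif ($text[$pos] === '}') {
                $depth--;
            }
            $pos++;
        }

        if ($depth !== 0) {
            break; // unbalanced braces: stop instead of guessing
        }

        $classes[] = substr($text, $start, $pos - $start);
        $offset    = $pos; // continue searching after this class
    }

    return $classes;
}

// Usage with a shortened version of the text from the question.
$source = "... some text ...\n"
        . "class Foo{\nfunction Bar(\$param){ ... do stuff ... }\n}\n"
        . "... some other text ...\n"
        . "class Bar{\nfunction Foo(){ ... do something .... }\n}\n"
        . "... some else ...\n";

$classes = extractClasses($source);
```

If the files are real PHP sources rather than mixed text, PHP's own tokenizer (`token_get_all`) would be a more robust starting point than counting characters.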
That said, even without recursive regexes you might have a chance if your code is formatted consistently (the class definition starts in the same column as its closing }, tabs and spaces are not mixed, and every nested structure is indented).
Then you could try
/^([\t ]*)(?:abstract|class|interface).*?^\1\}/sim
But this is sure to fail horribly if your code is not exactly formatted according to those rules.
Explanation:
^ # start of line
([\t\ ]*) # match and remember whitespace
(?:abstract|class|interface) # match keyword
.*? # match as few characters as possible
^\1 # until the next line that starts with exactly the same whitespace
\} # followed by a }
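For completeness, this is roughly how the indentation-based pattern could be applied in PHP. The sample text is the one from the question, except that I indented the second class and the method bodies to show that the backreference keeps each closing } paired with its own class line. Treat it as a sketch that only holds under the formatting assumptions above.

```php
<?php
// Sample input: the question's text, with indentation added for illustration.
$source = <<<'TXT'
... some text ...
class Foo{
    function Bar($param){ ... do stuff ... }
}
... some other text ...
    class Bar{
        function Foo(){ ... do something .... }
    }
... some else ...
TXT;

// s: dot also matches newlines, i: case-insensitive, m: ^ matches at every line start.
$pattern = '/^([\t ]*)(?:abstract|class|interface).*?^\1\}/sim';

preg_match_all($pattern, $source, $matches);

// $matches[0] holds the full class blocks, $matches[1] the captured indentation.
foreach ($matches[0] as $i => $class) {
    echo "result ", $i + 1, ":\n", $class, "\n\n";
}
```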