Extract a function's argument from PHP source code
From a shell script, I'd like to extract a function's argument from PHP source code, e.g.:
->getUrl(<extract-me>, false)
The difficulty is that <extract-me> can be anything PHP allows: it may contain opening or closing parentheses, but also more complicated things such as nested function calls, arrays or strings.
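To make the problem concrete, here is a minimal Perl sketch of why a plain regular expression is not enough. The PHP line is a made-up example: a non-greedy match up to the first comma cuts the argument off inside a nested array.

```perl
use strict;
use warnings;

# Made-up PHP line whose first getUrl() argument itself contains a comma.
my $php = q{$u = $this->getUrl(array('a', 'b'), false);};

# Naive approach: take everything up to the first comma after "getUrl(".
my ($naive) = $php =~ /->getUrl\((.*?),/;

print "$naive\n";    # prints array('a' (cut off inside the nested array)
```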
Thank you
Since your question is tagged "perl", I assume you'll accept a Perl solution.
My first thought was to use the Text::Balanced module, in particular its extract_codeblock function (which is actually designed for Perl, but Perl and PHP are similar enough to parse that the results are acceptable). It was not as easy as I had hoped, though: extract_codeblock only pulled out the expression properly if it was bracketed, i.e. if it started with "{" or "(".
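A minimal sketch of that limitation, assuming Text::Balanced is available (it is included in the standard Perl distribution). The two input strings are made up; with "(" as the allowed opening delimiter, extract_codeblock succeeds only when the text starts, after optional whitespace, with that bracket:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_codeblock);

# In list context the first returned element is the extracted block,
# or undef when the text does not start with an allowed opening delimiter.
my ($ok)   = extract_codeblock("(1 + 2), false)", '(');
my ($fail) = extract_codeblock("array(1, 2), false)", '(');

print defined $ok   ? "bracketed: $ok\n" : "bracketed: no match\n";
print defined $fail ? "bare: $fail\n"    : "bare: no match\n";
```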
Well, by using that module's lower-level routines inside a sub of my own that works as a small token parser, I got something that at least superficially appears to work:
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed extract_quotelike);

# Parse one PHP expression from the start of the string.
# Returns the expression text and the unparsed remainder.
sub extract_expression {
    local $_ = shift;
    my $parsed;
    while (1) {
        if (s&^(\s*)((?:(?!//|/\*|#)[^[{(\]),;'"\s])+)&&) {
            # normal characters (no delimiters, quotes or brackets, or comments)
            $parsed .= "$1$2";
        } elsif (/^\s*(?=['"])/) {
            # quoted string; extract_quotelike's optional 2nd argument is a
            # prefix pattern (not a delimiter list), so keep its default
            (my $token, $_) = extract_quotelike($_);
            defined $token or last;
            $parsed .= $token;
        } elsif (/^\s*(?=[\[\{\(])/) {
            # bracketed group; quotes inside the brackets are respected
            (my $token, $_) = extract_bracketed($_, '[({\'"})]');
            defined $token or last;
            $parsed .= $token;
        } elsif (s&^\s*(?://|\#).*\n?&& || s&^\s*/\*.*?\*/&&s) {
            # comments: ignore
        } else {
            # not recognized: the expression is finished
            last;
        }
    }
    return $parsed, $_;
}

# demo
# complex line of PHP (borrowed from Drupal)
$_ = <<'PHP';
$translations[$lang] = $this->drupalCreateNode(array('type' => $source->type, 'language' => $lang, 'translation_source' => $source, 'status' => $source->status, 'promote' => $source->promote, 'uid' => $source->uid));
# etc
PHP

if (/->drupalCreateNode\(/) {
    my $offset = $+[0];    # position right after the opening paren
    my ($expression, $rest) = extract_expression(substr($_, $offset));
    if (defined $expression) {
        print <<"INFO";
parsed expression: $expression
rest: $rest
INFO
    } else {
        print "Failure to parse expression\n";
    }
}
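For comparison, here is a different, hypothetical approach that does not use Text::Balanced at all: a hand-written character scanner. The sub name first_argument and the sample PHP line are made up for illustration. It walks the text after "->getUrl(", tracks bracket depth and quoted strings, and stops at the first "," or ")" found at depth 0. Unlike the code above, it assumes the argument contains no PHP comments or heredoc strings.

```perl
use strict;
use warnings;

# Hypothetical alternative (not the answer's extract_expression): scan the
# text after "->METHOD(" character by character, tracking bracket depth and
# quoted strings, and stop at the first "," or ")" found at depth 0.
# Assumption: no PHP comments or heredoc strings inside the argument.
sub first_argument {
    my ($php, $method) = @_;
    $php =~ /->\Q$method\E\s*\(/g or return;    # locate the call; sets pos()
    my @chars = split //, substr($php, pos $php);
    my ($arg, $depth, $quote) = ('', 0, undef);
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if (defined $quote) {                      # inside '...' or "..."
            $arg .= $c;
            if ($c eq '\\' && $i + 1 < @chars) {
                $arg .= $chars[++$i];              # keep the escaped character
            } elsif ($c eq $quote) {
                undef $quote;                      # string literal closed
            }
        } elsif ($c eq "'" || $c eq '"') {
            $quote = $c;                           # string literal opened
            $arg  .= $c;
        } elsif ($c =~ /[\[({]/) {
            $depth++;
            $arg .= $c;
        } elsif ($depth == 0 && ($c eq ',' || $c eq ')')) {
            $arg =~ s/^\s+|\s+$//g;                # trim surrounding whitespace
            return $arg;
        } elsif ($c =~ /[\])}]/) {
            $depth--;
            $arg .= $c;
        } else {
            $arg .= $c;
        }
    }
    return;                                        # no terminator found
}

my $line  = q{$url = $this->getUrl(array('a' => f(1, 2), 'b' => ')'), false);};
my $first = first_argument($line, 'getUrl');
print "$first\n";    # prints array('a' => f(1, 2), 'b' => ')')
```

From a shell script, the PHP source could be read from standard input instead of the hard-coded line, for example by passing do { local $/; <STDIN> } as the first argument.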
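Well, I have been thinking: comments in PHP are different from comments in Perl. My routine does skip PHP comments (//, # and /* ... */), but only superficially: it isn't recursive, so only comments outside any bracketed expression are properly ignored. Comments inside brackets are handed to extract_bracketed, which knows nothing about PHP comment syntax, so an unbalanced parenthesis or quote inside such a comment (for example the apostrophe in // don't) might confuse it.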
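One possible workaround, sketched here as an assumption rather than something tested against real-world PHP, is to blank out comments in a separate pass before parsing, while copying string literals through untouched so that // or # inside a string is not mistaken for a comment. The sub name strip_php_comments is hypothetical, and the sketch ignores heredoc/nowdoc strings and PHP 8 #[...] attributes:

```perl
use strict;
use warnings;

# Hypothetical pre-pass: replace each PHP comment with a single space while
# copying string literals through unchanged, so that a bracket matcher run
# afterwards never sees parentheses or quotes that live inside comments.
# Assumptions: no heredoc/nowdoc strings and no PHP 8 attributes.
sub strip_php_comments {
    my ($php) = @_;
    $php =~ s{
        ( '(?:[^'\\]|\\.)*'     # single-quoted string: keep as-is
        | "(?:[^"\\]|\\.)*"     # double-quoted string: keep as-is
        )
        | //[^\n]*              # double-slash line comment
        | \#[^\n]*              # hash line comment
        | /\*.*?\*/             # block comment
    }{ defined $1 ? $1 : ' ' }gsex;
    return $php;
}

my $php      = q{foo(/* ) */ 'a // not a comment', 1); // done};
my $stripped = strip_php_comments($php);
print "$stripped\n";
```

Running the source through such a pass before searching for the call means extract_bracketed never sees comments, at the cost of dropping them from the extracted text.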