How to parse a function and its arguments
For my lexer I'm using the boost::wave lexical iterator, which gives me all the tokens from a .cpp, .h, .hpp, etc. file.
Now I want to determine whether a sequence of tokens, i.e. an identifier followed by an opening parenthesis, a comma-separated list of arguments, and finally a closing parenthesis, is a function in a C++ program. In other words, how should I analyze the tokens to make sure I have a function?
I am trying to implement this using a recursive descent parser. So far, my recursive descent parser can parse arithmetic expressions and handles almost all kinds of operator precedence.
Or is there a function in boost::wave that can parse a function for me directly?
It would also be helpful if somebody could suggest how I can find the type of each variable in the function's argument list. E.g. if I have a function:
int myfun(char* c, T& t1) { //... }
then how can I get the tokens char and *, which can be treated as the type of c, and similarly the tokens T and &, which can be treated as the type of t1?
EDIT: Here is a little more explanation of my question.
References:
The Boost.Wave documentation: http://www.boost.org/doc/libs/1_47_0/libs/wave/index.html
List of token identifiers: http://www.boost.org/doc/libs/1_47_0/libs/wave/doc/token_ids.html
typedef boost::wave::cpplexer::lex_token<> token_type;
typedef boost::wave::cpplexer::lex_iterator<token_type> token_iterator;
typedef token_type::position_type position_type;
position_type pos(filename);
//instr holds the contents of the input file (it must provide begin()/end())
token_iterator it = token_iterator(instr.begin(), instr.end(), pos,
boost::wave::language_support(
boost::wave::support_cpp|boost::wave::support_option_long_long));
token_iterator end = token_iterator();
//while it != end
//...
boost::wave::token_id id = boost::wave::token_id(*it);
switch(id){
//...
case boost::wave::T_IDENTIFIER:
Match(id);//consumes one token and increments the token_iterator
//get the token id of the next token
id = boost::wave::token_id(*it);
//if an identifier is immediately followed by T_LEFTPAREN then it will be a function
if(id == boost::wave::T_LEFTPAREN) {
Match(id);
//this is the function I want to implement
ParseFunction();
Match(boost::wave::T_RIGHTPAREN);
}
//...
}
So the question is how to implement the function ParseFunction().
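One possible shape for ParseFunction(), as a rough sketch only: walk the tokens between the parentheses, split them at top-level commas, and treat the last identifier of each group as the parameter name and everything before it as the type. To keep the example self-contained it uses plain strings instead of Boost.Wave tokens; Token, Param, LooksLikeName and ParseParameterList are hypothetical names, and with Boost.Wave you would compare token ids such as T_COMMA and T_LEFTPAREN rather than spellings. It only handles declaration-style parameter lists; telling a declaration from a call needs more context (a preceding return type, a following { or ;), and C++ is ambiguous enough that a token-level heuristic will misclassify some inputs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexer token: just its spelling.
using Token = std::string;

struct Param {
    std::vector<Token> type;  // e.g. {"char", "*"}
    Token name;               // e.g. "c"; empty for an unnamed parameter
};

// True for tokens that look like a user-chosen name rather than a
// built-in type keyword or punctuation (the keyword list is incomplete).
bool LooksLikeName(const Token& t) {
    static const std::set<Token> keywords = {
        "void", "bool", "char", "short", "int", "long", "float",
        "double", "signed", "unsigned", "const", "volatile"};
    if (t.empty() || keywords.count(t)) return false;
    unsigned char first = static_cast<unsigned char>(t[0]);
    return std::isalpha(first) || t[0] == '_';
}

// Expects pos to index the first token after '('. On return pos indexes
// the matching ')' (or toks.size() if the list is unterminated), so the
// caller can still do Match(T_RIGHTPAREN) afterwards.
std::vector<Param> ParseParameterList(const std::vector<Token>& toks,
                                      std::size_t& pos) {
    assert(pos <= toks.size());
    std::vector<Param> params;
    std::vector<Token> current;  // tokens of the parameter being read
    int depth = 0;               // nesting of (), <>, [] inside a parameter

    // Turn the collected tokens into one Param.
    auto flush = [&]() {
        if (current.empty()) return;
        Param p;
        // Heuristic: a trailing identifier is the parameter name.
        if (current.size() > 1 && LooksLikeName(current.back())) {
            p.name = current.back();
            current.pop_back();
        }
        p.type = current;
        params.push_back(p);
        current.clear();
    };

    while (pos < toks.size()) {
        const Token& t = toks[pos];
        if (depth == 0 && t == ")") break;  // leave ')' for the caller
        ++pos;
        if (depth == 0 && t == ",") { flush(); continue; }
        if (t == "(" || t == "<" || t == "[") ++depth;
        else if ((t == ")" || t == ">" || t == "]") && depth > 0) --depth;
        current.push_back(t);
    }
    flush();
    return params;
}
```

For the tokens of myfun's parameter list (char * c , T & t1 )), this yields two parameters: type {char, *} with name c, and type {T, &} with name t1. The name heuristic is deliberately crude: an unnamed parameter such as const T would be misread as type const with name T, so a real parser would need a symbol table of known type names.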
If your system is POSIX-compliant (Linux, Mac OS X, Solaris, ...) you can use dlopen/dlsym to determine whether the symbol exists. You need to watch out for name mangling, and on some systems you need to be aware that, for example, the real name of sin is _sin.
dlsym cannot tell you whether it returned a pointer to a function or a pointer to some global variable. In fact, to use dlsym you have to do something that neither the C nor the C++ standard guarantees will work: cast the void* pointer returned by dlsym to a function pointer. On this point the POSIX standard is in conflict with C/C++. That said, if you are on a POSIX-compliant system, those void* pointers will convert to function pointers (otherwise the system is not POSIX-compliant).
Edit:
A huge gotcha: how do you call the thing you just found? How do you know how to handle the returned value, if there is one?
A simple example: suppose your input file contains xsq = pow (x, 2). You have to know ahead of time that the signature of pow is double pow (double, double).
Rather than using dlsym, you are much better off handling a limited set of functions that you expressly build into your parser.
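A minimal sketch of what such a built-in set could look like, assuming every supported function takes and returns double: a table maps each name to its arity and a small wrapper, so the parser knows the signature up front. The names Builtin, Builtins and CallBuiltin are made up for illustration, and pow and sqrt are just two sample entries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One entry per function the parser agrees to understand.
struct Builtin {
    std::size_t arity;                           // number of arguments
    double (*call)(const std::vector<double>&);  // wrapper that invokes it
};

double CallPow(const std::vector<double>& a) { return std::pow(a[0], a[1]); }
double CallSqrt(const std::vector<double>& a) { return std::sqrt(a[0]); }

// The fixed set of known functions (hypothetical selection).
const std::map<std::string, Builtin>& Builtins() {
    static const std::map<std::string, Builtin> table = {
        {"pow", {2, &CallPow}},
        {"sqrt", {1, &CallSqrt}},
    };
    return table;
}

// Looks up a function by name, checks the argument count, and calls it.
// Throws std::runtime_error for unknown names or a wrong argument count.
double CallBuiltin(const std::string& name, const std::vector<double>& args) {
    const std::map<std::string, Builtin>& table = Builtins();
    std::map<std::string, Builtin>::const_iterator it = table.find(name);
    if (it == table.end())
        throw std::runtime_error("unknown function: " + name);
    if (args.size() != it->second.arity)
        throw std::runtime_error("wrong number of arguments for " + name);
    assert(it->second.call != nullptr);
    return it->second.call(args);
}
```

With this in place, evaluating xsq = pow (x, 2) comes down to evaluating the two argument expressions and calling CallBuiltin("pow", {x, 2.0}). Because the table records the arity, the parser can report an unknown name or a wrong argument count instead of calling through a pointer whose signature it has to guess.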