Writing a re-entrant lexer with Flex
I'm new to flex. I'm trying to write a simple re-entrant lexer/scanner with flex. The lexer definition is below. I'm stuck on the compilation errors shown after it (a yyg issue):
reentrant.l:
/* Definitions */
digit [0-9]
letter [a-zA-Z]
alphanum [a-zA-Z0-9]
identifier [a-zA-Z_][a-zA-Z0-9_]+
integer [0-9]+
natural [0-9]*[1-9][0-9]*
decimal ([0-9]+\.|\.[0-9]+|[0-9]+\.[0-9]+)
%{
#include <stdio.h>
#define ECHO fwrite(yytext, yyleng, 1, yyout)
int totalNums = 0;
%}
%option reentrant
%option prefix="simpleit_"
%%
^(.*)\r?\n printf("%d\t%s", yylineno++, yytext);
%%
/* Routines */
int yywrap(yyscan_t yyscanner)
{
    return 1;
}
int main(int argc, char* argv[])
{
    yyscan_t yyscanner;
    if(argc < 2) {
        printf("Usage: %s fileName\n", argv[0]);
        return -1;
    }
    yyin = fopen(argv[1], "rb");
    yylex(yyscanner);
    return 0;
}
Compilation errors:
vietlq@mylappie:~/Desktop/parsers/reentrant$ gcc lex.simpleit_.c
reentrant.l: In function ‘main’:
reentrant.l:44: error: ‘yyg’ undeclared (first use in this function)
reentrant.l:44: error: (Each undeclared identifier is reported only once
reentrant.l:44: error: for each function it appears in.)
For a reentrant lexer, all communication must include the state, which is contained within the scanner.
Anywhere in your program (e.g. inside main) you can access the state variables via special functions to which you pass your scanner. E.g., in your original reentrant.l, you can do this:
yyscan_t scanner;
yylex_init(&scanner);
yyset_in(fopen(argv[1], "rb"), scanner);
yylex(scanner);
yylex_destroy(scanner);
I have renamed the variable to scanner to avoid confusion with yyscanner in the actions. In contrast with general C code, all your actions occur within a giant function called yylex, which is passed your scanner by the name yyscanner. Thus, yyscanner is available to all your actions. In addition, yylex has a local variable called yyg that holds the entire state, and most macros conveniently refer to yyg.
While it is true that you can use the yyin macro inside main by defining yyg as you did in your own Answer, that is not recommended. For a reentrant lexer, the macros are meant for actions only.
To see how this is implemented, you can always view the generated code:
/* For convenience, these vars
are macros in the reentrant scanner. */
#define yyin yyg->yyin_r
...
/* Holds the entire state of the reentrant scanner. */
struct yyguts_t
...
#define YY_DECL int yylex (yyscan_t yyscanner)
/** The main scanner function which does all the work.
*/
YY_DECL
{
struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
...
}
There is much more on the reentrant option in the flex docs, which include a cleanly compiling example. (Google "flex reentrant", and look for the flex.sourceforge link.) Unlike bison, flex has a fairly straightforward model for reentrancy. I strongly suggest using reentrant flex with Lemon Parser, rather than with yacc/bison.