Lex: How do I prevent it from matching against substrings?
For example, I'm supposed to convert "int" to "INT". But if there's the word "integer", I don't think it's supposed to turn into "INTeger".
If I define "int" printf("INT");
the substrings are matched though. Is there a way to prevent this from happening?
I believe the following captures what you want.
%{
#include <stdio.h>
%}
ws [\t\n ]
%%
{ws}int{ws} { printf ("%cINT%c", *yytext, yytext[4]); }
. { printf ("%c", *yytext); }
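To handle delimiters other than whitespace ({ws} in this case), you will need to either extend the definition of ws or add more specific checks. Note that the rule consumes the whitespace on both sides, so an "int" at the very start of the input, or one that directly follows another converted "int", is not matched.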
Well, here's how I did it:
(("int"([a-z]|[A-Z]|[0-9])+)|(([a-z]|[A-Z]|[0-9])+"int")) ECHO;
"int" printf("INT");
Better suggestions welcome.
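Lex will choose the rule with the longest possible match for the current input; when two rules match the same length, the rule listed first wins. To avoid substring matches you need an additional rule that can match more than int does. The easiest way to do this is to add a simple catch-all rule for whole words, i.e. [a-zA-Z]+, placed after the int rule. For "integer" the catch-all matches seven characters and beats int; for a standalone "int" both rules match three characters and the int rule wins because it comes first. As written, the empty actions discard whitespace and other words; use ECHO as the action if they should be copied to the output instead. The entire lex program would look like this: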
%%
[\t ]+ /* skip whitespace */
int { printf("INT"); }
[a-zA-Z]+ /* catch-all to avoid substring matches */
%%
int main(int argc, char *argv[])
{
    yylex();
    return 0;
}
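To make the longest-match behaviour concrete, here is a rough hand-written C sketch of what the generated scanner effectively does with these two rules. It is only an illustration, not what lex actually emits: the names match_int, match_word and convert_ints are made up for this example, and unlike the program above it echoes non-matching words and whitespace instead of discarding them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rule 1: the literal keyword "int". Returns the match length (3) or 0. */
static size_t match_int(const char *s)
{
    return strncmp(s, "int", 3) == 0 ? 3 : 0;
}

/* Rule 2: [a-zA-Z]+ (assuming the default "C" locale for isalpha).
 * Returns the length of the leading run of letters, or 0. */
static size_t match_word(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]))
        n++;
    return n;
}

/* Hypothetical helper: copy `in` to `out` (capacity `cap`), writing "INT"
 * wherever rule 1 wins. Output is truncated if it does not fit. */
void convert_ints(const char *in, char *out, size_t cap)
{
    size_t o = 0;

    assert(cap > 0);
    while (*in != '\0') {
        size_t kw = match_int(in);
        size_t word = match_word(in);
        const char *src = in;
        size_t len = 1;               /* default rule: echo one character */
        size_t consumed = 1;

        if (kw > 0 && kw >= word) {   /* equal length: the earlier rule wins */
            src = "INT";
            len = 3;
            consumed = kw;
        } else if (word > 0) {        /* longer match: the catch-all wins */
            len = word;
            consumed = word;
        }

        if (o + len >= cap)           /* keep room for the terminating NUL */
            break;
        memcpy(out + o, src, len);
        o += len;
        in += consumed;
    }
    out[o] = '\0';
}
```

Calling convert_ints("int integer print int", buf, sizeof buf) leaves "INT integer print INT" in buf. Because the word rule only covers letters, an input such as "int2" would still be split into "INT" followed by "2", just as with the lex rules above; widen the character class if identifiers may contain digits.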