Can actions in Lex access individual regex groups?
Can actions in Lex access individual regex groups?
(NOTE: I'm guessing not, since the group characters - parentheses - are according to the documentation used to change precedence. But if so, do you recommend an alternative C/C++ scanner generator that can do this? I'm not really hot on writing my own lexical analyzer.)
Example:
Let's say I have this input: foo [tagName attribute="value"] bar
and I want to extract the tag using Lex/Flex. I could certainly write this rule:
\[[a-z]+[[:space:]]+[a-z]+=\"[a-z]+\"\] printf("matched %s", yytext);
But let's say I would want to access certain parts of the string, e.g. the attribute but without having to parse yytext again (as the string has already been scanned it doesn't really make sense to scan part of it again). So something like this would be preferable (regex groups):开发者_Go百科
\[[a-z]+[[:space:]]+[a-z]+=\"([a-z]+)\"\] printf("matched attribute %s", $1);
You can separate it to start conditions. Something like this:
%x VALUEPARSE ENDSTATE
%%
char string_buf[100];
<INITIAL>\[[a-z]+[[:space:]]+[a-z]+=\" {BEGIN(VALUEPARSE);}
<VALUEPARSE>([a-z]+) (strncpy(string_buf, yytext, yyleng);BEGIN(ENDSTATE);} //getting value text
<ENDSTATE>\"\] {BEGIN(INITIAL);}
%%
About an alternative C/C++ scanner generator - I use QT class QRegularExpression for same things, it can very easy get regex group after match.
Certainly at least some forms of them do. But the default lex/flex downloadable from sourceforge.org do not seem to list it in their documentation, and this example leaves the full string in yytext.
From IBM's LEX documentation for AIX:
(Expression) Matches the expression in the parentheses. The () (parentheses) operator is used for grouping and causes the expression within parentheses to be read into the yytext array. A group in parentheses can be used in place of any single character in any other pattern.
Example: (ab|cd+)?(ef)* matches such strings as abefef, efefef, cdef, or cddd; but not abc, abcd, or abcdef.
精彩评论