Flex / Lex Encoding Strings with Escaped Characters

I'll refer to this question for some of the background:

Regular expression for a string literal in flex/lex

My lexer is not handling input that contains escaped characters the way I expect. I think it may be an issue with the encoding of the string, but I'm not sure.

Here is how I am handling string literals in my lexer:

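\"(\\.|[^\\"])*\"   {
    /* Copy the matched text without the surrounding quotes. */
    char *text1 = strndup(yytext + 1, yyleng - 2);
    const char *text2 = "text\n";

    /* Print each string, then its address in hex. */
    printf("value = <%s> <%lx>\n", text1, (unsigned long)text1);
    printf("value = <%s> <%lx>\n", text2, (unsigned long)text2);
    free(text1);
}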

This outputs the following:

value = <text\n"> <15a1bb0>
value = <text
> <7ac871>

It appears to treat the newline escape as two separate characters: a backslash followed by an n.
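One way to confirm that observation is to dump the bytes of each string. The sketch below is only an illustration: `hex_bytes` is a hypothetical helper, not part of flex or the C library, and the byte values mentioned afterwards assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of s into out as space-separated
   two-digit hex values. If out is too small, the result is cut short
   but is still NUL-terminated. */
void hex_bytes(const char *s, char *out, size_t outsz)
{
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; s[i] != '\0'; i++) {
        int n = snprintf(out + used, outsz - used, "%s%02x",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            break;                      /* out of room: stop here */
        used += (size_t)n;
    }
}
```

Calling it on the six characters the lexer hands back (written `"text\\n"` as a C literal) gives `74 65 78 74 5c 6e`, while the C literal `"text\n"` gives `74 65 78 74 0a`. The lexer text contains the two bytes `5c 6e` (backslash, n) where the compiled literal contains the single byte `0a`.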

What's going on here, and how do I process the text so that it is identical to the C string?


Your regexp only matches the backslash escapes in the string -- it doesn't translate them into the characters they represent. I prefer to handle this sort of thing with a flex start state and a string-building buffer that accumulates characters. Something like:

%{
static StringBuffer strbuf;
%}
%x string
%%

\"                  { BEGIN string; ClearBuffer(strbuf); }
<string>[^\\"\n]*   { AppendBufferString(strbuf, yytext); }
<string>\\n         { AppendBufferChar(strbuf, '\n'); }
<string>\\t         { AppendBufferChar(strbuf, '\t'); }
<string>\\[0-7]{1,3} { AppendBufferChar(strbuf, strtol(yytext+1, 0, 8)); }
<string>\\[\\"]     { AppendBufferChar(strbuf, yytext[1]); }
<string>\"          { yylval.str = strdup(BufferData(strbuf)); BEGIN INITIAL; return STRING; }
<string>\\.         { error("bogus escape '%s' in string\n", yytext); }
<string>\n          { error("newline in string\n"); }

This makes what is going on much clearer, makes it easy to add new escape processing code for new escapes, and makes it easy to issue clear error messages when something goes wrong.
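The rules above call `ClearBuffer`, `AppendBufferString`, `AppendBufferChar` and `BufferData` without defining them. A minimal sketch of what they could look like follows. Everything here is an assumption: the struct layout, the growth policy, and in particular the choice to declare `StringBuffer` as a one-element array so that a plain `strbuf` argument decays to a pointer and the calls work as written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: a growable buffer that is kept NUL-terminated.
   The one-element array typedef lets callers pass `strbuf` directly. */
typedef struct {
    char  *data;   /* heap storage, NULL until first use */
    size_t len;    /* bytes in use, not counting the NUL */
    size_t cap;    /* bytes allocated */
} StringBuffer[1];

/* Make room for `extra` more bytes plus the terminating NUL. */
static void Reserve(StringBuffer sb, size_t extra)
{
    size_t need = sb->len + extra + 1;
    if (need <= sb->cap)
        return;
    size_t cap = sb->cap ? sb->cap : 16;
    while (cap < need)
        cap *= 2;
    char *p = realloc(sb->data, cap);
    if (p == NULL)
        abort();                 /* out of memory: give up */
    sb->data = p;
    sb->cap = cap;
}

/* Reset the buffer to the empty string, keeping its storage. */
void ClearBuffer(StringBuffer sb)
{
    sb->len = 0;
    Reserve(sb, 0);
    sb->data[0] = '\0';
}

/* Append one character. */
void AppendBufferChar(StringBuffer sb, int c)
{
    Reserve(sb, 1);
    sb->data[sb->len++] = (char)c;
    sb->data[sb->len] = '\0';
}

/* Append a NUL-terminated string. */
void AppendBufferString(StringBuffer sb, const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    Reserve(sb, n);
    memcpy(sb->data + sb->len, s, n + 1);   /* copies the NUL too */
    sb->len += n;
}

/* Return the accumulated text; "" if nothing was ever stored. */
const char *BufferData(StringBuffer sb)
{
    return sb->data ? sb->data : "";
}
```

Because `static StringBuffer strbuf;` is zero-initialized, the buffer needs no explicit setup before the first `ClearBuffer(strbuf)`. Note that `strdup(BufferData(strbuf))` stops at the first NUL byte, so with this design a string containing the escape `\0` would be truncated when it is copied out.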
