Is there a JFlex specification of Java string literals somewhere?
By string literals I mean ones that may also contain escapes such as \123.
I've written something but I don't know if it's perfect:
<STRING> {
\" { yybegin(YYINITIAL);
return new Token(TokenType.STRING, string.toString()); }
\\[0-3][0-7][0-7] { string.append( yytext() ); }
\\[0-3][0-7] { string.append( yytext() ); }
\\[0-7] { string.append( yytext() ); }
[^\n\r\"\\]+ { string.append( yytext() ); }
\\t { string.append('\t'); }
\\n { string.append('\n'); }
\\r { string.append('\r'); }
\\\" { string.append('\"'); }
\\ { string.append('\\'); }
}
In fact, I know this isn't perfect: the three rules that match \ddd-style escapes append the escape's textual representation to the string instead of the character it denotes.
I could convert it using Character methods, but even then I may not be exhaustive; there may be other escape sequences I didn't handle. So if there is a canonical JFlex file for this, that would be perfect.
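For the \ddd rules specifically, one possible way to append the character itself rather than its text is to parse the digits after the backslash as a base-8 number. The sketch below is a plain Java illustration; `OctalEscape` and `decode` are hypothetical names, not part of JFlex. Inside a JFlex action, the same expression could presumably be applied to `yytext()`.

```java
// Hypothetical helper (not part of JFlex): converts the text matched by an
// octal-escape rule, e.g. the four characters \101, into the char it denotes.
final class OctalEscape {
    static char decode(String matched) {
        // Drop the leading backslash, then parse the remaining 1-3 digits in base 8.
        return (char) Integer.parseInt(matched.substring(1), 8);
    }

    public static void main(String[] args) {
        System.out.println(decode("\\101"));       // prints A (octal 101 = 65)
        System.out.println((int) decode("\\377")); // prints 255
    }
}
```

In the lexer above, the action would then read something like `string.append((char) Integer.parseInt(yytext().substring(1), 8));`, assuming `string` is a StringBuilder or StringBuffer.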
The JLS, section 3.10.5 (String Literals), defines string literals as follows:
StringLiteral:
    " StringCharacters* "

StringCharacters:
    StringCharacter
    StringCharacters StringCharacter

StringCharacter:
    InputCharacter but not " or \
    EscapeSequence
where an EscapeSequence is defined in 3.10.6:
EscapeSequence:
    \ b    /* \u0008: backspace BS */
    \ t    /* \u0009: horizontal tab HT */
    \ n    /* \u000a: linefeed LF */
    \ f    /* \u000c: form feed FF */
    \ r    /* \u000d: carriage return CR */
    \ "    /* \u0022: double quote " */
    \ '    /* \u0027: single quote ' */
    \ \    /* \u005c: backslash \ */
    OctalEscape    /* \u0000 to \u00ff: from octal value */

OctalEscape:
    \ OctalDigit
    \ OctalDigit OctalDigit
    \ ZeroToThree OctalDigit OctalDigit

OctalDigit: one of
    0 1 2 3 4 5 6 7

ZeroToThree: one of
    0 1 2 3
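To make the grammar above concrete, here is a minimal sketch of a decoder that walks the body of a string literal (the text between the quotes) and replaces each EscapeSequence with the character it denotes. `EscapeDecoder` and `unescape` are hypothetical names chosen for illustration. It handles only the escapes listed in 3.10.6, not Unicode escapes, and it is not taken from any JFlex file.

```java
// Hypothetical helper: decodes the escape sequences of JLS 3.10.6 in the
// body of a string literal. Unicode escapes are deliberately not handled.
final class EscapeDecoder {
    static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') { out.append(c); continue; }
            if (i >= body.length()) throw new IllegalArgumentException("dangling backslash");
            char e = body.charAt(i++);
            switch (e) {
                case 'b':  out.append('\b'); break;
                case 't':  out.append('\t'); break;
                case 'n':  out.append('\n'); break;
                case 'f':  out.append('\f'); break;
                case 'r':  out.append('\r'); break;
                case '"':  out.append('"');  break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                default:
                    if (e < '0' || e > '7') throw new IllegalArgumentException("bad escape: " + e);
                    // OctalEscape: up to 3 digits if the first is 0-3, else up to 2.
                    int maxDigits = (e <= '3') ? 3 : 2;
                    int value = e - '0';
                    int digits = 1;
                    while (digits < maxDigits && i < body.length()
                            && body.charAt(i) >= '0' && body.charAt(i) <= '7') {
                        value = value * 8 + (body.charAt(i++) - '0');
                        digits++;
                    }
                    out.append((char) value);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Octal 41 is 33, the code of '!'; prints Hello, a tab, then World!
        System.out.println(unescape("Hello\\tWorld\\41"));
    }
}
```

The ZeroToThree restriction is why an input such as \477 decodes as the escape \47 followed by a literal 7, rather than as one three-digit escape.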
Note that \' is also a valid escape sequence in a string literal, and your rules currently miss a few escape sequences: \b, \f and \'. You may also want to account for Unicode escapes, which can appear anywhere in Java source files (and thus in string literals as well): \u HEX HEX HEX HEX, where HEX is one of 0-9, a-f or A-F (the u may be repeated).
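As a rough illustration of the Unicode escape form, the sketch below recognizes a single escape with a regular expression and converts it to its character. `UnicodeEscape`, `isUnicodeEscape` and `decode` are hypothetical names. It is a simplification: it ignores the JLS rule that the backslash must be preceded by an even number of backslashes, and it assumes the input is exactly one escape.

```java
// Hypothetical helper: recognizes and decodes one Unicode escape, i.e. a
// backslash, one or more 'u' characters, and exactly four hex digits.
final class UnicodeEscape {
    static boolean isUnicodeEscape(String s) {
        return s.matches("\\\\u+[0-9a-fA-F]{4}");
    }

    static char decode(String matched) {
        if (!isUnicodeEscape(matched)) {
            throw new IllegalArgumentException("not a Unicode escape: " + matched);
        }
        // The last four characters are the hex digits, however many 'u's precede them.
        String hex = matched.substring(matched.length() - 4);
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0041"));  // prints A (hex 41 = 65)
        System.out.println(decode("\\uu0042")); // prints B
    }
}
```

In a lexer, the same pattern could serve as a macro for the escape, with the action appending the decoded character.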
Yes. Download JFlex and see the file examples/java/java.flex. It has definitions in JFlex syntax for all of the lexical components of the Java language.
Cheers.