Why is my ANTLR lexer Java class "code too large"?
This is my lexer grammar in ANTLR (sorry for the long file):
lexer grammar SqlServerDialectLexer;
/* T-SQL words */
AND: 'AND';
BIGINT: 'BIGINT';
BIT: 'BIT';
CASE: 'CASE';
CHAR: 'CHAR';
COUNT: 'COUNT';
CREATE: 'CREATE';
CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP';
DATETIME: 'DATETIME';
DECLARE: 'DECLARE';
ELSE: 'ELSE';
END: 'END';
FLOAT: 'FLOAT';
FROM: 'FROM';
GO: 'GO';
IMAGE: 'IMAGE';
INNER: 'INNER';
INSERT: 'INSERT';
INT: 'INT';
INTO: 'INTO';
IS: 'IS';
JOIN: 'JOIN';
NOT: 'NOT';
NULL: 'NULL';
NUMERIC: 'NUMERIC';
NVARCHAR: 'NVARCHAR';
ON: 'ON';
OR: 'OR';
SELECT: 'SELECT';
SET: 'SET';
SMALLINT: 'SMALLINT';
TABLE: 'TABLE';
THEN: 'THEN';
TINYINT: 'TINYINT';
UPDATE: 'UPDATE';
USE: 'USE';
VALUES: 'VALUES';
VARCHAR: 'VARCHAR';
WHEN: 'WHEN';
WHERE: 'WHERE';
QUOTE: '\'' { textMode = !textMode; };
QUOTED: {textMode}?=> ~('\'')*;
EQUALS: '=';
NOT_EQUALS: '!=';
SEMICOLON: ';';
COMMA: ',';
OPEN: '(';
CLOSE: ')';
VARIABLE: '@' NAME;
NAME:
( LETTER | '#' | '_' ) ( LETTER | NUMBER | '#' | '_' | '.' )*
;
NUMBER: DIGIT+;
fragment LETTER: 'a'..'z' | 'A'..'Z';
fragment DIGIT: '0'..'9';
SPACE
:
( ' ' | '\t' | '\n' | '\r' )+
{ skip(); }
;
JDK 1.6 reports "code too large" and can't compile the generated lexer class. Why does this happen, and how can I solve the problem?
Actually I wouldn't say this is a big grammar, and there must be a reason why it doesn't produce reasonably sized code.
I think the problem is directly related to this rule:
QUOTED: {textMode}?=> ~('\'')*;
Is there any particular reason why you want the QUOTED part as a separate token, rather than keeping it combined with the quotes, as Bart also did in his grammar? Combining them would also make the textMode variable obsolete.
Dropping the QUOTE and replacing QUOTED with
QUOTED: '\'' (~'\'')* '\'';
will most probably solve the problem, even without splitting the grammar.
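To illustrate what the single-token rule would match, here is a rough Java sketch that uses java.util.regex as a stand-in for the lexer rule. The class name QuotedTokenSketch and the helper quotedTokens are made up for illustration; they are not part of ANTLR or of the generated lexer, and the regex only approximates the rule above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTokenSketch {
    // Regex analogue (an assumption, not ANTLR output) of:
    //   QUOTED: '\'' (~'\'')* '\'';
    // i.e. an opening quote, any run of non-quote characters, a closing quote.
    private static final Pattern QUOTED = Pattern.compile("'[^']*'");

    // Returns every quoted literal found in the input, quotes included.
    static List<String> quotedTokens(String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints ['abc', 'x y']
        System.out.println(quotedTokens("INSERT INTO t VALUES ('abc', 'x y')"));
    }
}
```

Each quoted literal comes out as one unit with its quotes attached, so no textMode flag is needed to track whether the lexer is inside a string.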
Divide your grammar into several composite grammars. Be careful what you place where. For example, you don't want to place the NAME rule in your top grammar and the keywords in an imported grammar: NAME would then "overwrite" the keywords and prevent them from being matched.
This works:
A.g
lexer grammar A;
SELECT: 'SELECT';
SET: 'SET';
SMALLINT: 'SMALLINT';
TABLE: 'TABLE';
THEN: 'THEN';
TINYINT: 'TINYINT';
UPDATE: 'UPDATE';
USE: 'USE';
VALUES: 'VALUES';
VARCHAR: 'VARCHAR';
WHEN: 'WHEN';
WHERE: 'WHERE';
QUOTED: '\'' ('\'\'' | ~'\'')* '\'';
EQUALS: '=';
NOT_EQUALS: '!=';
SEMICOLON: ';';
COMMA: ',';
OPEN: '(';
CLOSE: ')';
VARIABLE: '@' NAME;
NAME:
( LETTER | '#' | '_' ) ( LETTER | NUMBER | '#' | '_' | '.' )*
;
NUMBER: DIGIT+;
fragment LETTER: 'a'..'z' | 'A'..'Z';
fragment DIGIT: '0'..'9';
SPACE
:
( ' ' | '\t' | '\n' | '\r' )+
{ skip(); }
;
SqlServerDialectLexer.g
lexer grammar SqlServerDialectLexer;
import A;
AND: 'AND';
BIGINT: 'BIGINT';
BIT: 'BIT';
CASE: 'CASE';
CHAR: 'CHAR';
COUNT: 'COUNT';
CREATE: 'CREATE';
CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP';
DATETIME: 'DATETIME';
DECLARE: 'DECLARE';
ELSE: 'ELSE';
END: 'END';
FLOAT: 'FLOAT';
FROM: 'FROM';
GO: 'GO';
IMAGE: 'IMAGE';
INNER: 'INNER';
INSERT: 'INSERT';
INT: 'INT';
INTO: 'INTO';
IS: 'IS';
JOIN: 'JOIN';
NOT: 'NOT';
NULL: 'NULL';
NUMERIC: 'NUMERIC';
NVARCHAR: 'NVARCHAR';
ON: 'ON';
OR: 'OR';
And it compiles fine:
java -cp antlr-3.3.jar org.antlr.Tool SqlServerDialectLexer.g
javac -cp antlr-3.3.jar *.java
As you can see, invoking org.antlr.Tool on your "top lexer" is enough: ANTLR automatically generates classes for the imported grammar(s). If you have more grammars to import, do it like this:
import A, B, C;
EDIT
Gunther is correct: changing the QUOTED rule is enough. I'll leave my answer though, because once you add more keywords, or quite a few parser rules (inevitable with SQL grammars), you'll most probably stumble upon the "code too large" error again. In that case, you can use my proposed solution.
If you're going to accept an answer, please accept Gunther's.
Hmm. I don't suppose you can further break that down into separate files with import statements?
Apparently someone wrote a post-processor to split things up automatically, but I haven't tried it.