What is an example of a lexical error and is it possible that a language has no lexical errors?

For our compiler theory class, we are tasked with creating a simple interpreter for a programming language of our own design. I am using JFlex and CUP as my generators, but I'm a bit stuck on what a lexical error is. Also, is it recommended that I use the state feature of JFlex? It feels wrong, as it seems like the parser is better suited to handling that aspect. And do you recommend any other tools for creating the language? I'm sorry if I'm impatient, but it's due on Tuesday.


A lexical error is any input that can be rejected by the lexer. This generally results from token recognition falling off the end of the rules you've defined. For example (in no particular syntax):

[0-9]+   ===> NUMBER token
[a-zA-Z]+ ===> LETTERS token
anything else ===> error!

If you think about a lexer as a finite state machine that accepts valid input strings, then errors are going to be any input strings that do not result in that finite state machine reaching an accepting state.
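To make the "falling off the end of the rules" idea concrete, here is a minimal hand-written sketch of those three rules in Java. It is not what JFlex generates; the class name `TinyLexer`, the `LexicalError` exception, and the decision to skip whitespace are all assumptions made for illustration, and LETTERS is treated as a run of one or more letters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for the toy rules above (not JFlex output).
final class TinyLexer {

    /** Raised when no rule matches at the current position. */
    static final class LexicalError extends RuntimeException {
        final int position;

        LexicalError(char c, int position) {
            super("Unexpected character '" + c + "' at position " + position);
            this.position = position;
        }
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++; // assumption: whitespace separates tokens and is skipped
            } else if (isDigit(c)) {
                // [0-9]+ ===> NUMBER token
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
            } else if (isLetter(c)) {
                // a run of letters ===> LETTERS token
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                tokens.add("LETTERS(" + input.substring(start, i) + ")");
            } else {
                // anything else ===> error!
                throw new LexicalError(c, i);
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

With this sketch, `TinyLexer.tokenize("42 abc")` yields a NUMBER token followed by a LETTERS token, while `TinyLexer.tokenize("12 @ ab")` throws `LexicalError` at the '@', because no rule accepts that character.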

The rest of your question was rather unclear to me. If you already have tools you are using, then you are probably best off learning how to achieve what you want with those tools (I have no experience with either of the tools you mentioned).

EDIT: Having re-read your question, there's a second part I can answer. It is possible that a language could have no lexical errors - it's the language in which any input string at all is valid input.
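One way to picture such a language is a lexer whose last rule is a catch-all, so every character ends up in some token. The Java sketch below is only an illustration under that assumption; the class name `TotalLexer` and the `OTHER` token are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer with a catch-all rule: every input string is accepted.
final class TotalLexer {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (isDigit(input.charAt(i))) {
                // [0-9]+ ===> NUMBER token
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
            } else {
                // catch-all: any single character ===> OTHER token, never an error
                tokens.add("OTHER(" + input.charAt(i) + ")");
                i++;
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

Because the final branch matches any character, `tokenize` has no path that rejects input. Whether the resulting token stream makes sense is then entirely the parser's problem.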


A lexical error could be a character that the language does not accept in a given position, like '@', which cannot appear in a Java identifier (it is reserved for annotations).

Lexical errors are the errors thrown by your lexer when it is unable to continue, which means there is no way to recognise a lexeme as a valid token for your lexer. Syntax errors, on the other hand, are thrown by your parser when a sequence of already recognised, valid tokens does not match the right-hand side of any of your grammar rules.

"It feels wrong as it seems like the parser is better suited to handling that aspect."

No. It seems that way because the context-free languages include the regular languages (meaning that a parser can do the work of a lexer). But consider that a parser is a stack (pushdown) automaton, so you would be spending extra computing resources (the stack) to recognise something that doesn't require a stack to be recognised (a regular expression). That would be a suboptimal solution.

NOTE: by regular expression, I mean... regular expression in the Chomsky Hierarchy sense, not a java.util.regex.* class.
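The point about the stack can be sketched in Java with two small recognisers. This is an illustration only, with invented names (`StackVsNoStack`, `isNumber`, `isBalanced`): the first recognises the regular language [0-9]+ using a single flag of state, while the second recognises balanced brackets, a context-free language, and has to remember the open brackets on a stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical comparison: a regular language needs no stack, nested brackets do.
final class StackVsNoStack {

    // Regular language [0-9]+ : one boolean of state is enough.
    static boolean isNumber(String s) {
        boolean sawDigit = false;
        for (char c : s.toCharArray()) {
            if (c < '0' || c > '9') return false;
            sawDigit = true;
        }
        return sawDigit;
    }

    // Balanced ( ) and [ ] : the stack remembers which brackets are still open.
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[') {
                stack.push(c);
            } else if (c == ')' || c == ']') {
                if (stack.isEmpty()) return false;
                char open = stack.pop();
                boolean matches = (c == ')' && open == '(') || (c == ']' && open == '[');
                if (!matches) return false;
            } else {
                return false; // only bracket characters belong to this toy language
            }
        }
        return stack.isEmpty();
    }
}
```

Token shapes such as numbers and identifiers fall in the first category, which is why a lexer can handle them without the parser's machinery.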


A lexical error is when the input doesn't belong to any of these lists:
key words: "if", "else", "main"...
symbols: '=', '+', ';'...
double symbols: ">=", "<=", "!=", "++"
variables: [a-zA-Z]+[0-9]*
numbers: [0-9]+

Examples: "9var" is an error (a digit comes before the letters, so it is neither a variable nor a key word); "$" is an error.
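As a rough sketch of this "belongs to one of the lists" check, the Java snippet below classifies a single lexeme. It assumes the input has already been split into lexemes (for example on whitespace), includes only the entries actually listed (the elided ones are left out), and uses invented names (`LexemeClassifier`, `classify`). The patterns assume letters for variables and at least one digit for numbers.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical classifier: a lexeme that fits none of the lists is a lexical error.
final class LexemeClassifier {

    private static final Set<String> KEYWORDS = Set.of("if", "else", "main");
    private static final Set<String> SYMBOLS = Set.of("=", "+", ";");
    private static final Set<String> DOUBLE_SYMBOLS = Set.of(">=", "<=", "!=", "++");
    private static final Pattern VARIABLE = Pattern.compile("[a-zA-Z]+[0-9]*");
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    static String classify(String lexeme) {
        // Key words are checked first so "if" is not reported as a variable.
        if (KEYWORDS.contains(lexeme)) return "KEYWORD";
        if (DOUBLE_SYMBOLS.contains(lexeme)) return "DOUBLE_SYMBOL";
        if (SYMBOLS.contains(lexeme)) return "SYMBOL";
        if (VARIABLE.matcher(lexeme).matches()) return "VARIABLE";
        if (NUMBER.matcher(lexeme).matches()) return "NUMBER";
        return "LEXICAL_ERROR";
    }
}
```

Here `classify("9var")` and `classify("$")` both return `"LEXICAL_ERROR"`, while `classify("x1")` returns `"VARIABLE"`. A generated scanner that splits input by longest match rather than by whitespace could treat "9var" differently, so this only mirrors the whole-lexeme view described above.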

What I don't know is whether several symbols in a row, like "+-", are accepted.


A compiler can catch an error only when it has the grammar built into it. Whether it is able to catch lexical errors depends on the compiler itself: which kinds of lexical error are detected, and how they are handled (according to the grammar), is decided during the development of the compiler. Most well-known, widely used compilers have this capability.
