What functions does a lexer need to provide?

I am making a lexer. Please don't tell me not to, because I have already written most of it.

Currently it makes an array of tokens and that's it.

I would like to know which functions the lexer needs to provide, with a brief explanation of what each function should do.

I'll accept the most complete list.

An example function would be:

next: Consume the current token and return it

Also, should the lexer have an expect function, or should the interpreter implement it?

By the way, the lexer constructor accepts a string as its argument, performs the lexical analysis, and stores all the tokens in the "tokens" variable.

The language is JavaScript, so I can't overload operators.


In my experience, you need:

  • nextToken — move forward in the input and get the next token.
  • curToken — return the current token; don't move
  • curValue — tokens like STRING and NUMBER have values; tokens like SEMICOLON don't
  • sourcePos — return the source position (line number, character position) of the first character of the current token

edit — oh also:

  • prefetch — initialize the lexer by getting the first token.
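As a rough sketch of how those five functions could sit on top of the "tokens" array the question describes: the class name TokenStream and the token shape ({ type, value, line, column }) are assumptions made up for illustration, not anything the question specifies.

```javascript
// Hypothetical token shape: { type, value, line, column }.
// The lexer in the question already builds a "tokens" array; this class
// only wraps such an array with a cursor-style interface.
class TokenStream {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = -1; // nothing fetched yet
    this.prefetch();
  }

  // Initialize the stream by fetching the first token.
  prefetch() {
    this.index = -1;
    return this.nextToken();
  }

  // Move forward in the input and return the new current token
  // (null once the input is exhausted).
  nextToken() {
    if (this.index < this.tokens.length) this.index++;
    return this.curToken();
  }

  // Return the current token without moving.
  curToken() {
    return this.index < this.tokens.length ? this.tokens[this.index] : null;
  }

  // Value of the current token; undefined for tokens such as SEMICOLON.
  curValue() {
    const token = this.curToken();
    return token ? token.value : undefined;
  }

  // Position of the first character of the current token.
  sourcePos() {
    const token = this.curToken();
    return token ? { line: token.line, column: token.column } : null;
  }
}

const stream = new TokenStream([
  { type: "NUMBER", value: 42, line: 1, column: 1 },
  { type: "SEMICOLON", line: 1, column: 3 },
]);
console.log(stream.curToken().type); // NUMBER
console.log(stream.curValue());      // 42
stream.nextToken();
console.log(stream.sourcePos());     // { line: 1, column: 3 }
```

Returning null at end of input is just one option; a dedicated EOF token works as well and saves the parser some null checks.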

Additionally, for some languages you might want 2 or more tokens of lookahead. Then you'd want a variation on plain curToken so that you can look at a bigger "window" on the token stream. For most languages that's not really necessary, however.
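One way that wider window might look, assuming the tokens are already in an array: peek(k) is a hypothetical name, and LookaheadStream and the token types below are invented for the example.

```javascript
// Hypothetical sketch: peek(k) looks k tokens ahead without consuming anything.
class LookaheadStream {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = 0;
  }

  // peek(0) is the current token, peek(1) the one after it, and so on.
  // Returns null when the window reaches past the end of the input.
  peek(k = 0) {
    const i = this.index + k;
    return i < this.tokens.length ? this.tokens[i] : null;
  }

  // Consume the current token and return the new current token.
  nextToken() {
    if (this.index < this.tokens.length) this.index++;
    return this.peek(0);
  }
}

// Deciding between "x = 1" and "x + 1" needs one token of lookahead
// past the identifier.
const lookahead = new LookaheadStream([
  { type: "IDENTIFIER", value: "x" },
  { type: "EQUALS" },
  { type: "NUMBER", value: 1 },
]);
const next = lookahead.peek(1);
console.log(next !== null && next.type === "EQUALS"); // true
```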

edit again: also, I won't tell you not to write one, because they're basically the funnest things ever. In JavaScript you can't get too crazy, but in a language like Erlang you can have your lexer act like a "token pump" by making it generate a stream of tokens that it sends to a separate parser process.


You should be able to compile a comprehensive list by writing a program that uses your lexer, and implementing the functions you end up needing.


Think a second time about what you're asking: "what functions the lexer needs to provide"

What it "needs" depends, of course, on what you need, not on what it needs. We will probably be able to give you better help if you explain your own needs. But here's a shot anyway:

A minimal one would consist of a single function that takes a string as an argument and returns a list of strings (or an iterator over strings if you want to be fancy and deferred). That's enough for many use-cases and hence is what a lexer "needs".
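A minimal sketch of that single function, plus the deferred variant as a generator. The token grammar here (numbers, identifiers, and any other single non-space character) is a toy assumption, and the names tokenize and tokenizeLazily are made up for the example.

```javascript
// Assumed toy grammar: numbers, identifiers, and single non-space
// characters such as operators and punctuation.
const TOKEN_PATTERN = /\d+(?:\.\d+)?|[A-Za-z_]\w*|\S/g;

// Eager version: string in, list of strings out.
function tokenize(source) {
  return source.match(TOKEN_PATTERN) || [];
}

// Deferred version: yields one token string at a time.
function* tokenizeLazily(source) {
  for (const match of source.matchAll(TOKEN_PATTERN)) {
    yield match[0];
  }
}

console.log(tokenize("x = 42 + y1;")); // [ 'x', '=', '42', '+', 'y1', ';' ]

for (const token of tokenizeLazily("a+b")) {
  console.log(token); // a, then +, then b
}
```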

A more descriptive one could return more complex objects than strings, containing further information about each token (such as its position in the original string, for example, so that you'll be able to tell the poor programmer with syntax errors in his code where he should look). You can probably come up with lots of metadata to add in there besides line numbers, but once again, it all depends on your needs.
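For instance, here is a sketch that returns objects carrying a type, the matched text, and a 1-based line and column. The grammar, the function name tokenizeWithPositions, and the field names are all assumptions chosen for illustration.

```javascript
// Assumed toy grammar, using named groups to classify each match.
function tokenizeWithPositions(source) {
  const pattern =
    /(?<newline>\n)|(?<space>[^\S\n]+)|(?<number>\d+(?:\.\d+)?)|(?<identifier>[A-Za-z_]\w*)|(?<symbol>\S)/g;
  const tokens = [];
  let line = 1;
  let lineStart = 0; // index of the first character of the current line

  for (const match of source.matchAll(pattern)) {
    // The token type is the name of the one group that matched.
    const type = Object.keys(match.groups).find(
      (name) => match.groups[name] !== undefined
    );
    if (type === "newline") {
      line++;
      lineStart = match.index + 1;
    } else if (type !== "space") {
      tokens.push({
        type,
        text: match[0],
        line,
        column: match.index - lineStart + 1,
      });
    }
  }
  return tokens;
}

const positioned = tokenizeWithPositions("a = 1\n  b;");
console.log(positioned[3]); // { type: 'identifier', text: 'b', line: 2, column: 3 }
```

With positions attached, an error message can say "unexpected ';' at line 2, column 4" instead of just "syntax error".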
