开发者

Phases of a compiler?

At which phase of the compilation are keywords of a programming language recognized?

I am sort of confused between

  1. The lexical analysis.
  2. The parsing of the program.

I once wrote a lexer in C using regular expression开发者_开发技巧s but it recognised the main() in int main(void) also as a keyword.

On these lines I think that we have to build a parse tree to recognize keywords.


I had to build a simple compiler this year as a project for which i used Java . The recognition of keywords was made on lexical analysis. During that phase i would read input language and create a token with a type(for keywords type was variable_declaration ) and its value.I also had different types for each case like identifier , constant,multiply operation ,adding operation etc.Then passed these tokens to a queue and next into a parser which would check grammar and create a binary tree which was then used to create the output language .


Typically, the lexical analysis phase of compilation breaks the input text up into sequences of lexemes that each belongs to some particular token type that's useful in later analysis. Consequently, keywords are usually first recognized during lexical analysis in order to make parsing easier. Since parsers tend to be implemented by writing context-free grammars of tokens rather than of lexemes (that is, the category of the lexeme rather than the contents of the lexeme), it is significantly easier to build a parser when keywords are marked during lexing. For example, if I want to have a parser that treats "if" as a keyword, then I may want a rule that looks something like this in my CFG:

Statement ::= 'IF' Expr 'THEN' Expr

If I don't categorize IF and THEN into their own token types, then my parser couldn't write a statement like the above.


It depends very much of the definition, specifically where you draw the lines between scanner, tokenizer, lexer, and parser. Since this is homework, and it is only correct if your prof. says its correct: take a look at the definitions that were supplied in your reading-material.

Regarding main(): You can definitely say that main(), as all other functions, is not a keyword, it is however a token. The tokenizer recognizes that the substring "main" is one token, the parser sets it in relation to it's "(...)" and "{...}" parts. Further, for main() the parser will automatically generate a program entry-point.


That would be lexical analysis.

Some languages have "special" identifiers as well as keywords. These are often added to the identifier table and allocated known constant ID values before the parsing starts, so that they can be spotted easily. These don't normally have special meaning to the parser, though - they should be detected in the abstract syntax tree (AST) after parsing.

For example, take a look at the Oberon language report...

http://www-old.oberon.ethz.ch/oreport.html

Not a language recommendation - just an easily available and simple language specification (very much Wirths style).

Anyway, the "Vocabulary and representation" section includes a list of "operators and delimiters", including what most people would recognise as keywords. These would be recognised by the lexical analyser.

In the "Declarations and scope rules" section, there's a list of predefined identifiers such as "ABS" and "BOOLEAN". I'm not familiar enough with Oberon to be certain, but if I were to write a compiler, there's a good chance I'd just pre-initialise the normal identifier table to include these predefined identifiers.

In C, "main" is in most ways just another function. The compiler may or may not treat it as special. It's possible that the only "special" thing about it is that the startup code that gets linked into your final executable makes a call to that function.


Traditionally, keywords are recognised by a lexer (which leaves you with a language made of a fixed set of keywords). But of course you can do it during a parsing pass. You can even get rid of your lexer entirely, using one of the numerous lexerless parsing techniques (e.g., PEGs). It might help you to avoid confusion.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜