
Does whitespace have an impact when C# evaluates or parses expressions?

Some time back I was trying the following statements in C#:

i+++++i // does not compile

i++ + ++i // executed

Does whitespace have an impact in expressions?

In what way are the above statements different?

Ramachandran


First off, let me explain how a compiler works.

The first thing we do is break up the source code into tokens. Then we organize the tokens into a parse tree. Then we do a semantic analysis of the parse tree, and then we generate code based on that semantic analysis. Any of those stages - lexing, parsing or analyzing - can produce errors. Here's the important part: we do not go back and re-do a previous stage if a later stage got an error.

The lexer is "greedy" - it attempts to make the biggest token it can at every stage of the way. So Daniel is right. The lexer breaks i+++++i up into i / ++ / ++ / + / i
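To make the greedy behavior concrete, here is a minimal sketch of a longest-match lexer. It is not the real C# lexer: the class name `GreedyLexerSketch`, the `Tokenize` method, and the two-entry operator table are all assumptions made for illustration, and it only understands identifiers, `+` and `++`.

```csharp
using System;
using System.Collections.Generic;

static class GreedyLexerSketch
{
    // Hypothetical operator table for this sketch, longest entry first.
    // A real lexer knows many more tokens.
    static readonly string[] Operators = { "++", "+" };

    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < source.Length)
        {
            char c = source[pos];
            if (char.IsWhiteSpace(c))
            {
                pos++; // whitespace ends the current token and is skipped
                continue;
            }
            if (char.IsLetter(c) || c == '_')
            {
                // Identifier: consume as many identifier characters as possible.
                int start = pos;
                while (pos < source.Length &&
                       (char.IsLetterOrDigit(source[pos]) || source[pos] == '_'))
                    pos++;
                tokens.Add(source.Substring(start, pos - start));
                continue;
            }
            // Operator: take the longest table entry that matches here.
            int length = 0;
            foreach (string op in Operators)
            {
                if (string.CompareOrdinal(source, pos, op, 0, op.Length) == 0)
                {
                    length = op.Length;
                    break;
                }
            }
            if (length == 0)
                throw new FormatException("Unexpected character '" + c + "' at position " + pos);
            tokens.Add(source.Substring(pos, length));
            pos += length;
        }
        return tokens;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" / ", Tokenize("i+++++i")));   // i / ++ / ++ / + / i
        Console.WriteLine(string.Join(" / ", Tokenize("i++ + ++i"))); // i / ++ / + / ++ / i
    }
}
```

Note that the sketch never looks back: once it has committed to `++`, it does not reconsider that choice even if a later stage would reject the result.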

Then the parser tries to turn that into a parse tree and it comes up with

                +
               / \
              ++  i
             /
            ++
           /
          i   

That is, equivalent to (((i++)++) + i).

Now the semantic analyzer looks at that and says "i++ is fine because i is a variable. But i++ is not a variable, it's a value. You cannot do ++ on a value, so the second ++ is illegal." The semantic analyzer then gives the error: the operand of an increment must be a variable.
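As a small illustration of that rule, the sketch below keeps the legal increment and leaves the illegal ones commented out so the file still compiles. The class and method names are made up for this example.

```csharp
using System;

static class IncrementOperandSketch
{
    public static int Demo()
    {
        int i = 1;
        int first = i++;          // fine: i is a variable; first == 1, i == 2

        // int bad = (i++)++;     // does not compile: the result of i++ is a value, not a variable
        // int alsoBad = i+++++i; // lexed as i ++ ++ + i, so it fails for the same reason

        return first + i;         // 1 + 2
    }

    static void Main()
    {
        Console.WriteLine(Demo()); // prints 3
    }
}
```

Uncommenting either of the two marked lines should produce the "operand of an increment must be a variable" style of error described above.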

The semantic analyzer does not then re-do the lex and say "you know, this could have been i / ++ / + / ++ / i, which would parse differently and be legal." We don't backtrack because there could be billions of possible ways to re-lex and re-parse a program and we don't want to have to try all of them. Just consider your case; i+++++i could be (((i++)++)+i) or ((i++)+(+(+i))) or (i+(+(+(+(+i))))) or... remember, + can be part of a unary plus, binary plus, pre-increment or post-increment, and therefore there are a lot of possible combinations for these five plusses.


Spaces have an effect when separating tokens.

Obvious example: int a = 1; is not the same as inta=1;

Likewise, + +a is the same as +(+(a)), which simply evaluates to 1 (assuming a=1) and leaves a unchanged. However, if you remove the space, the two + characters form a single ++ token, which increments a to 2.
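A quick way to see the difference is to run both forms side by side. This is only a sketch; the class and method names are hypothetical, and each method returns the expression result followed by the final value of `a`.

```csharp
using System;

static class UnaryPlusSketch
{
    public static int[] Spaced()
    {
        int a = 1;
        int result = + +a;   // two unary plus tokens: +(+(a)); a is not modified
        return new[] { result, a };
    }

    public static int[] Joined()
    {
        int a = 1;
        int result = ++a;    // one pre-increment token: a becomes 2
        return new[] { result, a };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Spaced())); // 1, 1
        Console.WriteLine(string.Join(", ", Joined())); // 2, 2
    }
}
```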

Your first example tokenized results in 5 tokens: "i" "++" "++" "+" "i"

Your second example results in slightly different 5 tokens: "i" "++" "+" "++" "i"


C-ish (C/C++/C#/Java) compilers use a style of lexing often called "maximal munch": they will always try to make the largest token possible.

Now, some characters automatically end a token -- a semicolon or a parenthesis for example. And whitespace is one of them.

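So, in your second example, there are 5 tokens: "i" "++" "+" "++" "i"

However, in your first example, maximal munch also produces five tokens, just grouped differently: "i" "++" "++" "+" "i". There is no "+++++" token, so the lexer takes the longest operator it knows, "++", twice and is then left with a single "+".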

Also, I should note that while your second example will compile under C++, its behavior is undefined there, because i is modified twice with no sequencing between the two modifications. In C# it is legal and well defined, since C# evaluates operands from left to right, but it is still hard to read and best avoided.
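For what it is worth, C# specifies that the operands of a binary operator are evaluated from left to right, so the spaced expression has one predictable result in C#. The sketch below shows this under that rule; the class and method names are assumptions for illustration.

```csharp
using System;

static class EvaluationOrderSketch
{
    public static int Evaluate(int start)
    {
        int i = start;
        // Left operand first: i++ yields start, then i == start + 1.
        // Right operand next: ++i makes i == start + 2 and yields that value.
        return i++ + ++i;
    }

    static void Main()
    {
        Console.WriteLine(Evaluate(1)); // prints 4 (1 + 3)
    }
}
```

The same expression in C++ gives no such guarantee, which is a good reason not to write code like this in either language.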
