Does whitespace in C# have an impact when evaluating/parsing expressions?
Some time back I was trying the following statements in C#:
i+++++i // does not compile
i++ + ++i // executed
Does a space have an impact in expressions?
In what way are the above statements different?
Ramachandran
First off, let me explain how a compiler works.
The first thing we do is break up the source code into tokens. Then we organize the tokens into a parse tree. Then we do a semantic analysis of the parse tree, and then we generate code based on that semantic analysis. Any of those stages - lexing, parsing or analyzing - can produce errors. Here's the important part: we do not go back and re-do a previous stage if a later stage got an error.
The lexer is "greedy" - it attempts to make the biggest token it can at every step of the way. So Daniel is right. The lexer breaks i+++++i up into i / ++ / ++ / + / i.
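The greedy behaviour described above can be illustrated with a toy lexer. This is only a minimal sketch, not how the real C# compiler is written: the class name GreedyLexer and the method Tokenize are made up for illustration, and the lexer knows just identifiers, "+" and "++".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy lexer (an assumption for illustration, not the real compiler).
public static class GreedyLexer
{
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < source.Length)
        {
            char c = source[pos];
            if (char.IsWhiteSpace(c))
            {
                // Whitespace separates tokens and is otherwise discarded.
                pos++;
            }
            else if (c == '+')
            {
                // Greedy step: take the two-character token "++" whenever possible.
                if (pos + 1 < source.Length && source[pos + 1] == '+')
                {
                    tokens.Add("++");
                    pos += 2;
                }
                else
                {
                    tokens.Add("+");
                    pos++;
                }
            }
            else if (char.IsLetter(c) || c == '_')
            {
                // Identifier: consume letters, digits and underscores.
                int start = pos;
                while (pos < source.Length &&
                       (char.IsLetterOrDigit(source[pos]) || source[pos] == '_'))
                {
                    pos++;
                }
                tokens.Add(source.Substring(start, pos - start));
            }
            else
            {
                throw new ArgumentException(
                    $"Unexpected character '{c}' at position {pos}");
            }
        }
        return tokens;
    }
}
```

With this sketch, GreedyLexer.Tokenize("i+++++i") yields i, ++, ++, +, i, while GreedyLexer.Tokenize("i++ + ++i") yields i, ++, +, ++, i. The lexer never looks ahead to check whether a different split would parse better.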
Then the parser tries to turn that into a parse tree and it comes up with
            +
           / \
         ++   i
        /
      ++
     /
    i
That is, equivalent to (((i++)++) + i).
Now the semantic analyzer looks at that and says "i++ is fine because i is a variable. But i++ is not a variable, it's a value. You cannot do ++ on a value, so the second ++ is illegal." The semantic analyzer then gives the error: the operand of an increment must be a variable.
The semantic analyzer does not then re-do the lex and say "you know, this could have been i / ++ / + / ++ / i, which would parse differently and be legal." We don't backtrack because there could be billions of possible ways to re-lex and re-parse a program and we don't want to have to try all of them. Just consider your case; i+++++i could be (((i++)++)+i) or ((i++)+(+(+i))) or (i+(+(+(+(+i))))) or... remember, + can be part of a unary plus, binary plus, pre-increment or post-increment, and therefore there are a lot of possible combinations for these five plusses.
Spaces have an effect when separating tokens.
Obvious example: int a = 1; is not the same as inta=1;
Likewise, + +a is the same as +(+(a)), which simply returns 1 (assuming a=1). However, when the space is removed, the two + tokens form a single ++ token, thus incrementing a to 2.
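To make the difference concrete, here is a small sketch. The class and method names are hypothetical and only serve to keep the two cases apart.

```csharp
// Hypothetical demo class; the names are assumptions for illustration.
public static class UnaryPlusDemo
{
    // Returns the value of the expression and the value of a afterwards.
    public static (int Value, int A) WithSpace()
    {
        int a = 1;
        int value = + +a;   // two unary plus tokens: +(+(a)), a is not modified
        return (value, a);
    }

    public static (int Value, int A) WithoutSpace()
    {
        int a = 1;
        int value = ++a;    // one pre-increment token: a is incremented first
        return (value, a);
    }
}
```

WithSpace() returns (1, 1) because unary plus leaves a untouched, whereas WithoutSpace() returns (2, 2) because the single ++ token increments a before its value is read.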
Your first example tokenized results in 5 tokens: "i" "++" "++" "+" "i"
Your second example results in slightly different 5 tokens: "i" "++" "+" "++" "i"
C-ish (C/C++/C#/Java) compilers use a style of lexing called "maximal munch": they will always try to make the largest token possible.
Now, some characters automatically end a token -- a semicolon or a parenthesis for example. And whitespace is one of them.
So, in your second example, there are 5 tokens: "i" "++" "+" "++" "i"
However, in your first example there is no whitespace to end a token early, so the lexer greedily takes "++" twice before the lone "+": "i" "++" "++" "+" "i"
Also, I should note that while your second example will compile under C++, its behavior there is undefined (you are not allowed to modify the same variable twice without an intervening sequence point). C# is stricter about defining these things: operands are evaluated left to right, so the expression is legal and has a well-defined result.
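For what it's worth, C# evaluates operands from left to right, so the spaced version has a predictable result there. A minimal sketch, with a hypothetical class and method name:

```csharp
// Hypothetical demo class; the names are assumptions for illustration.
public static class EvaluationOrderDemo
{
    // Returns the value of i++ + ++i and the final value of i.
    public static (int Result, int I) Evaluate(int start)
    {
        int i = start;
        // Left operand first: i++ yields the old value, then i is incremented.
        // Right operand next: ++i increments again and yields the new value.
        int result = i++ + ++i;
        return (result, i);
    }
}
```

Evaluate(1) returns (4, 3): the left operand yields 1 and leaves i at 2, then the right operand bumps i to 3 and yields 3. The same expression in C or C++ carries no such guarantee.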