How does the code behave differently with the Java and C compilers?
I have this code. I ran it in Java and in C, but they give me two different results. What makes them run differently?
x=10;y=10;z=10;
y-=x--;
z-=--x;
x-=--x-x--;
The output in Java for the value of x is 8, and in C it is 6.
Why do these two compilers behave differently with the increment and decrement operators?
You are wrong when you say that the output of this code, considered as a C program, is 6.
Considered as a C program, this is undefined. You just happened to get 6 with your compiler, but you could just as well have gotten 24, a segmentation fault, or a compile-time error.
See the C99 standard, section 6.5, paragraph 2:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.71)
--x-x-- is explicitly forbidden by this paragraph.
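To get a defined result in C, each modification of x has to sit in its own full expression, so that a sequence point separates it from the others. Below is a minimal sketch, assuming you want the Java answer; run_java_order and its out-parameters are hypothetical names for illustration. The first two statements of the question are kept as they are, since each modifies x only once; only the third one is split up.

```c
/* Hypothetical helper, for illustration only: a well-defined rewrite of the
 * question's code that follows Java's order (the left operand of -= is read
 * first, then the right-hand side is evaluated left to right). Every
 * modification of x is in its own full expression, so sequence points
 * separate them. */
void run_java_order(int *out_x, int *out_y, int *out_z)
{
    int x = 10, y = 10, z = 10;

    y -= x--;          /* y = 10 - 10 = 0, then x = 9 */
    z -= --x;          /* x = 8, then z = 10 - 8 = 2 */

    /* x -= --x - x--; split into sequenced steps: */
    int lhs = x;       /* read the left operand first: lhs = 8 */
    int a = --x;       /* x = 7, a = 7 */
    int b = x--;       /* b = 7, x = 6 */
    x = lhs - (a - b); /* 8 - 0 = 8: the decrements are overwritten */

    *out_x = x;
    *out_y = y;
    *out_z = z;
}
```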
EDIT:
Aaron Digulla writes in the comments:
Is it really undefined?
Did you notice that I linked to the C99 standard and indicated the paragraph that says this is undefined?
gcc -Wall (GCC 4.1.2) doesn't complain about this and I doubt that any compiler would reject this code.
The standard describes some behaviors as "undefined" precisely because not all ways for a C program to be nonsense can be detected reliably at compile-time. If you think that "no warning" should mean everything's fine, you should switch to a language other than C. Many modern languages are better defined. I use OCaml when I have a choice, but there are countless other well-defined languages.
There is a reason why it returns 6 and you should be able to explain it.
I did not notice your explanation of why this expression evaluated to 6. I hope you don't spend too much time writing it, because for me it returns 0.
Macbook:~ pascalcuoq$ cat t.c
#include <stdio.h>
int main(int argc, char **argv)
{
int y;
printf("argc:%d\n", argc);
y = --argc - argc--;
printf("y:%d\n", y);
return 0;
}
Macbook:~ pascalcuoq$ gcc t.c
Macbook:~ pascalcuoq$ ./a.out 1 2 3 4 5 6 7 8 9
argc:10
y:0
This is the time at which you argue that there is a bug in my compiler (since it doesn't return the same thing as yours).
Macbook:~ pascalcuoq$ gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5490~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5490)
Aaron also writes:
As an engineer, you should still be able to explain why it returns one result or the other.
Exactly! I gave the simplest explanation why one might get 6: the result is explicitly specified in C99 as undefined behavior, and it was in earlier standards too.
and:
Lastly, please show a compiler which warns about this construct.
To the best of my knowledge, no compiler warns about *(&x - 1), where x is declared as int x. Are you claiming that this construct is valid C and that a good engineer should be able to predict the result because no compiler warns about it? This construct is undefined, just like the one being discussed.
Lastly, if you absolutely need warnings to believe there is a problem, consider using a verification tool such as Frama-C. It needs to make some assumptions that are not in the standard to capture some existing practices, but it correctly warns about --x-x-- and most other undefined C behaviors.
How is the term evaluated? The right-hand side --x - x-- evaluates to 0 for both Java and C, but it changes x. So the question is: how does -= work? Does it read x before the right-hand side (RHS) is evaluated and then subtract the RHS, or does it read x after the RHS has been evaluated? So do you have
tmp = x // copy the value of x
x = tmp - (--x - x--) // complicated way to say x = x
or
tmp = (--x - x--) // first evaluate RHS, from left to right, which means x -= 2.
x = x - tmp // subtract 0 from x
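Both readings can be spelled out as well-defined C by giving every side effect its own statement. This is a sketch under the assumption that x == 8 when the last statement of the question starts; lhs_read_first and lhs_read_last are hypothetical names, and each one models one possible ordering rather than what a particular compiler actually emits.

```c
/* Hypothetical name. Ordering 1: copy x, evaluate the RHS left to right,
 * then subtract the RHS from the saved copy. */
int lhs_read_first(int x)
{
    int tmp = x;        /* tmp = 8 */
    int a = --x;        /* x = 7, a = 7 */
    int b = x--;        /* b = 7, x = 6 */
    x = tmp - (a - b);  /* 8 - 0 = 8 */
    return x;
}

/* Hypothetical name. Ordering 2: evaluate the RHS first (x drops by 2),
 * then subtract the RHS value from the current x. */
int lhs_read_last(int x)
{
    int a = --x;        /* x = 7, a = 7 */
    int b = x--;        /* b = 7, x = 6 */
    int tmp = a - b;    /* tmp = 0 */
    x = x - tmp;        /* 6 - 0 = 6 */
    return x;
}
```

Called with 8, the first function returns 8 (the Java result) and the second returns 6 (the result reported for C).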
In Java, here is the rule:
A compound assignment expression of the form E1 op= E2 is equivalent to E1 = (T)((E1) op (E2)), where T is the type of E1, except that E1 is evaluated only once. (see 15.26.2 Compound Assignment Operators)
This means the value of x is copied before the right-hand side is evaluated, so the pre- and post-decrements have no effect on the result. Your C compiler probably uses a different rule.
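The "E1 is evaluated only once" part of the rule is easiest to see when E1 has a side effect. C has a matching clause for compound assignment (C99 6.5.16.2), so the sketch below is well-defined C; slot, slot_calls and target are hypothetical names that only count how often the left operand gets evaluated. What C does not share is the rest of Java's rule (JLS 15.26.2), which requires the value of the left operand to be saved before the right operand is evaluated.

```c
static int slot_calls = 0;  /* hypothetical counter: evaluations of slot() */
static int target = 10;     /* hypothetical variable being updated */

/* Hypothetical helper: returns the address of target, counting each call. */
static int *slot(void)
{
    slot_calls++;
    return &target;
}

/* Compound assignment: the lvalue *slot() is evaluated once. */
int count_compound(void)
{
    slot_calls = 0;
    target = 10;
    *slot() -= 3;            /* one call, target = 7 */
    return slot_calls;
}

/* Spelled-out form: E1 appears twice, so slot() runs twice
 * (in an unspecified order, but each call is complete on its own). */
int count_spelled_out(void)
{
    slot_calls = 0;
    target = 10;
    *slot() = *slot() - 3;   /* two calls, target = 7 */
    return slot_calls;
}
```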
For C, this article might help:
The moral is that writing code that depends on order of evaluation is a bad programming practice in any language.
[EDIT] Pascal Cuoq (see below) insists that the standard says the result is undefined. This is probably correct: I stared at the part he copied out of the standard for a couple of minutes and couldn't understand what that sentence says. I guess I'm not alone here :) So I went to see how the C interpreter that I developed for my master's thesis works. It's not standard compliant, but I understand how it works. I guess I'm a Heisenberg-type guy: I can have either at any precision, but not both ;) Anyway.
When parsing this construct, you get this parse tree:
             (-=)
            /    \
           x     (-)
                /   \
        PREDEC x     POSTDEC x
The standard states that modifying x three times (once on the left and twice in the two decrement ops) leaves the result undefined. Okay. But a compiler is a deterministic program, so when it accepts some input, it will always produce the same output. And most compilers work the same way. I think we all agree that any C compiler will in fact accept this input. What outputs can we expect? Answer: 6 or 8. Reasoning:
- x-x is 0 for any value of x.
- --x-x is 0 for any value of x, because it can be written as --x, x-x.
- x-x-- is 0 because the result of the minus operator is calculated before the post-decrement.
So the pre-decrement has no influence on the result, and neither does the post-decrement. Also, there is no interference between the two operators (using them both in the same expression, as in a = --y - x--, doesn't change their behavior). Conclusion: any C compiler will return 0 for --x - x-- (well, except the buggy ones).
Which leaves us with my original assumption: the value of the RHS has no influence on the result (it always evaluates to 0), but evaluating it modifies x. So the question is: how is -= implemented? There are quite a few factors that play a role here:
- Does the CPU have a native operator for -=? Register-based CPUs do (in fact, they only have such operators: to do a+b, they have to copy a into a register and then they can += b to it); stack-based CPUs don't (they push all the values on the stack and then use operators which use the topmost stack elements as operands).
- Are the values saved on the stack or in registers? (Another way to ask the first question.)
- Which optimization options are active?
To go any further, we must look at the code:
#include <stdio.h>
int main() {
int x = 8;
x -= --x - x--;
printf("x=%d\n", x);
}
When compiled, we get this assembler code for the assignment (x86-64 code):
.loc 1 4 0
movl $8, -4(%rbp) ; x = 8
.loc 1 5 0
subl $1, -4(%rbp) ; x--
movl $0, %eax ; tmp = 0
subl %eax, -4(%rbp) ; x -= tmp
subl $1, -4(%rbp) ; x--
.loc 1 6 0
movl -4(%rbp), %esi ; load x into %esi, where printf() expects its second argument
The first movl sets x to 8, which means -4(%rbp) is x. As you can see, the compiler actually notices x-x and optimizes it to 0 as predicted (even without any optimization options). We also have the two expected -- operations, which means the result must always be 6.
So who is right? We both are. Pascal is right when he says that the standard doesn't define this behavior. But that doesn't mean it's random. All the pieces of the code have a well-defined behavior, so the behavior of the sum can't suddenly be undefined (unless there is something else missing - but not in this case). So even though the standard doesn't treat this problem, it's still deterministic.
For stack-based CPUs (that don't have any registers), the result should be 8, since they will copy the value of x before they start evaluating the right-hand side. For register-based CPUs, it should always be 6.
Moral: The standard is always right, but if you must understand, look at the code ;)
In C++, the behavior is undefined as well, i.e., the result is neither specified nor guaranteed to be consistent: the compiler is free to do whatever suits it best, because no sequence point separates the modifications of x.
I suspect the same for Java [and C# etc.]
Well... which do you think is correct, and what is your reasoning?
I believe x is pretty well determined for the first three steps:
x = 10
x is decremented (its initial value is used first)
x is decremented again (its resulting value is used after)
Now x == 8. But look at what you're doing to it here (pardon the insertion of human-friendly whitespace):
x -= --x - x--
Which could be compiled to (this is what I'd do if I had to include the ++ and -- operators in my language: the side effects are identified first and moved to the fore and aft of the statement as a whole):
--x
t = x - x
x -= t
x--
Giving a result of x == 8. Or maybe it's been compiled to (the statement is reduced first by subexpression):
t1 = --x // t1 = 7, x = 7
t2 = x-- // t2 = 7, x = 6
t = t1 - t2 // t = 7 - 7 = 0
x -= t // x = 6
Or the subexpressions could have landed the other way round:
t1 = x-- // t1 = 8, x = 7
t2 = --x // t2 = 6, x = 6
t = t2 - t1 // t = 6 - 8 = -2
x -= t // x = 8
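The last two compilations differ only in which operand of the binary minus is evaluated first. Here is a sketch in well-defined C that starts from x == 8 and, in both cases, reads x for -= after the right-hand side; left_operand_first and right_operand_first are hypothetical names, and the rhs out-parameter only exposes the intermediate value.

```c
/* Hypothetical name: evaluate --x before x--, then apply x -= t. */
int left_operand_first(int x, int *rhs)
{
    int t1 = --x;      /* t1 = 7, x = 7 */
    int t2 = x--;      /* t2 = 7, x = 6 */
    *rhs = t1 - t2;    /* 0 */
    x -= *rhs;         /* 6 - 0 = 6 */
    return x;
}

/* Hypothetical name: evaluate x-- before --x, then apply x -= t. */
int right_operand_first(int x, int *rhs)
{
    int t1 = x--;      /* t1 = 8, x = 7 */
    int t2 = --x;      /* t2 = 6, x = 6 */
    *rhs = t2 - t1;    /* 6 - 8 = -2 */
    x -= *rhs;         /* 6 - (-2) = 8 */
    return x;
}
```

The right-hand side comes out as 0 in the first order and -2 in the second, so the final x is 6 or 8 depending on an ordering that C leaves open.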
In the absence of a formal description of the operators' behaviour in such a case, who's to say which is correct?
The fundamental difference between Java and C is that in the C language the temporal relationships between different actions (what happens "before" and what happens "after") are determined by so-called sequence points. Sequence points implement the concept of time in the execution of a C program. If two actions are separated from each other by a sequence point, then you can say that one action happens "before" and the other happens "after". When two actions have no sequence point between them, there's no defined temporal ordering between them, and there's no way to say what happens "first" and what happens "later". Consider a pair of adjacent sequence points in a C program as the minimal indivisible unit of time. What happens within that unit of time cannot be described in terms of "before" and "after". One might as well think that between two adjacent sequence points everything happens simultaneously. Or in random order, whichever you prefer.
In the C language, the statement
x -= --x - x--;
has no sequence points inside. It only has a sequence point at the very beginning and at the very end. This means that there's no way to say in which order this expression statement is evaluated. It is indivisible in terms of C time, as described above. Every time someone tries to explain what happens here by imposing a specific temporal ordering, they are just wasting their time and producing utter nonsense. This is actually the reason why the C language does not (and cannot) make any attempt to define the behavior of expressions with multiple modifications of the same object (x in the above example). The behavior is undefined.
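For reference, C99 places sequence points at the end of each full expression, after the first operand of &&, ||, ?: and the comma operator, and just before a function is called. The minimal sketch below (comma_sequenced and struct decrements are hypothetical names) shows that the same pair of decrements becomes well defined once a comma operator separates them, because the left operand, including its side effect on x, is finished before the right operand starts.

```c
/* Hypothetical result type, for illustration only. */
struct decrements {
    int a;  /* value of --x */
    int b;  /* value of x-- */
    int x;  /* x after both decrements */
};

/* Hypothetical name: two decrements of x separated by the comma operator. */
struct decrements comma_sequenced(int x)
{
    struct decrements r;
    /* The comma operator puts a sequence point after 'r.a = --x',
       so x is already decremented when 'r.b = x--' starts. */
    r.a = --x, r.b = x--;
    r.x = x;
    return r;
}
```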
Java is significantly different in this respect. In Java, the concept of time is defined differently: operands are always evaluated in a strict left-to-right order, with grouping determined by operator precedence and associativity. This imposes a strict temporal ordering on the events that take place during the evaluation of the above expression, which makes the result of this expression defined, as opposed to C.
I don't know for sure, but I'm guessing it's because Java evaluates the postdecrement on the last x-- before evaluating the -= operator, whereas C++ evaluates the -= first and the postdecrement after the entire rest of the expression is done.