开发者

What happens to a declared, uninitialized variable in C? Does it have a value?

If in C I write:

int num;

Before I assign开发者_如何学JAVA anything to num, is the value of num indeterminate?


Static variables (file scope and function static) are initialized to zero:

int x; // zero
int y = 0; // also zero

void foo() {
    static int x; // also zero
}

Non-static variables (local variables) are indeterminate. Reading them prior to assigning a value results in undefined behavior.

void foo() {
    int x;
    printf("%d", x); // the compiler is free to crash here
}

In practice, they tend to just have some nonsensical value in there initially - some compilers may even put in specific, fixed values to make it obvious when looking in a debugger - but strictly speaking, the compiler is free to do anything from crashing to summoning demons through your nasal passages.

As for why it's undefined behavior instead of simply "undefined/arbitrary value", there are a number of CPU architectures that have additional flag bits in their representation for various types. A modern example would be the Itanium, which has a "Not a Thing" bit in its registers; of course, the C standard drafters were considering some older architectures.

Attempting to work with a value with these flag bits set can result in a CPU exception in an operation that really shouldn't fail (eg, integer addition, or assigning to another variable). And if you go and leave a variable uninitialized, the compiler might pick up some random garbage with these flag bits set - meaning touching that uninitialized variable may be deadly.


0 if static or global, indeterminate if storage class is auto

C has always been very specific about the initial values of objects. If global or static, they will be zeroed. If auto, the value is indeterminate.

This was the case in pre-C89 compilers and was so specified by K&R and in DMR's original C report.

This was the case in C89, see section 6.5.7 Initialization.

If an object that has automatic storage duration is not initialized explicitely, its value is indeterminate. If an object that has static storage duration is not initialized explicitely, it is initialized implicitely as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.

This was the case in C99, see section 6.7.8 Initialization.

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.

As to what exactly indeterminate means, I'm not sure for C89, C99 says:

3.17.2
indeterminate value

either an unspecified value or a trap representation

But regardless of what standards say, in real life, each stack page actually does start off as zero, but when your program looks at any auto storage class values, it sees whatever was left behind by your own program when it last used those stack addresses. If you allocate a lot of auto arrays you will see them eventually start neatly with zeroes.

You might wonder, why is it this way? A different SO answer deals with that question, see: https://stackoverflow.com/a/2091505/140740


It depends on the storage duration of the variable. A variable with static storage duration is always implicitly initialized with zero.

As for automatic (local) variables, an uninitialized variable has indeterminate value. Indeterminate value, among other things, mean that whatever "value" you might "see" in that variable is not only unpredictable, it is not even guaranteed to be stable. For example, in practice (i.e. ignoring the UB for a second) this code

int num;
int a = num;
int b = num;

does not guarantee that variables a and b will receive identical values. Interestingly, this is not some pedantic theoretical concept, this readily happens in practice as consequence of optimization.

So in general, the popular answer that "it is initialized with whatever garbage was in memory" is not even remotely correct. Uninitialized variable's behavior is different from that of a variable initialized with garbage.


Ubuntu 15.10, Kernel 4.2.0, x86-64, GCC 5.2.1 example

Enough standards, let's look at an implementation :-)

Local variable

Standards: undefined behavior.

Implementation: the program allocates stack space, and never moves anything to that address, so whatever was there previously is used.

#include <stdio.h>
int main() {
    int i;
    printf("%d\n", i);
}

compile with:

gcc -O0 -std=c99 a.c

outputs:

0

and decompiles with:

objdump -dr a.out

to:

0000000000400536 <main>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       48 83 ec 10             sub    $0x10,%rsp
  40053e:       8b 45 fc                mov    -0x4(%rbp),%eax
  400541:       89 c6                   mov    %eax,%esi
  400543:       bf e4 05 40 00          mov    $0x4005e4,%edi
  400548:       b8 00 00 00 00          mov    $0x0,%eax
  40054d:       e8 be fe ff ff          callq  400410 <printf@plt>
  400552:       b8 00 00 00 00          mov    $0x0,%eax
  400557:       c9                      leaveq
  400558:       c3                      retq

From our knowledge of x86-64 calling conventions:

  • %rdi is the first printf argument, thus the string "%d\n" at address 0x4005e4

  • %rsi is the second printf argument, thus i.

    It comes from -0x4(%rbp), which is the first 4-byte local variable.

    At this point, rbp is in the first page of the stack has been allocated by the kernel, so to understand that value we would to look into the kernel code and find out what it sets that to.

    TODO does the kernel set that memory to something before reusing it for other processes when a process dies? If not, the new process would be able to read the memory of other finished programs, leaking data. See: Are uninitialized values ever a security risk?

We can then also play with our own stack modifications and write fun things like:

#include <assert.h>

int f() {
    int i = 13;
    return i;
}

int g() {
    int i;
    return i;
}

int main() {
    f();
    assert(g() == 13);
}

Note that GCC 11 seems to produce a different assembly output, and the above code stops "working", it is undefined behavior after all: Why does -O3 in gcc seem to initialize my local variable to 0, while -O0 does not?

Local variable in -O3

Implementation analysis at: What does <value optimized out> mean in gdb?

Global variables

Standards: 0

Implementation: .bss section.

#include <stdio.h>
int i;
int main() {
    printf("%d\n", i);
}

gcc -O0 -std=c99 a.c

compiles to:

0000000000400536 <main>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       8b 05 04 0b 20 00       mov    0x200b04(%rip),%eax        # 601044 <i>
  400540:       89 c6                   mov    %eax,%esi
  400542:       bf e4 05 40 00          mov    $0x4005e4,%edi
  400547:       b8 00 00 00 00          mov    $0x0,%eax
  40054c:       e8 bf fe ff ff          callq  400410 <printf@plt>
  400551:       b8 00 00 00 00          mov    $0x0,%eax
  400556:       5d                      pop    %rbp
  400557:       c3                      retq
  400558:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40055f:       00

# 601044 <i> says that i is at address 0x601044 and:

readelf -SW a.out

contains:

[25] .bss              NOBITS          0000000000601040 001040 000008 00  WA  0   0  4

which says 0x601044 is right in the middle of the .bss section, which starts at 0x601040 and is 8 bytes long.

The ELF standard then guarantees that the section named .bss is completely filled with of zeros:

.bss This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occu- pies no file space, as indicated by the section type, SHT_NOBITS.

Furthermore, the type SHT_NOBITS is efficient and occupies no space on the executable file:

sh_size This member gives the section’s size in bytes. Unless the sec- tion type is SHT_NOBITS , the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.

Then it is up to the Linux kernel to zero out that memory region when loading the program into memory when it gets started.


That depends. If that definition is global (outside any function) then num will be initialized to zero. If it's local (inside a function) then its value is indeterminate. In theory, even attempting to read the value has undefined behavior -- C allows for the possibility of bits that don't contribute to the value, but have to be set in specific ways for you to even get defined results from reading the variable.


The basic answer is, yes it is undefined.

If you are seeing odd behavior because of this, it may depended on where it is declared. If within a function on the stack then the contents will more than likely be different every time the function gets called. If it is a static or module scope it is undefined but will not change.


Because computers have finite storage capacity, automatic variables will typically be held in storage elements (whether registers or RAM) that have previously been used for some other arbitrary purpose. If a such a variable is used before a value has been assigned to it, that storage may hold whatever it held previously, and so the contents of the variable will be unpredictable.

As an additional wrinkle, many compilers may keep variables in registers which are larger than the associated types. Although a compiler would be required to ensure that any value which is written to a variable and read back will be truncated and/or sign-extended to its proper size, many compilers will perform such truncation when variables are written and expect that it will have been performed before the variable is read. On such compilers, something like:

uint16_t hey(uint32_t x, uint32_t mode)
{ uint16_t q; 
  if (mode==1) q=2; 
  if (mode==3) q=4; 
  return q; }

 uint32_t wow(uint32_t mode) {
   return hey(1234567, mode);
 }

might very well result in wow() storing the values 1234567 into registers 0 and 1, respectively, and calling foo(). Since x isn't needed within "foo", and since functions are supposed to put their return value into register 0, the compiler may allocate register 0 to q. If mode is 1 or 3, register 0 will be loaded with 2 or 4, respectively, but if it is some other value, the function may return whatever was in register 0 (i.e. the value 1234567) even though that value is not within the range of uint16_t.

To avoid requiring compilers to do extra work to ensure that uninitialized variables never seem to hold values outside their domain, and avoid needing to specify indeterminate behaviors in excessive detail, the Standard says that use of uninitialized automatic variables is Undefined Behavior. In some cases, the consequences of this may be even more surprising than a value being outside the range of its type. For example, given:

void moo(int mode)
{
  if (mode < 5)
    launch_nukes();
  hey(0, mode);      
}

a compiler could infer that because invoking moo() with a mode which is greater than 3 will inevitably lead to the program invoking Undefined Behavior, the compiler may omit any code which would only be relevant if mode is 4 or greater, such as the code which would normally prevent the launch of nukes in such cases. Note that neither the Standard, nor modern compiler philosophy, would care about the fact that the return value from "hey" is ignored--the act of trying to return it gives a compiler unlimited license to generate arbitrary code.


If storage class is static or global then during loading, the BSS initialises the variable or memory location(ML) to 0 unless the variable is initially assigned some value. In case of local uninitialized variables the trap representation is assigned to memory location. So if any of your registers containing important info is overwritten by compiler the program may crash.

but some compilers may have mechanism to avoid such a problem.

I was working with nec v850 series when i realised There is trap representation which has bit patterns that represent undefined values for data types except for char. When i took a uninitialized char i got a zero default value due to trap representation. This might be useful for any1 using necv850es


As far as i had gone it is mostly depend on compiler but in general most cases the value is pre assumed as 0 by the compliers.
I got garbage value in case of VC++ while TC gave value as 0. I Print it like below

int i;
printf('%d',i);
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜