How does the [] operator work?
I'm working with C, but I think this is a more low level question that isn't language specific.
How does the program correctly grab the right data with array[0] or array[6] regardless of what type of data it holds? Does it store the length internally or have some sort of delimiter to look for?
The compiler knows the sizeof the underlying datatype and adds the right byte offset to the pointer.
a[10] is equivalent to *(a + 10), which is equivalent to *(10 + a), which in turn is equivalent to 10[a], no kidding.
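If you want to convince yourself, here is a minimal sketch that compares the addresses produced by each spelling. The function name index_forms_agree is made up for illustration, and it assumes the caller passes an index that is valid for the array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when every spelling of "element i of a"
   refers to the same address. The caller must pass a valid index. */
bool index_forms_agree(int *a, size_t i)
{
    return &a[i] == a + i      /* a[i] is defined as *(a + i) */
        && a + i == i + a      /* pointer + integer addition commutes */
        && &i[a] == &a[i];     /* so i[a] means *(i + a) */
}
```

Calling index_forms_agree(v, 10) on an int v[12] should report true, since only addresses are compared and nothing is read.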
The compiler figures out the size at compile time and hard-codes the size in the object code.
I would like to contribute something other than a direct answer.
There is an interesting article on Dennis Ritchie's homepage on the history of C which has quite a bit to say about arrays, array indices, etc.
This will probably not directly answer your question, but it may further your understanding of C arrays... and it is an interesting read.
Neither :-)
For an array, the compiler knows: (a) the address of the start of the array, and (b) what type of elements (int, float, double, etc.) the array holds, and hence how long each element is.
With those two pieces of information, finding array[6] is a simple matter of arithmetic: start with the base address, and add 6 times the size of an element.
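That arithmetic can be written out by hand. The following is only a sketch of the idea, not what any particular compiler emits; element_address is a name invented for this example.

```c
#include <stddef.h>

/* Hypothetical helper: byte address of element `index`, computed manually as
   base address + index * element size. */
const void *element_address(const void *base, size_t elem_size, size_t index)
{
    /* Cast to char * so the addition is counted in bytes. */
    return (const char *)base + index * elem_size;
}
```

For a double d[8], element_address(d, sizeof d[0], 6) gives the same address as &d[6]; for an int n[8] the same call with sizeof n[0] lands on &n[6], even though the byte offsets differ.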
The compiler substitutes the length of the datatype which is fixed at compile time.
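#include <stdio.h>
#include <stdlib.h>
int getInt(void *memory, size_t offset)
{
    return *(int *)((char *)memory + sizeof(int) * offset); /* step in bytes, then read an int */
}
int main(void)
{
    void *chunkOfMemory = malloc(0x1000);
    int *intarray = (int *)chunkOfMemory;
    intarray[9] = 42; /* malloc'd memory is uninitialized, so store a value first */
    printf("%d is equal to %d\n", getInt(chunkOfMemory, 9), intarray[9]);
    free(chunkOfMemory);
}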
The compiler knows the size of each element of the array at compile time. For instance:
int64_t array[5];
...
int64_t a = array[3];
This will be converted to the pseudo-assembly code:
addr <- array
addr <- addr + 3 * sizeof(int64_t)
//                 ^^^^^^^^^^^^^^^ which the compiler knows is 8
//             ^^^^^^^^^^^^^^^^^^^ which the compiler can replace with 24.
a <- *addr
The length of the array doesn't matter.
It's compiler magic!
The compiler knows the size of the array elements and uses it to calculate the right address.
No, it doesn't. It just gets or sets the element at address array + X*sizeof(TypeOfArrayEl), so you can easily go out of bounds and nothing may report an error at that point. This address arithmetic is also why array[6] is the same as 6[array].
Assume array is of type int:
int array[12];
The [] operator adds whatever value is in the brackets (times the size in bytes of the array's element type) to the value outside the brackets. In most expressions an array name is converted to a pointer to its first item. So the array declaration above allocates 12 * sizeof(int) bytes, and array evaluates to the address of the first one. This leads to wonky stuff like 3[array] giving you the fourth element in the array (the one at index 3).
Anyway, the answer to your question is that the compiler looks at the type of the array at compile time and multiplies the thing in the [] by the size of the type held by the array.
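A small sketch of both points, assuming the same int array[12] declaration. The function names array_bytes and fourth_element are invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: total bytes reserved by `int array[12]`. */
size_t array_bytes(void)
{
    int array[12];
    return sizeof array;            /* the whole array: 12 * sizeof(int) */
}

/* Hypothetical helper: index 3 written "backwards" names the element at index 3. */
int fourth_element(void)
{
    int array[12] = {10, 11, 12, 13};   /* remaining elements are zero */
    return 3[array];                    /* same as array[3] */
}
```

Here array_bytes() equals 12 * sizeof(int) whatever size int happens to have, and fourth_element() returns 13.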
Yes, you are right, this is an even lower-level question; even assembler has a [] operator. This answer puts it quite well, but my explanation would be: arr[x] is the same as reading an element at the byte address (char *)arr + x * sizeof(arr[0]). It looks a bit complicated, but the generated code is simple. That is because the compiler knows sizeof(arr[0]) and hard-codes it in the compiled code. The (char *) cast is only there to satisfy the language standard, which protects the programmer from dumb mistakes; in the compiled code there are no type conversions.
One more thing: since I mentioned lower-level languages, I should mention higher-level ones too. In some of them (C++, for example) you can overload the [] operator and make it do whatever you want.
From what I remember, C doesn't give you a compile-time error if the index is out of bounds. Even if you go beyond the bounds, the pointer arithmetic simply yields the next adjacent memory location (formally, accessing it is undefined behavior). The only thing C takes care of is how many bytes to advance the pointer. For an int array the pointer advances by sizeof(int) bytes for every increment in the index (4 on most current platforms, 2 on older 16-bit ones), and for char it advances by 1 byte.
Nothing stops you from accessing locations that are out of bounds, but what you find there is junk data, and you as the programmer have to ensure that you're accessing the right data.
That is the price of freedom I guess :)
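Since the language won't check for you, one common approach is to carry the length around yourself and test the index before using it. A minimal sketch, where checked_get is a hypothetical helper and not anything from the standard library:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies array[index] into *out only when index is in
   range. The caller supplies the length, because C does not store it. */
bool checked_get(const int *array, size_t length, size_t index, int *out)
{
    if (index >= length) {
        return false;   /* refuse instead of reading past the end */
    }
    *out = array[index];
    return true;
}
```

With int data[3] = {7, 8, 9}, checked_get(data, 3, 2, &value) stores 9 in value and returns true, while asking for index 6 returns false and leaves value alone.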