Variable Length Array overhead in C++?
Looking at the question "Why does a C/C++ compiler need know the size of an array at compile time?", it occurred to me that compiler implementers should have had some time to get their feet wet by now (VLAs are part of the C99 standard, which is 10 years old) and to provide efficient implementations.
However, it still seems (judging from the answers) that VLAs are considered costly.
This somewhat surprises me.
Of course, I understand that a static offset is much better than a dynamic one in terms of performance, and, unlike one suggestion there, I would not actually have the compiler perform a heap allocation of the array, since this would probably cost even more [this has not been measured ;)]
But I am still surprised at the supposed cost:
- if there is no VLA in a function, then there would not be any cost, as far as I can see.
- if there is a single VLA, then one can place it either before or after all the other variables, and therefore get a static offset for most of the stack frame (or so it seems to me, but I am not well-versed in stack management).
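To make the single-VLA argument concrete, here is a minimal sketch that models a stack frame as a C struct. The fixed-size locals become ordinary members whose offsets are compile-time constants (`offsetof`), and the one variable-length part is a C99 flexible array member placed last. The names `frame_model`, `make_frame` and `frame_sum` are hypothetical. The struct is only an analogy, allocated with `malloc` for the demo, and is not how gcc or any other compiler actually lays out a frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a stack frame containing exactly one VLA. */
struct frame_model {
    int    counter;   /* fixed-size local: offset known at compile time */
    double scale;     /* fixed-size local: offset known at compile time */
    size_t n;         /* runtime length of the variable part */
    double data[];    /* the single "VLA", placed after everything else */
};

/* offsetof yields integer constant expressions, so fixed offsets
   can even be checked at compile time (C11 _Static_assert). */
_Static_assert(offsetof(struct frame_model, counter) == 0,
               "first member is at offset 0");

/* Hypothetical helper: build a model frame with n elements in the tail. */
static struct frame_model *make_frame(size_t n)
{
    /* The header size is fixed; only the tail depends on n. */
    struct frame_model *f = malloc(sizeof *f + n * sizeof f->data[0]);
    if (f == NULL)
        return NULL;
    f->counter = 0;
    f->scale = 2.0;
    f->n = n;
    for (size_t i = 0; i < n; i++)
        f->data[i] = (double)i;
    return f;
}

/* data[i] is at base + fixed offset + i * sizeof(double): no extra
   indirection, because nothing follows the variable-length part. */
static double frame_sum(const struct frame_model *f)
{
    assert(f != NULL);
    double s = 0.0;
    for (size_t i = 0; i < f->n; i++)
        s += f->scale * f->data[i];
    return s;
}
```

With a second variable-length member this layout would no longer work, because the second part's offset would depend on the first part's runtime length. That is exactly the multiple-VLA problem.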
Of course, the question of multiple VLAs arises, and I was wondering whether having a dedicated VLA stack would work. This means that a VLA would be represented by a count and a pointer (both of known size), while the actual memory would be taken from a secondary stack used only for this purpose (and thus really a stack too).
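The dedicated-VLA-stack idea can be sketched as a bump allocator over a separate buffer. Each "VLA" is represented by a pointer and a count, which have a fixed size and could live at a static frame offset. Its storage comes from the secondary stack and is released in LIFO order by restoring a saved mark. Everything here is a hypothetical illustration of the proposal and not something gcc or VC++ provides: `vla_stack`, `vla_push_double`, `vla_mark`, `vla_release`, `sum_of_two`, and the fixed 64 KiB capacity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical secondary stack used only for VLA storage. */
#define VLA_STACK_BYTES (64u * 1024u)

static _Alignas(max_align_t) unsigned char vla_stack[VLA_STACK_BYTES];
static size_t vla_top = 0; /* current top of the secondary stack */

/* Hypothetical VLA handle: fixed size, so it fits at a static frame offset. */
struct vla_double {
    double *ptr;
    size_t  count;
};

static size_t vla_mark(void) { return vla_top; }

static void vla_release(size_t mark)
{
    assert(mark <= vla_top);
    vla_top = mark; /* LIFO: drop everything pushed after the mark */
}

/* Reserve count doubles; returns a handle with ptr == NULL on overflow. */
static struct vla_double vla_push_double(size_t count)
{
    struct vla_double h = { NULL, 0 };
    size_t align = _Alignof(double);
    size_t start = (vla_top + align - 1) / align * align; /* round up */
    if (start > VLA_STACK_BYTES ||
        count > (VLA_STACK_BYTES - start) / sizeof(double))
        return h; /* secondary stack exhausted */
    h.ptr = (double *)(void *)(vla_stack + start);
    h.count = count;
    vla_top = start + count * sizeof(double);
    return h;
}

/* Hypothetical function with two "VLAs" on the secondary stack. */
static double sum_of_two(size_t n, size_t m)
{
    size_t mark = vla_mark();                 /* function entry */
    struct vla_double a = vla_push_double(n);
    struct vla_double b = vla_push_double(m);
    double s = 0.0;
    if (a.ptr != NULL && b.ptr != NULL) {
        for (size_t i = 0; i < a.count; i++) a.ptr[i] = 1.0;
        for (size_t i = 0; i < b.count; i++) b.ptr[i] = 2.0;
        for (size_t i = 0; i < a.count; i++) s += a.ptr[i];
        for (size_t i = 0; i < b.count; i++) s += b.ptr[i];
    }
    vla_release(mark);                        /* function exit */
    return s;
}
```

In this model the main stack frame keeps static offsets for all of its locals, including the handles. The price is one indirection through `ptr` for every VLA access, plus a separate overflow check that the ordinary stack does not perform.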
[rephrasing]
How are VLAs implemented in gcc / VC++?
Is the cost really that impressive?
[end rephrasing]
It seems to me it can only be better than using, say, a vector, even with present implementations, since you do not incur the cost of a dynamic allocation (at the cost of not being resizable).
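In C terms, the comparison is between a VLA and a heap buffer, with `malloc` standing in for the vector's allocation (an assumption, since the question is about C++). Both functions below compute the same result. Element access is ordinary indexing in both, and the difference is only in how the storage is obtained and released. The function names are hypothetical, and nothing here measures performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: storage comes from the stack frame and is released
   automatically on return. A zero-length VLA is undefined in C, hence n > 0. */
static long sum_squares_vla(size_t n)
{
    assert(n > 0);
    long tmp[n];                 /* VLA: size fixed at runtime, not resizable */
    for (size_t i = 0; i < n; i++)
        tmp[i] = (long)(i * i);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += tmp[i];
    return s;
}

/* Hypothetical: storage comes from the heap, which costs one allocator call
   on entry and one free on exit (the cost the VLA is meant to avoid). */
static long sum_squares_heap(size_t n)
{
    assert(n > 0);
    long *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;               /* allocation failure can be reported */
    for (size_t i = 0; i < n; i++)
        tmp[i] = (long)(i * i);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += tmp[i];
    free(tmp);
    return s;
}
```

One trade-off this makes visible: `malloc` can report failure by returning `NULL`. A VLA that is too large for the stack simply overflows it, and there is no portable way to detect that.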
EDIT:
There is a partial response here; however, comparing VLAs to traditional arrays seems unfair. If we knew the size beforehand, we would not need a VLA. In the same question AndreyT gave some pointers regarding the implementation, but they are not as precise as I would like.
How are VLAs implemented in gcc / VC++?
AFAIK VC++ doesn't implement VLAs. It's a C++ compiler, and it supports only C89 (no VLAs, no restrict). I don't know how gcc implements VLAs, but the fastest possible way is to store the pointer to each VLA and its size in the static portion of the stack frame. This way you can access one of the VLAs with the performance of a constant-sized array. If the stack grows downwards, as on x86, that is the last VLA (dereference [stack pointer + index*element size + size of the latest temporary pushes]). If it grows upwards, it is the first VLA (dereference [frame pointer + offset within the frame + index*element size]). All the other VLAs need one more indirection to get their base address from the static portion of the stack frame.
[Edit: Also, when using a VLA the compiler can't omit the frame (base) pointer. Otherwise that pointer is redundant, because all offsets from the stack pointer can be computed at compile time. So you have one less free register. End edit.]
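A small C99 example of why the size has to be kept at runtime: `sizeof` applied to a VLA is evaluated when the program runs. Indexing a two-dimensional VLA also needs the runtime row length to compute each element's address. The function names are hypothetical. The comments describe the address arithmetic in source terms, not the exact instructions gcc emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: sizeof on a VLA is evaluated at runtime (C99 6.5.3.4),
   so n, or n * sizeof(int), must be available somewhere in the frame. */
static size_t vla_bytes(size_t n)
{
    assert(n > 0);
    int a[n];
    return sizeof a;
}

/* Hypothetical: the row stride (n * sizeof(double)) is a runtime value,
   so m[i][i] means base + (i * n + i) * sizeof(double), using a runtime n. */
static double trace(size_t n, double m[n][n])
{
    double t = 0.0;
    for (size_t i = 0; i < n; i++)
        t += m[i][i];
    return t;
}

/* Hypothetical: a 2-D VLA in this frame, passed on with its runtime size. */
static double trace_of_scaled_identity(size_t n, double k)
{
    assert(n > 0);
    double m[n][n];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            m[i][j] = (i == j) ? k : 0.0;
    return trace(n, m);
}
```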
Is the cost really that impressive?
Not really. Moreover, if you don't use it, you don't pay for it.
[Edit: Probably a more correct answer would be: compared to what? Compared to a heap-allocated vector, the access time will be the same, but allocation and deallocation will be faster. End edit.]
If it were to be implemented in VC++, I would assume the compiler team would use some variant of _alloca(size). And I think the cost is equivalent to using variables with greater than 8-byte alignment on the stack (such as __m128): the compiler has to store the original stack pointer somewhere, and aligning the stack requires an extra register to hold the unaligned stack pointer.
So the overhead is basically an extra indirection (you have to store the address of VLA somewhere) and register pressure due to storing the original stack range somewhere as well.
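The stack realignment described in this answer can be mimicked by hand. You over-allocate, round the address up to the next multiple of the required alignment, and keep the original address so it can be restored later. The helpers `align_up` and `aligned_block_ok` are hypothetical names. A compiler does the equivalent with the stack pointer and a spare register, not with a function like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of align
   (align must be a power of two). */
static void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t u = (uintptr_t)p;
    u = (u + (align - 1)) & ~(uintptr_t)(align - 1);
    return (void *)u;
}

/* Hypothetical demo: over-allocate by align - 1 bytes so a 16-byte-aligned
   64-byte block always fits. raw plays the role of the original, unaligned
   address that has to be remembered alongside the aligned one. */
static int aligned_block_ok(void)
{
    unsigned char raw[64 + 15];
    unsigned char *block = align_up(raw, 16);
    assert(block >= raw && block + 64 <= raw + sizeof raw);
    return ((uintptr_t)block % 16) == 0;
}
```

In C11 you would normally write `_Alignas(16) unsigned char block[64];` and let the compiler perform the equivalent adjustment of the frame itself.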