
Why is ~size_t(0) (== 0xFFFFFFFF in most 32-bit systems) not a valid array index?

Quoting from this blogpost:

http://www.codesynthesis.com/~boris/blog/2008/10/13/writing-64-bit-safe-code/

This works because a valid memory index can only be in the [0, ~size_t(0)-1] range. The same approach, for example, is used in std::string.

So why is ~size_t(0) (this should usually equal 0xFFFFFFFF in 32-bit systems) not a valid array index? I assume that if you have 32 bits you should be able to reference the whole range [0, 0xFFFFFFFF], no?


IMPORTANT NOTE: The term "memory index" is ambiguous and confusing. The linked article refers strictly to array indexes, not addresses in memory. It is entirely valid for size_t to be incapable of representing all memory addresses, which is why we have the intptr_t type in C99. Of course, this doesn't apply to your workstation, which undoubtedly has a simple Von Neumann type architecture. (The question has since been edited to remove references to "memory indexes".)

The C standard guarantees that size_t can hold the size of any array. Furthermore, for any array a[N], the standard guarantees that a + N must be a valid pointer that compares not equal to any pointer to an element of a.

Therefore, size_t must be able to represent at least one value larger than any possible array index. Since ~(size_t)0 is guaranteed to be the maximum size_t value, it is a good choice of sentinel for array indexes.

Discussion:

  1. Why is ~(size_t)0 guaranteed to be the maximum? Because the standard explicitly says so: from §6.5.3.3: "If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E." Note that (size_t)-1 is guaranteed to also be the maximum by the rules of conversion from signed to unsigned types. Unfortunately, it is not always easy to find the definition for SIZE_MAX on your platform, so (size_t)-1 and ~(size_t)0 are preferred. (Note that this is no longer true if int can represent SIZE_MAX… but this isn't something that would happen in a real system.)

  2. What is the size of an array indexed from 0 to ~0? Such an array cannot exist according to the C standard, by the argument outlined at the top of this post.

  3. If you malloc(-1), the resulting memory region would have to start at 0. (FALSE) There are a lot of really bizarre cases which the standard allows but one doesn't encounter in practice. For example, imagine a system where (uintptr_t)-1 > (size_t)-1. The C standard is worded in exactly the way it is because it doesn't just run on your PC, it runs on bizarre little DSPs with Harvard architectures, and it runs on archaic systems with byzantine memory segmenting schemes. There are also some systems of historical interest where NULL pointers do not have the same representation as 0.


The intuition is this:

x = malloc(~(size_t)0);    /* the largest size you can request */
x[~(size_t)0 - 1];         /* the highest valid index */


Well, as @Apprentice Queue correctly noted in his answer, since the size of the largest contiguous object in a C or C++ program is limited by SIZE_MAX (the same value as (size_t)-1 and ~size_t(0)), the maximum index one will ever need for indexing the bytes of that object is SIZE_MAX - 1. Yet at the same time, as @Dietrich Epp correctly notes in his answer, C and C++ allow address arithmetic one element beyond the end of an array, which makes SIZE_MAX a valid index, if not for accessing array elements, then at least for pointer arithmetic. So, formally speaking, SIZE_MAX is a valid index, even though it can never stand for an existing element of an array.

However, the whole idea of using size_t as a type that allows one to "index entire memory" is only valid within the bounds of some specific platform where size_t does indeed happen to be sufficient for memory indexing ("flat memory" platforms like Win32, Win64, and Linux belong to this category). In general, size_t is not sufficient for memory indexing. It is only guaranteed to be sufficient for indexing the bytes of a single contiguous object in a C or C++ program, and neither C nor C++ guarantees support for contiguous objects that cover the entire address space. In other words, size_t is guaranteed to be sufficient to index any explicitly declared array object, but it is not, in general, guaranteed to be sufficient for counting the nodes of a linked list. Still, under the assumption that on some specific platform the range of size_t does cover the entire memory, (size_t)-1 looks like a good choice for the "reserved" value, since this index could only stand for the last byte of a byte array covering the whole address space. Obviously, no one will ever need this index for actual indexing in practice.

Nevertheless, if you are really interested in a formally appropriate type that can index the entire memory (and, consequently, can store the number of elements in any in-memory container), that would be uintptr_t, not size_t. The author of that blog post does seem to understand the general issue, since he notes that size_t is not good for indexing files (i.e. for storing file sizes or offsets). However, it would be nice to note that, for much the same reasons, size_t is not appropriate for indexing memory either. Contrary to what the author claims in his blog entry, size_t is not really related to RAM or the process address space; uintptr_t is, but size_t is not. The fact that size_t is sufficient on the platforms he mentions is nothing more than a specific property of those platforms.


Yes; I have read all of the answers so far. I will now add a simple example with only 2 significant bits. The maximum representable 2-bit unsigned value is 3, which means we can allocate at most 3 units of storage.

The indices to the units are [0, 1, 2]. Notice that the maximum value of 3 is not itself a usable index: index 3 would require a fourth unit, and a size of 4 cannot be represented with only 2 bits.

The root cause for this is that indexing in C and C++ starts from zero. The indices are "off by one" relative to sizes, because a size of 1 corresponds to the single offset 0. As a result, an index type with the same number of bits as the size type can represent a value one past the last element of the largest allocatable array.

That's the ELI-5 version of the answer.

Practical examples:

char buffer[4]; // ERROR: 4 is not representable with 2 bits

uint32_t buffer[1]; // HACK: we allocated 4 chars.. now what?


The answer of Dietrich Epp is perfectly correct, of course. However, I want to add a technical reason to it: it is plainly impossible to allocate an array so large that ~size_t(0) could index into it.

The point is that there are at least a few bytes of code mapped into your address space at any time, so you simply do not have the entire virtual address space at your disposal for your array. (And that's without even mentioning the allocations for stack space and the null page!) The only place an array indexable by ~size_t(0) could be mapped is address zero, and it would have to extend across every single byte of virtual memory so that the very last byte could be indexed. That is plainly impossible to achieve.

Also consider this: your CPU does not care about sign when adding numbers. To the CPU, adding ~size_t(0) to a value is the same as decrementing that value by one. And decrementing a pointer to the start of an allocation by one points it outside of its allocated memory range, unless it is the null pointer, which cannot point to a valid memory range in the first place...

The language standard's restriction that allocations cannot be larger than ~size_t(0) is much weaker than these technical restrictions.
