开发者

Are negative array indexes allowed in C?

I was just reading some code and found that the person was using arr[-2] to access the 2nd element before the arr, like so:

|a|b|c|d|e|f|g|
       ^------------ arr[0]
         ^---------- arr[1]开发者_运维问答
   ^---------------- arr[-2]

Is that allowed?

I know that arr[x] is the same as *(arr + x). So arr[-2] is *(arr - 2), which seems OK. What do you think?


That is correct. From C99 §6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

There's no magic. It's a 1-1 equivalence. As always when dereferencing a pointer (*), you need to be sure it's pointing to a valid address.


This is only valid if arr is a pointer that points to the second element in an array or a later element. Otherwise, it is not valid, because you would be accessing memory outside the bounds of the array. So, for example, this would be wrong:

int arr[10];

int x = arr[-2]; // invalid; out of range

But this would be okay:

int arr[10];
int* p = &arr[2];

int x = p[-2]; // valid:  accesses arr[0]

It is, however, unusual to use a negative subscript.


Sounds fine to me. It would be a rare case that you would legitimately need it however.


What probably was that arr was pointing to the middle of the array, hence making arr[-2] pointing to something in the original array without going out of bounds.


I'm not sure how reliable this is, but I just read the following caveat about negative array indices on 64-bit systems (LP64 presumably): http://www.devx.com/tips/Tip/41349

The author seems to be saying that 32 bit int array indices with 64 bit addressing can result in bad address calculations unless the array index is explicitly promoted to 64 bits (e.g. via a ptrdiff_t cast). I have actually seen a bug of his nature with the PowerPC version of gcc 4.1.0, but I don't know if it's a compiler bug (i.e. should work according to C99 standard) or correct behaviour (i.e. index needs a cast to 64 bits for correct behaviour) ?


I know the question is answered, but I couldn't resist sharing this explanation.

I remember Principles of Compiler design: Let's assume a is an int array and size of int is 2, and the base address for a is 1000.

How will a[5] work ->

Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010

This explanation is also the reason why negative indexes in arrays work in C; i.e., if I access a[-5] it will give me:

Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990

It will return the object at location 990. So, by this logic, we can access negative indexes in arrays in C.


About why would someone want to use negative indexes, I have used them in two contexts:

  1. Having a table of combinatorial numbers that tells you comb[1][-1] = 0; you can always check indexes before accessing the table, but this way the code looks cleaner and executes faster.

  2. Putting a centinel at the beginning of a table. For instance, you want to use something like

     while (x < a[i]) i--;
    

but then you should also check that i is positive.
Solution: make it so that a[-1] is -DBLE_MAX, so that x&lt;a[-1] will always be false.


#include <stdio.h>

int main() // negative index
{ 
    int i = 1, a[5] = {10, 20, 30, 40, 50};
    int* mid = &a[5]; //legal;address,not element there
    for(; i < 6; ++i)
    printf(" mid[ %d ] = %d;", -i, mid[-i]);
}


I would like to share an example:

GNU C++ library basic_string.h

[notice: as someone points out that this is a "C++" example, it may not be fit for this topic of "C". I write a "C" code, which has same concept as the example. At least, GNU gcc compiler doesn't complain anything.]

It uses [-1] to move pointer back from user string to management information block. As it alloc memory once with enough room.

Said " * This approach has the enormous advantage that a string object * requires only one allocation. All the ugliness is confined * within a single %pair of inline functions, which each compile to * a single @a add instruction: _Rep::_M_data(), and * string::_M_rep(); and the allocation function which gets a * block of raw bytes and with room enough and constructs a _Rep * object at the front. "

Source code: https://gcc.gnu.org/onlinedocs/gcc-10.3.0/libstdc++/api/a00332_source.html

   struct _Rep_base
   {
     size_type               _M_length;
     size_type               _M_capacity;
     _Atomic_word            _M_refcount;
   };

   struct _Rep : _Rep_base
   {
      ...
   }

  _Rep*
   _M_rep() const _GLIBCXX_NOEXCEPT
   { return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }

It explained:

*  A string looks like this:
*
*  @code
*                                        [_Rep]
*                                        _M_length
*   [basic_string<char_type>]            _M_capacity
*   _M_dataplus                          _M_refcount
*   _M_p ---------------->               unnamed array of char_type
*  @endcode
*
*  Where the _M_p points to the first character in the string, and
*  you cast it to a pointer-to-_Rep and subtract 1 to get a
*  pointer to the header.
*
*  This approach has the enormous advantage that a string object
*  requires only one allocation.  All the ugliness is confined
*  within a single %pair of inline functions, which each compile to
*  a single @a add instruction: _Rep::_M_data(), and
*  string::_M_rep(); and the allocation function which gets a
*  block of raw bytes and with room enough and constructs a _Rep
*  object at the front.
*
*  The reason you want _M_data pointing to the character %array and
*  not the _Rep is so that the debugger can see the string
*  contents. (Probably we should add a non-inline member to get
*  the _Rep for the debugger to use, so users can check the actual
*  string length.)
*
*  Note that the _Rep object is a POD so that you can have a
*  static <em>empty string</em> _Rep object already @a constructed before
*  static constructors have run.  The reference-count encoding is
*  chosen so that a 0 indicates one reference, so you never try to
*  destroy the empty-string _Rep object.
*
*  All but the last paragraph is considered pretty conventional
*  for a C++ string implementation.

// use the concept before, to write a sample C code

#include "stdio.h"
#include "stdlib.h"
#include "string.h"

typedef struct HEAD {
    int f1;
    int f2;
}S_HEAD;

int main(int argc, char* argv[]) {
    int sz = sizeof(S_HEAD) + 20;

    S_HEAD* ha = (S_HEAD*)malloc(sz);
    if (ha == NULL)
      return -1;

    printf("&ha=0x%x\n", ha);

    memset(ha, 0, sz);

    ha[0].f1 = 100;
    ha[0].f2 = 200;

    // move to user data, can be converted to any type
    ha++;
    printf("&ha=0x%x\n", ha);

    *(int*)ha = 399;

    printf("head.f1=%i head.f2=%i user data=%i\n", ha[-1].f1, ha[-1].f2, *(int*)ha);

    --ha;
    printf("&ha=0x%x\n", ha);

    free(ha);

    return 0;
}



$ gcc c1.c -o c1.o -w
(no warning)
$ ./c1.o 
&ha=0x13ec010
&ha=0x13ec018
head.f1=100 head.f2=200 user data=399
&ha=0x13ec010

The library author uses it. May it be helpful.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜