开发者

efficient indexing of an array

Suppose that I have

z[7]={0, 0, 2, 0, 1, 2, 1}

that means- first observation allocated in group 0, 开发者_运维问答second obs group 0, third group 2 etc and I want to write an efficient code to get an array of 3X? such that in the first row I have all the observations allocated in the first group, second row all the obs allocated in the second group etc.. something like

0, 1, 3
4, 6 
2, 5

and this must be general, maybe I could have

z={0, 0, 2, 0, 1, 2, 1, 3, 4, 2, 0, 4, 5, 5, 6, 7, 0}

so the number of columns is unknown

I did do my homework and the code is attached to this message, but there must be a better way to do it. I believe that with pointers but I really do not know how.

#include <stdio.h>
int main(){
  int z[7]={0, 0, 2, 0, 1, 2, 1}, nj[3], iz[3][7], ip[3], i, j;


for(j=0; j<3; j++){
    ip[j] = 0;
    nj[j] = 0;
  }

  for(i=0; i <7; i++ ){
    nj[z[i]] = nj[z[i]] + 1;
    iz[z[i]][ip[z[i]]] = i;
    ip[z[i]] = ip[z[i]] + 1;    
  }  

  for(j=0; j<3 ;j++){
    for(i=0; i < nj[j]; i++){
      printf("%d\t", iz[j][i]);
    }
    printf("\n");    
  }
  return 0;  
}


It seems that you have two tasks here.

  1. To count the number of occurrences of each index in z and allocate a data structure of the right size and configuration.
  2. To iterate over the data and copy it to the correct places.

At the moment you appear to have solved (1) naively by allocating a big, two dimensional array iz. That works fine if you know in advance the limits on how big it could be (and your machine will have enough memory), no need to fix this until later.

It is not clear to me exactly how (2) should be approached. Is the data currently going into iz guaranteed to consist of [0, 1, ... n ]?


If you don't know the limits of the size of iz in advance, then you will have to allocate a dynamic structure. I'd suggest a ragged array though this means two (or even three) passes over z.

What do I mean by a ragged array? A object like the argv argument to main, but in this case of type int **. In memory it looks like this:

+----+     +---+     +---+---+---+--
| iz |---->|   |---->|   |   |   ...
+----+     +---+     +---+---+---+--
           |   |--
           +---+  \    +---+---+---+--
           | . |   --->|   |   |   ...
             .         +---+---+---+--
             .

A ragged array can be accessed with iz[][] just like it was a two-dimensional array (but it is a different type of object), which is nice for your purposes because you can tune your algorithm with the code you have now, and then slap one of these in place.


How to set it up.

  1. Iterate of z to find the largest number, maxZ, present.

  2. Allocate an array of int* of size maxZ+1: iz=callac(maxZ+1,sizeof(int*));.

    I chose calloc because it zeros the memory, which makes all those pointers NULL, but you could use malloc and NULL them yourself. Making the array one too big gives us a NULL termination, which may be useful later.

  3. Allocate an array of counters of size maxZ: int *cz = calloc(maxZ,sizeof(int));

  4. iterate over z, filling cz with the number of entries needed in each row.

  5. For each row, allocate an array of ints: for(i=0; i<maxZ; ++i){ iz[i] = malloc(sizeof(int)*cz[i]; }

  6. Iterate over z one last time, sticking the figures into iz as you already do. You could re-use cz at this point to keep track of how many figure have already been put into each row, but you might want to allocate a separate array for that purpose because so that you have a record of how big each allocated array was.

NB: Every call to malloc or calloc ought to be accompanied by a check to insure that the allocation worked. I've left that as an exercise for the student.


This repeated passes over z business can be avoided entirely by using dynamic arrays, but I suspect you don't need that and don't want the added complexity.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜