Using MPI_Gather for a 3D array in C++?

I am trying to parallelize a for loop that is sandwiched between two other for loops.

After the data (a 3D array) is calculated on each processor, I want to collect it back on the root node for further processing. I tried using the MPI_Gather function to get the data back to the root node. With this function the data from the root processor is collected, but the data from the other processors is not.

#include <mpi.h>
#include <iostream>
using namespace std;

void Allocate_3D_R(long double***& m, int d1, int d2, int d3);

int main(int argc, char * argv[]) {

  int i,k,l,j;
  int Np = 7, Nz = 7, Nr = 4;
  int mynode, totalnodes;
  MPI_Status status;
  long double ***k_p, ***k_p1; 
  int startvalp,endvalp;

  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
  MPI_Comm_rank(MPI_COMM_WORLD, &mynode);

  // Allocation of memory
  Allocate_3D_R(k_p,(Nz+1),(Np+1),(Nr+1));
  Allocate_3D_R(k_p1,(Nz+1),(Np+1),(Nr+1));

  // startvalp represents the local starting value for each processor
  // endvalp represents the local ending value for each processor

  startvalp = (Np+1)*mynode/totalnodes - 0;
  endvalp = startvalp + (((Np+1)/totalnodes) -1);

  for(l = 0 ; l <= 1 ; l++){
    for(k=startvalp; k<=endvalp; k++){
      // for loop parallelized between the processors 
      // original loop: for(k=0; k<= Np; k++)
      for(i=0; i<=1; i++){
        k_p[i][k][l] =  l+k+i;
      }
    }
  }

  // For Np = 7  and for two processors ; 
  //   k = 0 - 3 is calculated in processor 0;
  //   k = 4 - 7 is calculated in processor 1;

  // Now I need to collect the value of k_p from processor 1 
  // back to the root processor.
  // MPI_Gather function  is used.

  for(l = 0 ; l <= 1 ; l++){
    for(k=startvalp; k<=endvalp; k++){      
      for(i=0; i<=1; i++){
        MPI_Gather(&(k_p[i][k][l]),1, MPI_LONG_DOUBLE,&(k_p1[i][k][l]),1, MPI_LONG_DOUBLE, 0, MPI_COMM_WORLD);
      }
    }
  }

  // With this, k_p from the root processor is stored in the
  // k_p1 variable, but the values from the other processors
  // are not collected back to the root processor.

  if(mynode == 0){
    for(l = 0 ; l <= 1 ; l++){
      for(k=0; k<=Np; k++){
        for(i=0; i<=1; i++){
          cout << "Processor "<<mynode;
          cout << ": k_p["<<i<<"]["<<k<<"]["<<l<<"] = " <<k_p1[i][k][l]<<endl;
        }
      }
    }
  }

  MPI_Finalize();

} // end of main


void Allocate_3D_R(long double***& m, int d1, int d2, int d3) {

  m=new long double** [d1];
  for (int i=0; i<d1; ++i) {
    m[i]=new long double* [d2];
    for (int j=0; j<d2; ++j) {
      m[i][j]=new long double [d3];
      for (int k=0; k<d3; ++k) {
        m[i][j][k]=0.0;
      }
    }
  }
}

Here is the output :

Processor 0: k_p[0][0][0] = 0
Processor 0: k_p[1][0][0] = 1
Processor 0: k_p[0][1][0] = 1
Processor 0: k_p[1][1][0] = 2
Processor 0: k_p[0][2][0] = 2
Processor 0: k_p[1][2][0] = 3
Processor 0: k_p[0][3][0] = 3
Processor 0: k_p[1][3][0] = 4
Processor 0: k_p[0][4][0] = 0
Processor 0: k_p[1][4][0] = 0
Processor 0: k_p[0][5][0] = 0
Processor 0: k_p[1][5][0] = 0
Processor 0: k_p[0][6][0] = 0
Processor 0: k_p[1][6][0] = 0
Processor 0: k_p[0][7][0] = 0
Processor 0: k_p[1][7][0] = 0
Processor 0: k_p[0][0][1] = 1
Processor 0: k_p[1][0][1] = 2
Processor 0: k_p[0][1][1] = 2
Processor 0: k_p[1][1][1] = 3
Processor 0: k_p[0][2][1] = 3
Processor 0: k_p[1][2][1] = 4
Processor 0: k_p[0][3][1] = 4
Processor 0: k_p[1][3][1] = 5
Processor 0: k_p[0][4][1] = 0
Processor 0: k_p[1][4][1] = 0
Processor 0: k_p[0][5][1] = 0
Processor 0: k_p[1][5][1] = 0
Processor 0: k_p[0][6][1] = 0
Processor 0: k_p[1][6][1] = 0
Processor 0: k_p[0][7][1] = 0
Processor 0: k_p[1][7][1] = 0

The data from the root processor is transferred, but not from the other processor. I tried using the MPI_Send and MPI_Recv functions instead and did not encounter the above problem, but for large loop bounds this takes much more time.

Can anyone provide a solution to this problem?


The issues here are actually similar to the issues in 2D: MPI_Type_create_subarray and MPI_Gather; there's a very lengthy answer there that covers most of the crucial points.

Gathering multidimensional array sections is trickier than gathering 1D arrays, because the data you're gathering actually overlaps. E.g., the first row from rank 1 comes between the first and second rows of rank 0. So you need to (a) use MPI_Gatherv, so you can specify the displacements, and (b) set the extents of the data types explicitly to facilitate the overlapping.

Understanding the sending and receiving of complex data structures (in MPI, or in anything else) is all about understanding the layout of data in memory -- which is crucial for getting high performance out of your code anyway.

Speaking of memory layout, your Allocate_3D_R won't work for the purposes here; the problem is that it allocates memory that may not be contiguous. There's no guarantee that if you allocate a 10x10x10 array this way, element [1][0][0] comes immediately after element [0][9][9]. This is a common problem in C/C++, which doesn't come with any built-in concept of multidimensional arrays. You'll need to do something like this:

void Allocate_3D_R(long double***& m, int d1, int d2, int d3) {

  m = new long double** [d1];
  for (int i=0; i<d1; ++i) {
    m[i] = new long double* [d2];
  }
  // One contiguous block holding all d1*d2*d3 elements:
  long double *data = new long double[d1*d2*d3];
  for (int i=0; i<d1; ++i) {
    for (int j=0; j<d2; ++j) {
      // Point each row at the right offset into the contiguous block:
      m[i][j] = &data[(i*d2 + j)*d3];
      for (int k=0; k<d3; ++k) {
        m[i][j][k] = 0.0;
      }
    }
  }
}

plus or minus -- that is, you'll need to allocate a contiguous d1*d2*d3 chunk of memory, and then point the array indices to the appropriate places in that contiguous memory.

