From OpenMP to MPI
I'm wondering how to convert the following OpenMP program to an MPI program:
#include <omp.h>
#define CHUNKSIZE 100
#define N 1000
int main (int argc, char *argv[])
{
    int i, chunk;
    float a[N], b[N], c[N];
    /* Some initializations */
    for (i=0; i < N; i++)
        a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;
    #pragma omp parallel shared(a,b,c,chunk) private(i)
    {
        #pragma omp for schedule(dynamic,chunk) nowait
        for (i=0; i < N; i++)
            c[i] = a[i] + b[i];
    } /* end of parallel section */
    return 0;
}
I have a similar program that uses OpenMP, and I would like to run it on a cluster.
Thanks!
UPDATE:
In the following toy code, I want to restrict the parallel part to the function f():
#include "mpi.h"
#include <stdio.h>
#include <string.h>
void f();
int main(int argc, char **argv)
{
    printf("%s\n", "Start running!");
    f();
    printf("%s\n", "End running!");
    return 0;
}
void f()
{
    char idstr[32]; char buff[128];
    int numprocs; int myid; int i;
    MPI_Status stat;
    printf("Entering function f().\n");
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    if(myid == 0)
    {
        printf("WE have %d processors\n", numprocs);
        for(i=1;i<numprocs;i++)
        {
            sprintf(buff, "Hello %d", i);
            MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD);
        }
        for(i=1;i<numprocs;i++)
        {
            MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);
            printf("%s\n", buff);
        }
    }
    else
    {
        MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
        sprintf(idstr, " Processor %d ", myid);
        strcat(buff, idstr);
        strcat(buff, "reporting for duty\n");
        MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    printf("Leaving function f().\n");
}
However, the output is not what I expected. The printf calls before and after the parallel part are executed by every process, not just the main process:
$ mpirun -np 3 ex2
Start running!
Entering function f().
Start running!
Entering function f().
Start running!
Entering function f().
WE have 3 processors
Hello 1 Processor 1 reporting for duty
Hello 2 Processor 2 reporting for duty
Leaving function f().
End running!
Leaving function f().
End running!
Leaving function f().
End running!
So it seems that the parallel part is not limited to the region between MPI_Init() and MPI_Finalize().
To answer your update:
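When using MPI, the same program is run by every process: mpirun -np 3 starts three copies of the whole executable, each beginning at main(), which is why "Start running!" appears three times. MPI_Init does not create those processes; it only sets up communication between them. To restrict work to a single process, you need a statement like:
if (rank == 0) { ...serial work... }
This ensures that only one process does the work inside this block.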
You can see how this works in the example program you posted: inside f() there is the if(myid == 0) statement. That block is executed only by process 0; all other processes go straight to the else branch and receive their messages before sending them back.
With regard to MPI_Init and MPI_Finalize: MPI_Init initialises the MPI environment. Once you have called it, you can use the other MPI functions, such as MPI_Send and MPI_Recv. Once you have finished using MPI, MPI_Finalize frees the resources, but the program keeps running. For example, you could call MPI_Finalize before performing some I/O that is going to take a long time. These functions do not delimit the parallel portion of the code, merely the region in which you can make other MPI calls.
Hope this helps.
You just need to assign a portion of the arrays (a, b, c) to each process. Something like this:
#include <mpi.h>
#define N 1000
int main(int argc, char *argv[])
{
    int i, myrank, myfirstindex, mylastindex, procnum;
    float a[N], b[N], c[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &procnum);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    /* Block assignment of indices, computed at run time from the
     * number of processes: the first N % procnum ranks get one extra
     * element. Each rank works on [myfirstindex, mylastindex).
     */
    if (myrank < N % procnum) {
        myfirstindex = myrank * (N / procnum + 1);
        mylastindex = myfirstindex + N / procnum + 1;
    } else {
        myfirstindex = N % procnum + myrank * (N / procnum);
        mylastindex = myfirstindex + N / procnum;  /* == N on the last rank */
    }
    // Initializations (each process only touches its own slice)
    for (i = myfirstindex; i < mylastindex; i++)
        a[i] = b[i] = i * 1.0;
    // Computations
    for (i = myfirstindex; i < mylastindex; i++)
        c[i] = a[i] + b[i];
    /* Each process now holds only its own slice of c; collect the
     * results (e.g. with MPI_Gatherv) if rank 0 needs the full array. */
    MPI_Finalize();
    return 0;
}
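The index arithmetic above can be factored into a helper and checked without an MPI installation. This is a sketch under stated assumptions: block_range and block_counts are hypothetical names, not MPI functions. block_range returns the half-open range owned by one rank, and block_counts builds the per-rank counts and displacements arrays that MPI_Gatherv expects, which is one way to collect every slice of c back on rank 0.

```c
#include <assert.h>

/* Hypothetical helper (not part of MPI): compute the half-open range
 * [*first, *last) of indices owned by `rank` when n items are split into
 * nprocs contiguous blocks. The first n % nprocs ranks get one extra item. */
void block_range(int rank, int nprocs, int n, int *first, int *last)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    int base = n / nprocs;   /* items every rank gets */
    int extra = n % nprocs;  /* leftovers, one each for ranks 0..extra-1 */

    if (rank < extra) {
        *first = rank * (base + 1);
        *last = *first + base + 1;
    } else {
        *first = extra + rank * base;
        *last = *first + base;
    }
}

/* Hypothetical helper: fill counts[r] (number of elements) and displs[r]
 * (starting offset) for every rank r, in the layout that MPI_Gatherv and
 * MPI_Scatterv take for their recvcounts/sendcounts and displs arguments. */
void block_counts(int nprocs, int n, int *counts, int *displs)
{
    for (int r = 0; r < nprocs; r++) {
        int first, last;
        block_range(r, nprocs, n, &first, &last);
        counts[r] = last - first;
        displs[r] = first;
    }
}
```

In the program above, each rank could call block_range(myrank, procnum, N, &myfirstindex, &mylastindex) instead of the if/else. To gather, rank 0 could call MPI_Gatherv(MPI_IN_PLACE, 0, MPI_FLOAT, c, counts, displs, MPI_FLOAT, 0, MPI_COMM_WORLD), while the other ranks pass c + myfirstindex and their own count as the send buffer. MPI_IN_PLACE is used on the root because MPI does not allow the send and receive buffers to overlap.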
You could try the proprietary Intel Cluster OpenMP, which runs OpenMP programs on a cluster. It simulates a shared-memory computer on a distributed-memory cluster using software distributed shared memory: http://en.wikipedia.org/wiki/Distributed_shared_memory
It is easy to use and is included in the Intel C++ Compiler (9.1+), but it works only on 64-bit processors.