
Optimal sharing of a heavy computation job using snow and/or multicore

I have the following problem.

First, my environment: I have two 24-CPU servers to work with and one big job (resampling a large dataset) to share between them. I've set up multicore and a (socket) snow cluster on each. As a high-level interface I'm using foreach.

What is the optimal way to share the job? Should I set up a snow cluster using CPUs from both machines and split the job that way (i.e. use doSNOW for the foreach loop)? Or should I use the two servers separately and use multicore on each (i.e. split the job into two chunks, run one chunk on each server, and then stitch the results back together)?

Basically, what is an easy way to:
1. Keep communication between the servers down (since this is probably the slowest part)?
2. Ensure that the random numbers generated on the two servers are not highly correlated?
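The second option in the question (split the job into two chunks, run one chunk per server with multicore, then stitch the results together) can be sketched as below. This is a minimal, single-machine sketch under stated assumptions: it uses `mclapply()` and `splitIndices()` from the base `parallel` package (which absorbed multicore and snow from R 2.14.0 on) instead of foreach, it needs a Unix-like OS because it forks, and `dataset`, `one_replicate()` and `run_chunk()` are hypothetical stand-ins for the real data and resampling step.

```r
library(parallel)

# Hypothetical stand-in for the large dataset.
dataset <- rnorm(10000)

# Hypothetical resampling step: one bootstrap replicate of the mean.
one_replicate <- function(i, x) mean(sample(x, replace = TRUE))

n_reps <- 1000
# Split the replicate indices into two chunks, one per server.
chunks <- splitIndices(n_reps, 2)

# Run one chunk on forked workers. On a real 24-CPU server you would
# use cores = 24; 2 cores are used here so the sketch runs anywhere.
run_chunk <- function(idx, x, cores = 2) {
  unlist(mclapply(idx, one_replicate, x = x, mc.cores = cores))
}

part1 <- run_chunk(chunks[[1]], dataset)  # would run on server 1
part2 <- run_chunk(chunks[[2]], dataset)  # would run on server 2

# Stitch the two halves back together.
boot_means <- c(part1, part2)
```

With this layout the only cross-server traffic is copying the dataset to each machine once and bringing back one result vector per server (for example via `saveRDS()`/`readRDS()`), which addresses point 1. Each server still needs its own non-overlapping random number stream, which is point 2.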


snow and multicore differ in one significant way: multicore forks new processes, so the children share the main process's memory (copy-on-write). This means that with snow you need to distribute the data you want to process (physically send it to each child and store it in the child's memory), whereas with multicore the children can simply access the main process's copy of the data, which saves both transfer time and memory.
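The difference can be seen in a small sketch. It assumes the base `parallel` package as a substitute for the separate snow and multicore packages, a Unix-like OS (forking is not available on Windows), and a hypothetical `big_data` vector standing in for the real dataset. Forked workers read `big_data` directly; socket (snow-style) workers are separate R processes and only see it after `clusterExport()` serialises and ships it to each of them.

```r
library(parallel)

# Hypothetical large object living in the master process.
big_data <- rnorm(1e6)

# multicore-style: forked children inherit big_data from the parent,
# so nothing is copied to them up front.
fork_means <- mclapply(1:4, function(i) mean(sample(big_data, 1000)),
                       mc.cores = 2)

# snow-style: socket workers start empty, so the data must be sent
# to (and stored by) every worker before it can be used.
cl <- makeCluster(2, type = "PSOCK")
clusterExport(cl, "big_data")  # one full copy per worker
sock_means <- parLapply(cl, 1:4, function(i) mean(sample(big_data, 1000)))
stopCluster(cl)
```

Without the `clusterExport()` call the socket workers would fail with an "object 'big_data' not found" error, which is exactly the extra transfer and memory cost described above.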


I don't have enough experience to answer (1). But the way to avoid (2) is to use a random number generator designed for parallel programs: look at the rlecuyer package and the clusterSetupRNG function in snow.
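As a sketch of the same idea, the base `parallel` package provides the L'Ecuyer-CMRG generator without rlecuyer: `clusterSetRNGStream()` gives every worker its own reproducible stream, and `nextRNGStream()` derives a further independent stream, for example one per server. This swaps in `parallel` for snow's `clusterSetupRNG`; the seed 2011 and the variable names are arbitrary illustrations.

```r
library(parallel)

# Two local socket workers standing in for one server's cluster.
cl <- makeCluster(2)

# Each worker gets its own L'Ecuyer-CMRG stream derived from one seed.
clusterSetRNGStream(cl, iseed = 2011)
draws1 <- parSapply(cl, 1:2, function(i) runif(1))

# Re-seeding with the same iseed reproduces the per-worker draws.
clusterSetRNGStream(cl, iseed = 2011)
draws2 <- parSapply(cl, 1:2, function(i) runif(1))
stopCluster(cl)

# Separate servers: start from one seed and advance to a new stream,
# so server 1 and server 2 never draw from overlapping sequences.
RNGkind("L'Ecuyer-CMRG")
set.seed(2011)
seed_server1 <- .Random.seed
seed_server2 <- nextRNGStream(seed_server1)
```

On the second server you would assign `seed_server2` to `.Random.seed` in the global environment before calling `mclapply()`, which (with the L'Ecuyer-CMRG kind and the default `mc.set.seed = TRUE`) then hands each forked child a further sub-stream.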
