Confusion about file access in Disco

I have a simple 2-node cluster (master on one, workers on both). I tried using:

python disco/util/distrfiles.py bigtxt /etc/nodes > bigtxt.chunks

to distribute the files, which worked fine.

I expected this to mean that each worker process would operate only on its local data, but the workers sometimes try to access data on the other machine.

Instead, I copied the entire data directory over. Everything worked fine until the reduce phase, where I received this error:

CommError: Unable to access resource (http://host:8989/host/8b/sup@4f6:d2f6:34b3b/map-index.txt): 

It seems the file is expected to be fetched directly over HTTP, but I don't think that is working correctly. Are files supposed to be passed back and forth over HTTP? Do I need a distributed file system for multi-node MapReduce?
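To narrow this down, a quick check of whether a URL like the one in the error is reachable from each node could look like the sketch below. It uses only the Python standard library and is not part of Disco. The function name `check_resource` and the 5-second timeout are my own choices for illustration.

```python
import urllib.error
import urllib.request


def check_resource(url, timeout=5.0):
    """Return (ok, detail) for a plain HTTP GET of url.

    Hypothetical helper for diagnosis only; it is not a Disco API.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, "HTTP %d" % resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return False, "HTTP %d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        # No answer at all: DNS failure, refused connection, or timeout.
        return False, "unreachable: %s" % exc


if __name__ == "__main__":
    import sys

    # Pass one or more URLs on the command line, e.g. the one from the error.
    for url in sys.argv[1:]:
        print(url, check_resource(url))
```

Running this on both machines with the URL from the CommError should show whether the failure is a network problem (unreachable) or the server answering with an error status.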
