
Alternatives to Hadoop / MapReduce frameworks for the Win32 platform

I'm finding Hadoop on Windows somewhat frustrating: I want to know if there are any serious alternatives to Hadoop for Win32 users. The features I most value are:

  • Ease of initial setup & deployment on a smallish network (I'd be astonished if we ever got more than 20 worker-PCs assigned to this project)
  • Ease of management - the ideal framework should have a web/GUI-based administration system so that I do not have to write one myself.
  • Something popular & stable. Bonuses depend on us getting this project delivered on time.

BACKGROUND:

The company I work for wants to build a new grid system to run some financial calculations.

The first framework I have been evaluating is Hadoop. It seemed to do exactly what we needed, except that it's very UNIX-oriented. I was able to get all of the tutorials up & running on an Ubuntu VM in VirtualBox. Unfortunately, nothing seems to run easily on Win32.

Yes... Win32: our company has a policy that everything has to run on Windows. None of the server admins (or anybody outside a select few developers) know anything about Linux. I'd probably get in trouble if they found my virtual Ubuntu environment! The sad fact is that our grid needs to be hosted on Win32 (since all the test PCs run 32-bit Windows XP), with an option to upgrade to Win64 at some point in the future.

To complicate matters, 95% of what we want to run are Python scripts with 32-bit Windows C++ DLL add-ons. Our calculation library is overwhelmingly written in Python, and our calculation libraries will not run on anything other than Windows, so I do not really have a choice.


For Python there are:

  • disco
  • bigtempo
  • celery - not really a map-reduce framework, but a good start if you want something highly customized
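To show what the map-reduce frameworks in this list actually do, here is a minimal single-machine word-count sketch in plain Python. It is not the API of disco or of any other listed project; the names `map_phase`, `reduce_phase` and `map_reduce` are hypothetical. The map step is pluggable: the built-in `map` runs it serially, while `multiprocessing.Pool.map` spreads it over local worker processes, which is roughly the part a framework like disco distributes across machines.

```python
"""Minimal in-process map-reduce sketch (word count).

Illustrates the map -> shuffle -> reduce phases that frameworks such as
disco or Hadoop distribute across machines. This is NOT disco's API.
"""
from collections import defaultdict
from multiprocessing import Pool


def map_phase(line):
    # Emit a (key, value) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(item):
    # Combine all values collected for a single key.
    key, values = item
    return key, sum(values)


def map_reduce(records, mapper=map):
    """Run map, shuffle and reduce.

    `mapper` is any map-like callable: the built-in map (serial) or
    Pool.map (parallel on local worker processes).
    """
    # Map: apply map_phase to every record, possibly in parallel.
    mapped = mapper(map_phase, records)
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce: one call per key, also parallelisable.
    return dict(mapper(reduce_phase, groups.items()))


if __name__ == "__main__":
    lines = ["the quick brown fox", "The lazy dog", "the end"]
    # The __main__ guard is required on Windows, where multiprocessing
    # starts workers by re-importing this module.
    with Pool(processes=2) as pool:
        print(map_reduce(lines, mapper=pool.map))
```

Keeping the mapper pluggable is the main design point: the same map and reduce functions can be tested serially and then handed to a process pool (or, in a real framework, a cluster of worker PCs) unchanged.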

You can also find a number of Hadoop clients/integrations on PyPI.


You could try MPI, a standard for message-passing concurrent applications. We are running it on our Linux cluster, but it is cross-platform. One of the most widely used implementations is MPICH2, written in C. Python bindings for MPI are available through the mpi4py library.


IPython has some parallel computing features that are simple and work on Windows. They may be enough for your needs. Here's a good place to start:

http://showmedo.com/videotutorials/video?name=7200100&fromSeriesID=720


I've compiled a list of available MapReduce/Hadoop offerings in the cloud (hosted services, PaaS-level); it might be of help as well.


Many distributed computing frameworks can be used for many-task computing. If you don't need the MapReduce paradigm itself, but rather the ability to distribute a job's tasks across separate computers, together with communication and resource management, you could look at other platforms in this area such as Condor or even BOINC; both run on Windows.
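The many-task model needs no shuffle or reduce step: a job is split into independent tasks, each task runs wherever there is capacity, and failed tasks are resubmitted. The sketch below imitates that locally with Python's `concurrent.futures`. It is a stand-in under stated assumptions, not Condor's or BOINC's interface; `run_tasks` is a hypothetical helper, and `price_chunk` is a hypothetical toy calculation (a one-step Monte Carlo payoff average), not a real pricing model.

```python
"""Many-task sketch: independent tasks, collected results, simple retries.

Local stand-in for what a scheduler such as Condor does across machines.
price_chunk is a hypothetical toy task, not a real pricing model.
"""
import concurrent.futures as cf
import random


def price_chunk(seed, n_paths=10_000):
    # Hypothetical task: average payoff max(S - K, 0) over n_paths draws of a
    # toy one-step model; each seed yields an independent chunk of work.
    rng = random.Random(seed)
    s0, strike, vol = 100.0, 100.0, 0.2
    total = 0.0
    for _ in range(n_paths):
        s = s0 * (1.0 + vol * rng.gauss(0.0, 1.0))
        total += max(s - strike, 0.0)
    return total / n_paths


def run_tasks(task, args_list, executor, max_retries=2):
    """Run task(arg) for every arg on `executor`, retrying failed tasks.

    Returns (results, failed): results maps arg -> return value, failed
    lists the args that still raised after max_retries extra attempts.
    """
    results = {}
    pending = list(args_list)
    for _ in range(max_retries + 1):
        futures = {executor.submit(task, arg): arg for arg in pending}
        pending = []
        for fut in cf.as_completed(futures):
            arg = futures[fut]
            try:
                results[arg] = fut.result()
            except Exception:
                pending.append(arg)  # resubmit on the next round
        if not pending:
            break
    return results, pending


if __name__ == "__main__":
    # The __main__ guard is required on Windows, where worker processes
    # are started by re-importing this module.
    with cf.ProcessPoolExecutor(max_workers=4) as pool:
        results, failed = run_tasks(price_chunk, range(20), pool)
    if results:
        estimate = sum(results.values()) / len(results)
        print(f"{len(results)} tasks ok, {len(failed)} failed, estimate {estimate:.3f}")
```

Because each task is seeded and independent, any task can be rerun on any machine and give the same answer, which is what makes simple resubmission a safe failure-handling strategy in this model.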

You could also run Hadoop on Linux virtual machines.

