Increase speed of a Java program for machine learning

I am doing machine learning in Java using GATE Learning, and I have a huge data set of documents to learn from. When running from NetBeans I was getting a Java heap space error, so I set the -Xmx parameter to 1600 MB. Now I no longer get the heap space error, but the program takes a very long time to run (it ran for 90 minutes before I lost patience and stopped the process).

I do not know whether I should add more RAM (currently 4 GB), upgrade my OS (currently XP SP3; I have heard that Vista and Windows 7 make better use of RAM and the processor), or upgrade my processor (currently a Dual Core E5500 2.80 GHz).

Please give me some insight into what I can do to make this process run faster!

Thanks Rishabh


Before you can answer what will make it run faster, you have to find the bottleneck.

I'm not very familiar with Windows, but IIRC there is a built-in system load monitor (Task Manager's Performance tab should show CPU and memory usage).

What I would do is as follows:

  • Create some datasets of increasing sizes (more documents)
  • Run your program against those datasets
  • On each run, work out whether the CPU maxes out, whether the memory maxes out and the machine starts swapping, or whether the whole thing is I/O-bound

Then fix the one that is causing the problem.
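As a rough illustration of the steps above, here is a minimal Java sketch that runs a workload at increasing input sizes and prints wall-clock time, CPU time for the calling thread, and heap in use. The class name ScalingBenchmark and the method placeholderWorkload are made up for this example; the placeholder just sorts random numbers and would need to be replaced by your actual learning run on a subset of the documents.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntConsumer;

/** Runs a workload at several input sizes and reports time and heap use. */
class ScalingBenchmark {

    /** One measurement for a given input size. */
    static final class Sample {
        final int size;
        final double wallMs;
        final double cpuMs;      // -1 when the JVM cannot report thread CPU time
        final double heapUsedMb; // heap in use right after the run

        Sample(int size, double wallMs, double cpuMs, double heapUsedMb) {
            this.size = size;
            this.wallMs = wallMs;
            this.cpuMs = cpuMs;
            this.heapUsedMb = heapUsedMb;
        }
    }

    /** Times one run. CPU time covers only the calling thread, not worker threads. */
    static Sample measure(int size, IntConsumer workload) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuOk = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();
        long cpuStart = cpuOk ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();

        workload.accept(size);

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = cpuOk ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        Runtime rt = Runtime.getRuntime();
        double heapUsedMb = (rt.totalMemory() - rt.freeMemory()) / (1024.0 * 1024.0);
        return new Sample(size, wallNanos / 1e6, cpuOk ? cpuNanos / 1e6 : -1.0, heapUsedMb);
    }

    /** Hypothetical stand-in for the real job: sorts `size` random ints. */
    static void placeholderWorkload(int size) {
        int[] data = new Random(42).ints(size).toArray();
        Arrays.sort(data);
    }

    public static void main(String[] args) {
        System.out.printf("%10s %12s %12s %12s%n", "size", "wall ms", "cpu ms", "heap MB");
        // Double the input size on each run so the growth trend is easy to see.
        for (int size = 100_000; size <= 1_600_000; size *= 2) {
            Sample s = measure(size, ScalingBenchmark::placeholderWorkload);
            System.out.printf("%10d %12.1f %12.1f %12.1f%n",
                    s.size, s.wallMs, s.cpuMs, s.heapUsedMb);
        }
    }
}
```

As a rule of thumb, CPU time close to wall time on a single-threaded job suggests it is CPU-bound, while CPU time far below wall time suggests it is waiting on disk or swap. Heap use creeping up towards the -Xmx limit points at memory pressure. Treat these as hints to confirm with the system monitor, not as proof.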

Just for context, it's not unusual for ML algorithms to take a long time to run on large data sets. You can use the approach above to plot the run time as the size of the input data set increases; then at least you'll know whether your program would have finished in 100 minutes or 100 centuries.
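One way to turn a handful of such timings into an estimate is to assume the run time follows a power law, time ≈ a · size^b, and fit a and b by least squares on the logarithms. That is only an assumption (a job that starts swapping will not follow it), and the numbers in main below are invented purely to show the mechanics. The class and method names are made up for this sketch.

```java
/** Fits time = a * size^b to measurements and extrapolates to a larger size. */
class RuntimeExtrapolation {

    /** Returns {a, b}, fitted by ordinary least squares on (log size, log time). */
    static double[] fitPowerLaw(double[] sizes, double[] times) {
        if (sizes.length != times.length || sizes.length < 2) {
            throw new IllegalArgumentException("need at least two matching measurements");
        }
        int n = sizes.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            double x = Math.log(sizes[i]);
            double y = Math.log(times[i]);
            sumX += x;
            sumY += y;
            sumXX += x * x;
            sumXY += x * y;
        }
        double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX); // slope = exponent
        double a = Math.exp((sumY - b * sumX) / n);                        // intercept = log a
        return new double[] {a, b};
    }

    /** Predicted time for the given size under the fitted model. */
    static double predict(double[] fit, double size) {
        return fit[0] * Math.pow(size, fit[1]);
    }

    public static void main(String[] args) {
        // Invented example measurements: number of documents and seconds per run.
        double[] docs = {1000, 2000, 4000, 8000};
        double[] seconds = {2.0, 8.1, 31.5, 130.0};

        double[] fit = fitPowerLaw(docs, seconds);
        System.out.printf("estimated exponent b = %.2f%n", fit[1]);
        System.out.printf("predicted time for 100000 docs = %.0f s%n",
                predict(fit, 100_000));
    }
}
```

An exponent near 1 means the run time grows roughly linearly with the number of documents; an exponent near 2 or higher means that faster hardware alone is unlikely to help much on the full data set.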


Get a profiler such as VisualVM or YourKit, start your program, and connect the profiler to the running process. Find out which methods and objects are your bottleneck; then at least you know where to start improving your program.
