开发者

High volume SVM (machine learning) system

I working on a possible machine learning project that would be expected to do high speed computations for machine learning using SVM (support vector machines) and possibly some ANN.

I'm resonably comfortable working on matlab with these, but primarly in small datasets, just for experimentation. I'm wondering if this matlab based approach will scale? or should i be looking into something else? C++ / gpu based computing? java wrapping of the matlab code and pushing it onto app engine?

Incidentally, there seems to be a lot fo literature on GPUs, but not much on how useful they are on machine learning applications using matlab, & the cheapest CUDA enlabled GPU money can buy开发者_StackOverflow中文版? is it even worth the trouble?


I work on Pattern Recognition problems. Let me please to give you some advices if you plan to work effectively on SVM/ANN problems and if you realy don't have access to a computer cluster:

1) Don't use Matlab. Use Python and its large number of numerical libraries instead for Visualisation/Analysis of your computations.
2) Critical sections better to implement using C. You can integrate them then with your Python scripts very easy .
3) CUDA/GPU is not a solution if you mostly deal with non-polinomial time complexity problems which is typical in Machine Learning, so it brings no great speed-up; dot/matrix products are only a tiny part of SVM calculations - you still will have to deal with feature extractions and lists/objects processing, try instead to optimize your algorithms and devise effective algorithmic methods. If you need parallelism (e.g. for ANNs), use threads or processes.
4) Use GCC compiler to compile your C program - it will build the very fast executable code. To speed-up numerical computations you can try GCC optimization flags (e.g. Streaming SIMD Extensions)
5) Run your program on any modern CPU under Linux OS.

For realy good performance, use Linux clusters.


Both libsvm and SVM light have matlab interfaces. Besides, most learning tasks are trivially parallelizable, so take a look at matlab commands like parfor and the rest of the Parallel Computing Toolbox.


I would advice against using Matlab for anything beyond prototyping. When the project becomes more complex and extensive, proportion of your own code will grow versus functionality provided by matlab and toolboxes. The more developed the project becomes, the less you benefit from matlab and the more you need features, libraries and - more importanly - practices, processes and tools of general purpose languages.

Scaling of matlab solution is achieved by interfacing with non-matlab code, and I've seen matlab project turn into nothin more than a glue calling modules written in multi-purpose languages. Causing everyday pains for everyone involved.

If you are comfortable with Java, I'd recommend using it together with some good math library (at least, you can always interface MKL). Even with recent Matlab optimisations, MKL + JVM are much faster - scaling and maintainability are beyond comparison.

C++ with processor specific intrinsics can provide better performance, but at a price of development time and maintainability. Adding CUDA imporves performance further, but the amount of work and specific knowledge is hardly worth it. Certainly not if you don't have prior experience with GPU calcucations. As soon as you go beyond single processor, it's much more effective to add another CPU or two to system than to struggle with GPU calculations.


Nothing as of now will scale beyond a limit. libsvm has a tool for subset selection, to select a set of data points for training. forget about ANN, it will not generalize and there is no theory that helps to choose the number of hidden nodes, etc.. It has to be manually optimized a lot and can get trapped in local minima.. Go with SVM only


Here you can find some semiparametric approxiamtions that can work with a high volumen of data very fast:

http://www.dabi.temple.edu/budgetedsvm/

https://robedm.github.io/LIBIRWLS/

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜