Dimensionality reduction with MapReduce, using distributed computing?
Do you know of an application or algorithm for reducing the dimensionality of big data, perhaps using MapReduce or some other API?
Also, do you know of algorithms like singular value decomposition (SVD) that can be useful for reducing the dimension of data sets, and how to use distributed computing to solve this?
Have a look at Mahout, because it includes an SVD implementation.
Besides Mahout, you should take a look at SLEPc (which is a toolkit based on PETSc) for solving eigenvalue problems for very large sparse matrices. It uses MPI, so it will run on lots of different parallel and distributed architectures. There's also Gensim, written in Python. It's probably not as scalable as either Mahout or SLEPc but it's much easier to use.
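To make the MapReduce angle concrete, here is a minimal single-machine sketch in plain Python. It is not how Mahout, SLEPc, or Gensim work internally; it only illustrates one common idea for tall, skinny data: each row contributes an outer product to the small matrix A^T A (map), the contributions are summed (reduce), and the top eigenvector of that small matrix, which is also the top right singular vector of A, is used to project the data. The function names, the toy data, and the use of power iteration are all assumptions made for illustration.

```python
import math
import random
from functools import reduce


def map_row(row):
    """Map step: one row's contribution to A^T A (the outer product of the row with itself)."""
    return [[x * y for y in row] for x in row]


def reduce_sum(m1, m2):
    """Reduce step: element-wise sum of two partial d x d matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def gram_matrix(rows):
    """Combine map and reduce to build A^T A; on a cluster the rows would be split across workers."""
    return reduce(reduce_sum, map(map_row, rows))


def top_eigenvector(g, iterations=200, seed=0):
    """Power iteration on the small d x d matrix; this part runs on a single machine."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in g]  # strictly positive start vector
    for _ in range(iterations):
        w = [sum(gij * vj for gij, vj in zip(g_row, v)) for g_row in g]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, v):
    """Second pass over the data: reduce each row to a single coordinate along v."""
    return [sum(x * vj for x, vj in zip(row, v)) for row in rows]


# Hypothetical toy data: 4 samples with 3 features each.
data = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 0.5]]
g = gram_matrix(data)
v = top_eigenvector(g)
reduced = project(data, v)
```

This only pays off when the number of features d is small enough for a d x d matrix to fit on one node, and forming A^T A explicitly can lose numerical accuracy. For very large sparse problems the libraries mentioned above are the better route.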