Just how much Java does one need to use Hadoop and Mahout effectively?
I'm a PHP developer. Let's just get that out of the way now. But Hadoop – and Mahout in particular – have piqued my interest. I'm ready to take the dive into Java in order to use them.
So, from people experienced enough to know: just how much Java will I need to be able to use these effectively? From what I've seen, programming mappers/reducers doesn't take all that much. But with Mahout I'm not at all sure what I'm looking at when I look at the documentation.
Also, just how hard will it be to take data from my PHP application for processing in Java via Hadoop and Mahout? I can't imagine it'd be that difficult, but I'm not experienced enough to say.
It shouldn't be all that difficult to get data from PHP to Java for analysis using Mahout and Hadoop.
Even easier is to process using Mahout and Hadoop off-line in a batch mode and to store the data products in a file system or database. PHP can then read these data products as easy as falling off a log.
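To make the batch hand-off concrete, here is a minimal sketch of turning a batch job's text output into a JSON file that PHP can load. It assumes the job leaves tab-separated lines of the form `userID<TAB>[itemID:score,itemID:score,...]`; that layout, the function names and the file name are assumptions for illustration, so check what your own job actually writes.

```python
import json


def parse_recommendations(line):
    """Parse one assumed 'userID<TAB>[itemID:score,...]' line.

    Returns (user_id, [{"item": ..., "score": ...}, ...]).
    """
    user_id, _, items = line.strip().partition("\t")
    items = items.strip("[]")
    recs = []
    # filter(None, ...) skips the empty string produced by an empty list "[]"
    for entry in filter(None, items.split(",")):
        item_id, _, score = entry.partition(":")
        recs.append({"item": item_id, "score": float(score)})
    return user_id, recs


def export_json(lines, out_path):
    """Write all recommendations to one JSON file keyed by user ID."""
    table = dict(parse_recommendations(line) for line in lines if line.strip())
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(table, fh)
    return table
```

On the PHP side, something like `json_decode(file_get_contents('recs.json'), true)` would then give an array keyed by user ID.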
For real-time use, the recommendations part of Mahout supports a variety of web-service interfaces that make it pretty easy to access from PHP. Hitting the model evaluation part of Mahout would require a bit more programming.
A beginner's level of Java is sufficient. You can always dig deeper as the need arises.
I just did the same thing, and it's been years since I did anything Java-related. What I did was the following:
- Started off with simple Hadoop streaming examples
- Tried my own analysis with PHP streaming
- Started experimenting with Pig
- Started experimenting with using PHP streaming inside Pig
All without any Java!
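For anyone wondering what a streaming job looks like, here is a minimal word-count sketch. Hadoop streaming feeds input lines to the mapper on stdin and expects `key<TAB>value` lines on stdout; the reducer then receives those lines sorted by key. The example is in Python as a stand-in (a PHP script would follow the same stdin/stdout contract), and the script and function names are made up for illustration.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoke as "wordcount.py map" or "wordcount.py reduce"
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

You can try it locally without a cluster by piping: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. On a cluster, the same two commands would be passed to the streaming jar as the `-mapper` and `-reducer` options.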
For real-time recommendations you could also instantiate a Mahout recommender in a Java servlet class, then export that as a WAR and serve it from a Tomcat server.