
Kick off a MapReduce job from my Java/MySQL webapp

I need a bit of architecture advice. I have a Java-based webapp with a JPA-based ORM backed by a MySQL relational database. As part of the application, I have a batch job that compares thousands of database records with each other. This job has become too time-consuming and needs to be parallelized. I'm looking at using MapReduce and Hadoop to do this, but I'm not sure how to integrate them into my current architecture. I think the easiest initial solution is to find a way to push data from MySQL into Hadoop jobs. I have done some initial research on this and found the following relevant information and possibilities:

1) https://issues.apache.org/jira/browse/HADOOP-2536 gives an interesting overview of some inbuilt JDBC support.
2) The article http://architects.dzone.com/articles/tools-moving-sql-database describes some third-party tools to move data from MySQL to Hadoop.

To be honest, I'm just starting out with learning about HBase and Hadoop, and I really don't know how to integrate them into my webapp.

Any advice is greatly appreciated. cheers, Brian


DataNucleus supports JPA persistence to HBase. Obviously JPA is designed for RDBMSs, so support for full JPA will never be possible, but you can do basic persistence and querying.


Brian, in this case you can use HBase, Hive, or just raw map-reduce jobs.

1. HBase is a column-oriented database. It is best suited to column-based computations, for example the average employee salary (assuming salary is a column). And with its powerful scalability features, we can add nodes on the fly.
2. Hive is like a traditional database in that it supports SQL-like queries. Internally, the queries are converted into map-reduce jobs. We can use this for row-based computations.
3. The final option is to write our own map-reduce functionality. Using "sqoop", we can migrate data from relational databases to HDFS (Hadoop Distributed File System). Then we can write map-reduce programs that deal directly with the underlying flat files.

These are some of the possible options. Let me know if you need additional details about any of them.
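To make option 3 a little more concrete, here is a rough plain-Java sketch of how a record-comparison job might split into a map step and a reduce step. It is not Hadoop code and does not use the Hadoop API; it only mimics the two phases in memory. Everything specific in it is an assumption for illustration: the class name `RecordCompareSketch`, the comma-separated `id,groupKey,value` line layout, the idea that only records sharing a `groupKey` need comparing, and the comparison rule itself (two records "match" when their values are equal).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of the map and reduce phases; NOT the Hadoop API.
public class RecordCompareSketch {

    // Hypothetical flat-file layout: id,groupKey,value
    static final class Row {
        final long id;
        final String groupKey;
        final String value;

        Row(long id, String groupKey, String value) {
            this.id = id;
            this.groupKey = groupKey;
            this.value = value;
        }
    }

    // Parse one comma-separated line into a Row.
    static Row parse(String line) {
        String[] f = line.split(",", -1);
        return new Row(Long.parseLong(f[0].trim()), f[1].trim(), f[2].trim());
    }

    // "Map" phase: parse each line and group rows by key, so that rows
    // which must be compared with each other end up in the same group.
    static Map<String, List<Row>> mapPhase(List<String> lines) {
        return lines.parallelStream()
                .map(RecordCompareSketch::parse)
                .collect(Collectors.groupingBy(r -> r.groupKey, TreeMap::new, Collectors.toList()));
    }

    // "Reduce" phase for one key: compare every pair in the group and
    // count the pairs whose values are equal (assumed comparison rule).
    static int reduceGroup(List<Row> rows) {
        int matches = 0;
        for (int i = 0; i < rows.size(); i++) {
            for (int j = i + 1; j < rows.size(); j++) {
                if (rows.get(i).value.equals(rows.get(j).value)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    // Run both phases and return the match count per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        mapPhase(lines).forEach((key, rows) -> result.put(key, reduceGroup(rows)));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,a,x", "2,a,x", "3,a,y", "4,b,z");
        System.out.println(run(lines)); // prints {a=1, b=0}
    }
}
```

In a real Hadoop job the grouping would be done by the framework's shuffle between a Mapper and a Reducer, and the input lines would come from files on HDFS rather than an in-memory list. The point of the sketch is only that the expensive pairwise comparison happens per key, so different keys can be processed on different nodes.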
