Ad Hoc Reports Hadoop
I want to allow people to put in simple text search terms, run a pig job (if that's best? it's what I know best) and output the res开发者_开发技巧ults (the tsv file results?) so I can show them in a web interface.
Is there anything that approaches this problem?
Anything known to link a few disjointed pieces of the flow I am going for, together?Thanks
Why don't you index the docs into Lucene or Solr? Then you can do text search in real-time. Hadoop is designed for batch oriented processes, which doesn't seem like what you want in this case.
Well, it depends on your project's requirements. Does it need low-latency, and how complex is the ad hoc search. Well I think hbase+pig might be a comprised solution. hbase can be used for search real-time search purpose (although its search function is not so powerful than RDBMS) and pig for batch_processing of large amount for data.
精彩评论