Hadoop

Getting started with massive data
I'm a mathematician and occasionally do some statistics/machine learning analysis consulting projects on the side. The data I have access to are usually on the smaller side, at most a
What's the best way to count unique visitors with Hadoop?
Hey all, I'm just getting started on Hadoop and curious what the best way in MapReduce would be to count unique visitors if your logfiles looked like this...
Managing dependencies with Hadoop Streaming?
I have a quick Hadoop Streaming question. If I'm using Python streaming and I have Python packages that my mappers/reducers require but that aren't installed by default, do I need to install those on all
Converting Python collaborative filtering code to use MapReduce
Using Python, I'm computing cosine similarity across items. Given event data that represents a purchase (user, item), I have a list of all items 'bought' by my users.
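For reference, the computation described in this question can be sketched in plain Python before any MapReduce conversion. This is only an illustration under assumptions: the function name `item_cosine` and the sample events are hypothetical, and purchases are treated as binary (bought or not), so the cosine between two items reduces to the number of shared buyers divided by the square root of the product of their buyer counts.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_cosine(events):
    """Cosine similarity between items, from (user, item) purchase pairs.

    Hypothetical helper: assumes binary purchases, so each item is a
    0/1 vector over users.
    """
    # Collect the set of users who bought each item.
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)

    sims = {}
    # Compare every unordered pair of items once.
    for a, b in combinations(sorted(users_by_item), 2):
        shared = len(users_by_item[a] & users_by_item[b])
        if shared:
            norm = sqrt(len(users_by_item[a]) * len(users_by_item[b]))
            sims[(a, b)] = shared / norm
    return sims


# Made-up sample data, for illustration only.
events = [
    ("u1", "book"), ("u1", "pen"),
    ("u2", "book"), ("u2", "pen"),
    ("u3", "book"),
]
sims = item_cosine(events)
```

In a MapReduce version, grouping users by item and counting shared buyers per item pair would typically become separate passes; the sketch above keeps both in memory.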
Global variables in Hadoop
My program follows an iterative map/reduce approach, and it needs to stop if certain conditions are met. Is there any way I can set a global variable that can be distributed across all map/reduce tasks
How to use MapReduce in Hadoop?
Why do we use MapReduce, and what are some use cases? The classic example is counting the occurrence of words in a very large collection of documents. You can use the map step to genera
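The word-count example mentioned in this excerpt could look roughly like the following. It is a minimal single-process sketch in plain Python, not Hadoop code: the names `map_words`, `reduce_counts`, and `word_count` are hypothetical, and the grouping step stands in for the shuffle that a framework would perform between the map and reduce phases.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    # Stand-in for the shuffle phase: group emitted values by key.
    grouped = defaultdict(list)
    for document in documents:
        for word, one in map_words(document):
            grouped[word].append(one)
    # Apply the reducer once per distinct word.
    return dict(reduce_counts(w, c) for w, c in grouped.items())


# Made-up sample documents, for illustration only.
counts = word_count(["the quick fox", "the lazy dog", "The end"])
```

Because each document is mapped independently and each word is reduced independently, both steps can be spread across many machines, which is the property that makes the pattern useful on large collections.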
Java or Python distributed compute job (on a student budget)?
I have a large dataset (c. 40G) that I want to use for some NLP (largely embarrassingly parallel) over a couple of computers in the lab, to which I do not have root access, and only 1G of
How does Hadoop perform input splits?
This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of the form <k,v> where
Efficient way to store a graph for calculation in Hadoop
I am currently trying to perform calculations like clustering coefficient on huge graphs with the help of Hadoop. Therefore I need an efficient way to store the graph in a way that I can easily access
Make clients download InstallShield prerequisites from the Internet
My InstallShield project uses custom prerequisites to install the .NET Framework 4.0 Client Profile and the Microsoft Sync Framework 2.0 client package.