How search engine, say Google's page ranking algorithm work across distributed/multiple machines?
I am new to distributed computing but was wondering how page ranking algorithm works across multiple machines. Like
When do they decide data should be replicated (if needed at all),
If data is not copied, do they ask serves at other places to give them the result?
Or do they send "modules" to different serves (say part of a HUGE-HUGE - linked-graph) to one server, another module to another server and the combine the results they received?
I search something -- how does it fetches pages from my country (you know, search pages from
<insert country>
only)
This is not homework. Just a question I had. I welcome all ideas, even if they are very general or very detailed or do not answer all of my questions.
Right now, I know next to nothing, my hope is to 开发者_Python百科know something after going through the answers.
There're three whales: MapReduce, Google File System, BigTable
Here are some whitepapers of the architecture
- GoogleCluster
- MapReduce, GFS, BigTable
Note: some of these are quite outdated, nowadays they are doing live updates, which wouldn't work with mapreduce.
精彩评论