
How does a search engine, say Google's PageRank algorithm, work across distributed/multiple machines?

I am new to distributed computing, but I was wondering how a page-ranking algorithm works across multiple machines. For example:

  1. When do they decide that data should be replicated (if it needs to be replicated at all)?

  2. If the data is not copied, do they ask servers at other locations to send back the result?

  3. Or do they send "modules" to different servers -- say, one part of a HUGE linked graph to one server and another part to another server -- and then combine the results they receive? (A sketch of this idea follows the list.)

  4. When I search for something, how does it fetch pages from my country only (you know, "search pages from <insert country> only")?
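
To make item 3 concrete, here is a minimal sketch (in Python) of one PageRank iteration over a link graph split across several machines. Everything in it is invented for illustration -- the toy graph, the hash-based shard() assignment, the two in-process "servers" -- and it is not Google's actual implementation: each partition independently computes the rank contributions flowing out of its own pages, and a coordinator combines the partial sums.

    # Toy model of one PageRank iteration over a partitioned link graph.
    # Each "server" holds one shard of the graph; a coordinator merges
    # the per-shard rank contributions. All names and numbers are made
    # up for illustration -- a real system shards by URL hash and sends
    # contributions over the network instead of merging local dicts.
    from collections import defaultdict

    DAMPING = 0.85

    # Tiny link graph: page -> pages it links to.
    GRAPH = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    def shard(graph, num_servers):
        """Assign each page to a 'server' by hashing its id."""
        shards = [{} for _ in range(num_servers)]
        for page, links in graph.items():
            shards[hash(page) % num_servers][page] = links
        return shards

    def local_contributions(shard_graph, ranks):
        """Work one server does on its own: spread each local page's
        rank across that page's outgoing links."""
        contrib = defaultdict(float)
        for page, links in shard_graph.items():
            if links:
                share = ranks[page] / len(links)
                for target in links:
                    contrib[target] += share
        return contrib

    def pagerank_step(shards, ranks):
        """Coordinator: gather the partial contributions from every
        server (conceptually in parallel) and combine them."""
        combined = defaultdict(float)
        for shard_graph in shards:
            for page, c in local_contributions(shard_graph, ranks).items():
                combined[page] += c
        n = len(ranks)
        return {page: (1 - DAMPING) / n + DAMPING * combined[page]
                for page in ranks}

    if __name__ == "__main__":
        shards = shard(GRAPH, num_servers=2)
        ranks = {page: 1 / len(GRAPH) for page in GRAPH}
        for _ in range(20):
            ranks = pagerank_step(shards, ranks)
        print(ranks)

The "combine partial results" step is exactly the shape that MapReduce, mentioned in the answer below, generalizes.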

This is not homework. Just a question I had. I welcome all ideas, even if they are very general or very detailed or do not answer all of my questions.

Right now I know next to nothing; my hope is to know something after going through the answers.


There are three "whales" everything rests on: MapReduce, the Google File System (GFS), and BigTable.
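
To connect PageRank to MapReduce specifically, here is a hedged sketch of one iteration in map/reduce form, following the general shape described in the MapReduce paper: the map phase emits each page's rank share keyed by the target page (plus the adjacency list, so the graph structure survives into the next round), and the reduce phase sums the shares per page. The tiny single-process driver and all names below are invented stand-ins for the real distributed runtime.

    # One PageRank iteration expressed in MapReduce terms. The run()
    # driver is an in-memory stand-in for the real framework, which
    # shuffles the intermediate (key, value) pairs between machines.
    from itertools import groupby

    DAMPING = 0.85

    def map_fn(page, record):
        """record = (rank, out_links). Emit a share of this page's
        rank to every page it links to, plus the link list itself so
        the reducer can rebuild the graph for the next iteration."""
        rank, links = record
        yield page, ("links", links)
        for target in links:
            yield target, ("rank", rank / len(links))

    def reduce_fn(page, values, num_pages):
        """Sum the incoming rank shares and apply damping."""
        links, total = [], 0.0
        for kind, value in values:
            if kind == "links":
                links = value
            else:
                total += value
        new_rank = (1 - DAMPING) / num_pages + DAMPING * total
        return page, (new_rank, links)

    def run(records):
        """Toy driver: map, shuffle (sort + group by key), reduce."""
        intermediate = sorted(
            (kv for page, rec in records.items() for kv in map_fn(page, rec)),
            key=lambda kv: kv[0],
        )
        out = {}
        for key, group in groupby(intermediate, key=lambda kv: kv[0]):
            page, rec = reduce_fn(key, (v for _, v in group), len(records))
            out[page] = rec
        return out

    if __name__ == "__main__":
        graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
        records = {p: (1 / len(graph), links) for p, links in graph.items()}
        for _ in range(20):
            records = run(records)
        print({p: round(r, 4) for p, (r, _) in records.items()})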


Here are some whitepapers on the architecture:

  • GoogleCluster
  • MapReduce, GFS, BigTable

Note: some of these are quite outdated; nowadays they are doing live index updates, which wouldn't work with MapReduce.
