开发者

How does DHT work?

I grabbed the basic idea about DHT from wiki:

Store Data:

In a DHT-network, every node is responsible for a specific range of key-space. To store a file in the DHT, first, hash the file's name to get the file's key; second, send a message put(key, file-content) to any node of the DHT, the message will be relayed to the node which is responsible for key and that node will store the pair (key, file-content).

Get Data:

When getting a file from DHT, first, hash the file's name to get the key; second send a message get(key) to any node, relay the message until...

Questions:

  1. To store a file, we can hash the file's name to get its key, but wiki says:

In the real world the key k could be a hash of a file's content rather than a hash of a file's name to provide content-addressable storage, so that renaming of the file does not prevent users from finding it.

Hash file's content? How am I supposed to know the file's content? If I've already know the file's content, then WHY would I search it in the DHT?

  1. According to the wiki, every participating node will spare some space to store files. So does it mean that, if I participate in a DHT, I have to spar开发者_C百科e 10G disk space to store those files whose key falls into the specific key-space I'm responsible for?

  2. If indeed I should spare some disk space to store those files, then how should I store those (key, file-content) on the disk? I mean, should the file be arranged into a B-tree or something on my disk?

  3. When a query happens, how does my computer respond? I assume, first, check the queried key, if in my key-space, then find the corresponding file on my disk. right?


A DHT is just an algorithm. At its base it provides distributed key-value PUT and GET operations. Similar to a normal Map or associative array found in many programming languages.

Due to the real-world limitations such as untrustworthy nodes, failure rates and so on actual DHT implementations don't provide an arbitrary-length PUT(<uint8[]>, <uint8[]>) operation.

Example:

The kademlia implementation for bittorrent for example provides the following interfaces:

  • PUT(uint8[20], uint16)
  • GET(uint8[20]) -> List<Pair<IP, uint16>> where the list only represents a randomly sampled subset of the actual data

As you can see it actually is a specialized asymmetric interface when compared to more generic associative arrays. The IP address is always derived from the PUT sender's source address, i.e. cannot be explicitly set. And the GET returns a list instead of a single value, so it implements a MultiMap or Map<List>, if you want to see it like that.

In bittorrent's case a hash is used as content descriptor, where peers which have the content announce themselves on the DHT. If someone wants the file(s) they look up IP/Port pairs on the DHT, then contact the peers via a separate protocol and then download the data.

But other uses for a DHT are also possible, i.e. they could store signed, structured data, tweet-like text snippets or whatever. It always depends on your applications' needs.

It's just a basic building block.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜