开发者

Calculating a minimal tree for file transfer

I'm constructing a file sharing system which needs to transmit a single file to many peers. One single root peer has the entire file and needs to transfer it to all other peers. How can I best construct a tree of file transfers such that tota开发者_如何学Gol waiting time is minimised?

For example:

Calculating a minimal tree for file transfer

In this tree, we need to wait 4 times before the transfer is finished.

Calculating a minimal tree for file transfer

Calculating a minimal tree for file transfer

These two are better, since we only have to wait three times for the transfer to finish.


I believe the optimal algorithm (under the constraint of not sending and receiving simultaneously as Steve pointed out) would be to start by putting log2(n) nodes beneath the root. Then for every node, it should have the number of children its parent has minus the left-to-right ordering among its immediate siblings (ie not looking at its parent's siblings' children).


The answer is easier to arrive at if you imagine the 2nd and subsequent children of a given node are actually a chain of descendents of the first child node, so that e.g.

     O
    /|\
   / | \
  /  |  \
 O   O   O

is actually represented as

     O
    /
   /
  /
 O===O===O

(I'm representing "sibling-of" edges as ===.) In this new representation, each node except the root can have at most 2 children: one being its actual first child, and the other being its first sibling to the right. (The root can have no siblings so it will have at most one child.) For example, the second tree in the OP's post,

         O
        / \
       /   \
      /     \
     O       O
    / \
   /   \
  /     \
 O       O

can be represented in the "sibling-edge" representation as

         O
        /
       /
      /
     O=======O
    /
   /
  /
 O=======O

The trick is that transfers to both of a node's children will occur simultaneously. That lets us arrange the tree visually so that all nodes that are transferred to at the same time appear in the same vertical position. We now get:

0            O
            /
           /
          /
1        O
        / \\
       /   \\
      /     \\
2    O       O
      \\
       \\
        \\
3        O

To figure out the maximum number of nodes that can be transferred to by time t, we need a tree in the above layout whose bottommost (tth) row contains no gaps. Let's call this maximum number of nodes m(t). For t > 1, we can build such a tree by taking the maximum-size tree for t-1 and creating 2 children (= 1 child and 1 sibling in the original tree) for each rightmost node. Tabulating a few values of m(t) gives us:

t      Total        Rightmost
       nodes        nodes

0      1            1
1      1+1*1=2      1 (only multiply by 1 because the root can't have siblings)
2      2+1*2=4      2
3      4+2*2=8      4
4      8+4*2=16     8

Clearly, m(t) = 2^(t-1), which makes sense, because this optimal tree is just a complete binary tree with a little "stem" at the top for the root node.

From this it follows that if the number of nodes n is a power of 2, the optimal tree for n = 2^t nodes needs t+1 time, while if it's not, it will need 1 more unit of time than the next smaller power of 2. So the amount of time needed to send to n nodes is roundup(log2(n)) + 1.

To realise the actual communication tree, you can now convert the "sibling-of" edges back into edges between the left node's parent and the right node. This shows that for power-of-2 node counts, the number of edges leaving the root will be the same as the total time needed (roundup(log2(n)) + 1). The number of edges leaving the 1st child of any node is always one less than its parent, and the number of edges leaving each 2nd and subsequent child is always one less than its left sibling.


In reference to Steve Jessops comment; why not just use BitTorrent or the like? (i.e. a well-tested, well-deployed infrastructure built on a known good framework and which requires almost no work on your part?)

A BT client could easily be modified to accept some form of authentication if that's what's required.

Edit: user's answer is good; log2(n) is probably the optimal number of clients; assuming every client has the same download bandwidth.

In practice, however; the fastest is probably sending all you can to every client directly while your pipe allows, and then gravitate towards log2(n) as the maximum number of clients receiving at any given time. If your pipe is n times as wide as the maximum download speed of any single client, this is the fastest. (Edit: Assuming, of course, that you can send to multiple clients at the same time. But that's a given, right? ;)


Imagine a rabbit that spawns another rabbit. These two spawn another two rabbits, and those 4 spawn another 4 rabbits... Now, the tree of rabbits with edge representing the parent-offspring relationship has somewhat peculiar shape. Root node has degree D=log2(n). Root's offsprings (D of them) have degrees from 0 to D-1. In fact, every node with degree X has X subnodes with degrees from 0 to X.

You might say this is a "fractal tree", perfect in its "imbalanceness".

All this assuming that each node's upload speed is equal to each node's download speed. In real world this is not the case, so you actually need to reexamine (again) your initial problem :-)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜