开发者

How to generate a random partition from an iterator in Python

Given the desired number of partitions, the partitions should be nearly equal in size. This question handles the problem for a list. They do not have the random property, but that is easily added. My problem is, that I have an iterator as input, so shuffle does not apply. The reason for that is开发者_StackOverflow社区 that I want to randomly partition the nodes of graph. The graph can be very large, so I am looking for a solution that does not just create an intermediate list.

My first idea was to use compress() with a random number function as selector. But that only works for two partitions.


You could just create k list. When you receive a value, pick a random integer x between 0 and k-1, and put that value into the x-th list.

On average each list will contain N/k elements, but with a standard deviation of √(N * 1/k * (1-1/k)).

def random_partition(k, iterable):
  results = [[] for i in range(k)]
  for value in iterable:
    x = random.randrange(k)
    results[x].append(value)
  return results


You're just dealing to various partitions, right?

def dealer( iterator, size ):
    for item in iterator
        yield random.randrange( size ), item

Won't that get you started by assigning each item to a partition?

Then you can do something like this to make lists. Maybe not a good thing, but it shows how to use the function.

def make_lists( iterator, size ):
    the_lists = []*size
    for partition, item in dealer( iterator, size ):
        the_lists[partition].append(item)
    return the_lists


You can make the length of lists more uniform by adjusting the weights depending on the number of nodes generated so far in each partition. They'll be roughly equal-length if you pick a function so that the weight is 0 when (number of nodes in partition n) > (number of nodes)/(number of partions), i.e.

weight[i] = max(numNodes/numPartitions - nodesSoFar[i],0)

(The max() is to stop negative weights, which might happen if you have 4 nodes and 3 partitions.)

Then pick a random number from 1 to sum(weights) (or 0 to sum(weights)-1) and pick the partition appropriately.

compress() works provided you use a different selector per partition; something like (x == n for x in random_partition_numbers) where random_partition_numbers is a generator. You'll need to copy random_partition_numbers for each partition, of course. This design is inherently slower, since it needs to iterate through the list of nodes for each partition.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜