How to generate a random partition from an iterator in Python
Given the desired number of partitions, the partitions should be nearly equal in size. This question handles the problem for a list. They do not have the random property, but that is easily added. My problem is, that I have an iterator as input, so shuffle
does not apply. The reason for that is开发者_StackOverflow社区 that I want to randomly partition the nodes of graph. The graph can be very large, so I am looking for a solution that does not just create an intermediate list.
My first idea was to use compress()
with a random number function as selector. But that only works for two partitions.
You could just create k list. When you receive a value, pick a random integer x between 0 and k-1, and put that value into the x-th list.
On average each list will contain N/k elements, but with a standard deviation of √(N * 1/k * (1-1/k)).
def random_partition(k, iterable):
results = [[] for i in range(k)]
for value in iterable:
x = random.randrange(k)
results[x].append(value)
return results
You're just dealing to various partitions, right?
def dealer( iterator, size ):
for item in iterator
yield random.randrange( size ), item
Won't that get you started by assigning each item to a partition?
Then you can do something like this to make lists. Maybe not a good thing, but it shows how to use the function.
def make_lists( iterator, size ):
the_lists = []*size
for partition, item in dealer( iterator, size ):
the_lists[partition].append(item)
return the_lists
You can make the length of lists more uniform by adjusting the weights depending on the number of nodes generated so far in each partition. They'll be roughly equal-length if you pick a function so that the weight is 0 when (number of nodes in partition n) > (number of nodes)/(number of partions), i.e.
weight[i] = max(numNodes/numPartitions - nodesSoFar[i],0)
(The max() is to stop negative weights, which might happen if you have 4 nodes and 3 partitions.)
Then pick a random number from 1 to sum(weights) (or 0 to sum(weights)-1) and pick the partition appropriately.
compress()
works provided you use a different selector per partition; something like (x == n for x in random_partition_numbers)
where random_partition_numbers is a generator. You'll need to copy random_partition_numbers for each partition, of course. This design is inherently slower, since it needs to iterate through the list of nodes for each partition.
精彩评论