How to generate cross product of sets in specific order
Given some sets (or lists) of numbers, I would like to iterate through the cross product of these sets in the order determined by the sum of the returned numbers. For example, if the given sets are { 1,2,3 }, { 2,4 }, { 5 }, then I would like to retrieve the cross-products in the order
<3,4,5>, <2,4,5>, <3,2,5> or <1,4,5>, <2,2,5>, <1,2,5>
I can't compute all the cross-products first and then sort them, because there are way too many. Is there any clever way to achieve this with an iterator?
(I'm using Perl for this, in case th开发者_运维百科ere are modules that would help.)
For two sets A and B, we can use a min heap as follows.
- Sort A.
- Sort B.
- Push (0, 0) into a min heap H with priority function (i, j) |-> A[i] + B[j]. Break ties preferring small i and j.
- While H is not empty, pop (i, j), output (A[i], B[j]), insert (i + 1, j) and (i, j + 1) if they exist and don't already belong to H.
For more than two sets, use the naive algorithm and sort to get down to two sets. In the best case (which happens when each set is relatively small), this requires storage for O(√#tuples) tuples instead of Ω(#tuples).
Here's some Python to do this. It should transliterate reasonably straightforwardly to Perl. You'll need a heap library from CPAN and to convert my tuples to strings so that they can be keys in a Perl hash. The set can be stored as a hash as well.
from heapq import heappop, heappush
def largest_to_smallest(lists):
"""
>>> print list(largest_to_smallest([[1, 2, 3], [2, 4], [5]]))
[(3, 4, 5), (2, 4, 5), (3, 2, 5), (1, 4, 5), (2, 2, 5), (1, 2, 5)]
"""
for lst in lists:
lst.sort(reverse=True)
num_lists = len(lists)
index_tuples_in_heap = set()
min_heap = []
def insert(index_tuple):
if index_tuple in index_tuples_in_heap:
return
index_tuples_in_heap.add(index_tuple)
minus_sum = 0 # compute -sum because it's a min heap, not a max heap
for i in xrange(num_lists): # 0, ..., num_lists - 1
if index_tuple[i] >= len(lists[i]):
return
minus_sum -= lists[i][index_tuple[i]]
heappush(min_heap, (minus_sum, index_tuple))
insert((0,) * num_lists)
while min_heap:
minus_sum, index_tuple = heappop(min_heap)
elements = []
for i in xrange(num_lists):
elements.append(lists[i][index_tuple[i]])
yield tuple(elements) # this is where the tuple is returned
for i in xrange(num_lists):
neighbor = []
for j in xrange(num_lists):
if i == j:
neighbor.append(index_tuple[j] + 1)
else:
neighbor.append(index_tuple[j])
insert(tuple(neighbor))
精彩评论