Algorithm to return all possible combinations of n objects taken from different bins
To make it more specific, I need an algorithm (recursive or not) that, given an integer n and a matrix as input, returns all of the combinations that: 1) have at least 1 object from each line, and 2) have n objects in total.
I feel I could solve this more easily if I just tried all combinations and kept the ones that have n objects and at least 1 from each line, but I believe the algorithm can be a lot more efficient than that. I have also successfully coded an algorithm that returns all combinations of 1 object per line, but couldn't expand it to more. I've been coding in Python, but any language is fine. Extra points for taking into account that Python passes objects by reference. =)
Assume the matrix is square. If anyone wants to know why, this is part of a more complex graph algorithm I'm trying to solve.
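To illustrate what I mean, here is a rough (untested) sketch of the brute-force approach I mentioned above, just to show the expected output:

import itertools

def brute_force(matrix, n):
    # Tag every object with its row index so the "at least one object
    # from each row" requirement can be checked after choosing n objects.
    tagged = [(i, obj) for i, row in enumerate(matrix) for obj in row]
    for combo in itertools.combinations(tagged, n):
        rows_used = set(i for i, _ in combo)
        if len(rows_used) == len(matrix):  # at least one object from every row
            yield tuple(obj for _, obj in combo)

For the matrix [['a', 'b'], ['c', 'd']] and n = 3, this yields ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd') and ('b', 'c', 'd').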
Thanks all!
Assume the matrix m is a list of lists; here is a Racket function for it:
(define (combinations m n)
  (cond
    ;; n objects picked and no rows left: one valid (empty) extension
    ((and (zero? n) (null? m)) '(()))
    ;; no objects left to pick but rows remain: dead end
    ((zero? n) '())
    ;; objects still needed but no rows remain: dead end
    ((null? m) '())
    ;; the current row is exhausted: dead end
    ((null? (car m)) '())
    (else
     (append
      ;; either skip the first object of the first row...
      (combinations (cons (cdar m) (cdr m)) n)
      ;; ...or take it, then keep picking from this row or move on
      (map (lambda (ls) (cons (caar m) ls))
           (append
            (combinations (cons (cdar m) (cdr m)) (sub1 n))
            (combinations (cdr m) (sub1 n))))))))
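For example, for a 2-by-2 matrix and n = 3 it should return all four valid combinations (the order of the results may differ):

> (combinations '((a b) (c d)) 3)
'((b c d) (a b d) (a b c) (a c d))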
Thanks for all the answers, they were close to what I was looking for. I managed to do it in Python (so I didn't check the results posted here); my problem was actually Python passing a reference vs. a copy in function calls. I thought a shallow copy would have been enough, but apparently I needed a deep copy.
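The reason, in hindsight, is that a shallow copy duplicates the dictionary itself, but the lists inside it are still shared, so appending to a list in the copy also mutates the original. A quick illustration:

import copy

g = {'a': [1, 2]}

shallow = copy.copy(g)        # new outer dict, but the inner lists are shared
shallow['a'].append(3)
print g                       # {'a': [1, 2, 3]} -- the original was mutated too

deep = copy.deepcopy(g)       # the inner lists are copied as well
deep['a'].append(4)
print g                       # still {'a': [1, 2, 3]} -- the original is untouched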
This is how I did it:
PS1: Graphs here are dictionaries of lists; n_edges is the number of edges to be picked from the graph.
PS2: The size calculation here is pretty inefficient; I haven't taken the time to fix it yet.
PS3: In order to iterate over a dictionary in a fixed order, I created two lists: a list-of-lists representation of the graph (lol) and an index array that matches it (lolindex). (See the setup sketch after the code.)
PS4: This is adapted to fit the question I posted; the real method I used has more problem-specific code in it. I haven't tested the code exactly as written here.
import copy

def pickEdges(n_edges, newgraph):
    # total number of edges currently picked (PS2: recomputed inefficiently on every call)
    size = sum(len(v) for v in newgraph.itervalues())
    if (size == n_edges):
        print newgraph
        return
    # try to add each edge of the original graph (lol / lolindex, see PS3) that isn't picked yet
    for i in range(len(lol)):
        for j in range(len(lol[i])):
            tmpgraph = copy.deepcopy(newgraph)  # deep copy, so the inner lists are not shared
            if (lol[i][j] not in tmpgraph[lolindex[i]]):
                tmpgraph[lolindex[i]].append(lol[i][j])
                pickEdges(n_edges, tmpgraph)
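For reference, a hypothetical setup (the graph and its contents are made up) showing how lol and lolindex relate to the dictionary and how the search might be started from a graph with no edges picked yet:

source = {'a': ['b', 'c'], 'b': ['c'], 'c': []}   # hypothetical source graph (dict of lists)
lolindex = list(source)                           # fixed node order, e.g. ['a', 'c', 'b']
lol = [source[k] for k in lolindex]               # list-of-lists view matching lolindex
pickEdges(2, dict((k, []) for k in source))       # prints graphs containing exactly 2 picked edges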
Most likely you actually want to modify the task and list all combinations of a simple list? Below is a JavaScript function that will do that for you:
function getWayStr(curr) {
    var nextAbove = -1;
    for (var i = curr + 1; i < waypoints.length; ++i) {
        if (nextAbove == -1) {
            nextAbove = i;
        } else {
            wayStr.push(waypoints[i]);
            wayStr.push(waypoints[curr]);
        }
    }
    if (nextAbove != -1) {
        wayStr.push(waypoints[nextAbove]);
        getWayStr(nextAbove);
        wayStr.push(waypoints[curr]);
    }
}
What a great question! To keep the terminology consistent, I will refer to the lines in your matrix as "input bags" and the items in the input bags as "objects". A bag (or multiset) is a container that allows members to appear more than once but whose members do not have an inherent order (unlike lists).
We are looking for a function with the following signature:
set of valid combinations =
generate_combinations(list of input bags, number of objects in valid combination)
Since it is possible that the set of valid combinations exceeds the memory available to Python, generate_combinations should return a generator rather than an explicit list.
A valid result must satisfy the following requirements:
- At least 1 object from each input bag
- Will have n objects total
I am assuming the following:
- The order of the objects in a result does not matter
- An input bag can contain duplicate objects
- Two input bags can contain duplicate objects (in the degenerate case, two input bags can be identical)
- A valid result pulls objects without replacement
Requirement #1 and Assumption #4 imply number of input bags <= n <= total number of objects in all bags
Data Structures
- Since input bags are allowed to contain duplicate values (per Assumption #2), we cannot use set/frozenset to store them. Python lists are suitable for this task. An alternative container is collections.Counter, which has a constant-time membership test and better space efficiency for lists with many duplicates (see the quick aside after this list).
- Each valid combination is also a bag
- Order does not matter for the list of input bags, so this could be generalized as a bag of input bags. For sanity, I'll leave it as is.
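As a quick aside, this is what the Counter alternative looks like; the rest of this answer uses its own Bag class instead:

>>> bag = collections.Counter([1, 2, 2, 3])
>>> 2 in bag        # constant-time membership test
True
>>> bag[2]          # multiplicity of the element
2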
I will use Python's built-in itertools module to generate the combinations; itertools.combinations was introduced in Python 2.6, so if you are running an older version, use the equivalent recipe from the itertools documentation. For these code examples, I have implicitly converted generators into lists for clarity.
>>> import itertools, collections
>>> input_bags = [Bag([1,2,2,3,5]), Bag([1,4,5,9]), Bag([12])]
>>> output_bag_size = 5
>>> combos = generate_combinations(input_bags, output_bag_size)
>>> combos.next() #an arbitrary example
Bag([1,1,2,4,12])
Realize that the problem as stated above can be reduced to one that is immediately solvable by Python's built-in itertools module, which generates combinations of sequences. The only modification we need to make is to eliminate Requirement #1. To solve the reduced problem, combine the members of all the bags into a single bag.
>>> all_objects = itertools.chain.from_iterable(input_bags)
>>> all_objects
generator that returns [1, 2, 2, 3, 5, 1, 4, 5, 9, 12]
>>> combos = itertools.combinations(all_objects, output_bag_size)
>>> combos
generator that returns [(1,2,2,3,5), (1,2,2,3,1), (1,2,2,3,4), ...]
To reinstate requirement #1, each valid combination (output bag) needs to contain 1 element from each input bag plus additional elements from any of the bags until the output bag size reaches the user-specified value. If the overlap between [1 element from each input bag] and [additional elements from any of the bags] is ignored, the solution is just the cartesian product of the possible combinations of the first and the possible combinations of the second.
To handle the overlap, remove the elements from [1 element from each input bag] from the [additional elements from any of the bags], and for-loop away.
>>> combos_with_one_element_from_each_bag = itertools.product(*input_bags)
>>> for base_combo in combos_with_one_element_from_each_bag:
>>>     all_objects = list(itertools.chain.from_iterable(input_bags))
>>>     for seen in base_combo:
>>>         all_objects.remove(seen)
>>>     all_objects_minus_base_combo = all_objects
>>>     for remaining_combo in itertools.combinations(all_objects_minus_base_combo, output_bag_size - len(base_combo)):
>>>         yield itertools.chain(base_combo, remaining_combo)
The remove operation is supported on lists but isn't supported on generators. To avoid storing all_objects in memory as a list, create a function that skips over the elements in base_combo.
>>> def remove_elements(iterable, elements_to_remove):
>>>     elements_to_remove_copy = Bag(elements_to_remove)  #create a soft copy
>>>     for item in iterable:
>>>         if item not in elements_to_remove_copy:
>>>             yield item
>>>         else:
>>>             elements_to_remove_copy.remove(item)
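For example (using the Bag class defined next):

>>> list(remove_elements([1, 2, 2, 3, 5], [2, 5]))
[1, 2, 3]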
An implementation of the Bag class might look like this:
>>> class Bag(collections.Counter):
>>>     def __iter__(self):
>>>         return self.elements()
>>>     def remove(self, item):
>>>         self[item] -= 1
>>>         if self[item] <= 0:
>>>             del self[item]
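A quick check of the multiset behaviour:

>>> b = Bag([1, 2, 2])
>>> sorted(b)
[1, 2, 2]
>>> b.remove(2)
>>> sorted(b)
[1, 2]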
Complete code
import itertools, collections

class Bag(collections.Counter):
    def __iter__(self):
        return self.elements()

    def remove(self, item):
        self[item] -= 1
        if self[item] <= 0:
            del self[item]

    def __repr__(self):
        return 'Bag(%s)' % sorted(self.elements())

def remove_elements(iterable, elements_to_remove):
    elements_to_remove_copy = Bag(elements_to_remove)  #create a soft copy
    for item in iterable:
        if item not in elements_to_remove_copy:
            yield item
        else:
            elements_to_remove_copy.remove(item)

def generate_combinations(input_bags, output_bag_size):
    combos_with_one_element_from_each_bag = itertools.product(*input_bags)
    for base_combo in combos_with_one_element_from_each_bag:
        all_objects_minus_base_combo = remove_elements(itertools.chain.from_iterable(input_bags), base_combo)
        for remaining_combo in itertools.combinations(all_objects_minus_base_combo, output_bag_size - len(base_combo)):
            yield Bag(itertools.chain(base_combo, remaining_combo))

input_bags = [Bag([1, 2, 2, 3, 5]), Bag([1, 4, 5, 9]), Bag([12])]
output_bag_size = 5
combos = generate_combinations(input_bags, output_bag_size)
list(combos)
Finish this off by adding in error-checking code (such as checking that output_bag_size >= len(input_bags)).
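For instance, a minimal wrapper that performs that check might look like this (the name generate_combinations_checked is just a suggestion):

def generate_combinations_checked(input_bags, output_bag_size):
    # Requirement #1 cannot be satisfied with fewer objects than input bags
    if output_bag_size < len(input_bags):
        raise ValueError('output_bag_size must be >= the number of input bags')
    return generate_combinations(input_bags, output_bag_size)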
The benefits of this approach are:
1. Less code to maintain (namely itertools).
2. No recursion. Python performance degrades with call stack height, so minimizing recursion is beneficial.
3. Minimum memory consumption. This algorithm requires constant-space memory beyond what's consumed by the list of input bags.