Find smallest subset sum matching another subset sum
I have a real-world problem (not homework!) that requires finding the sum of a subset of set A that equals the sum of a subset of some other set B.
A very similar question with a helpful answer is here.
Consider this example:
@a = qw(200 2000 2000 2000 4000);
@b = qw(528 565 800 1435 2000 2000 2872);
Using the code provided in the accepted answer to that question, I get the following output:
sum(200 2000 4000) = sum(528 800 2000 2872)
sum(200 4000 4000) = sum(528 800 2000 2000 2872)
sum(200 4000) = sum(528 800 2872)
sum(200 2000 2000 2000 4000) = sum(528 800 2000 2000 2000 2872)
For my purposes, I only want the answer(s) that use the fewest elements of the input sets. In this example, I just want
sum(200 4000) = sum(528 800 2872)
because all of the other answers also have 200
and 4000
in the sum. That is, I'm looking for just the "simplest" possible sums (in the sense that they use the fewest elements). Can someone suggest a reasonably efficient way of doing this? (Brute force is okay as my arrays aren't th开发者_运维技巧at large.)
Also, I should note that the second line of the output, sum(200 4000 4000) ...
, isn't correct because 4000
only appears once in @a
. I'm afraid I don't understand the algorithm well enough to see why this is happening.
Any help with either of these would be much appreciated!
This problem is NP-complete, so unless P=NP you are stuck doing exponential work on the size of the input. Now the neat thing about this type of problem is that there are actually two ways to solve which will put the exponent on different aspects of the problem.
First if your sums don't have too many elements you can just brute force this problem by searching over all combinations. This method is exponential on the number of elements in the sets, and will work reasonably well up to say 20 or so elements per container. After that it is going to get pretty nasty.
A second option is to use dynamic programming. Unlike the previous method, this algorithm is exponential in the number of bits that it takes to write out each of the numbers. What you do is keep track of the set of all possible sums as well as the smallest number of elements required to generate them. This is a simple modification of the solution that you linked to in your answer.
Here is some python code that generates all the possible values they can intersect at:
def gen_sum(a):
r = { 0: [] }
for x in a:
for y, v in r.items():
if not (x+y) in r or len(r[x+y]) > len(v) + 1:
r[x+y] = v + [x]
return r
def gen_sums(a, b):
asum = gen_sum(a)
bsum = gen_sum(b)
return [ (k, v, bsum[k]) for k,v in asum.items() if k in bsum ][1:]
Running it on your sample data, I got:
[(4000, [4000], [2000, 2000]), (6000, [2000, 4000], [565, 1435, 2000, 2000]), (2000, [2000], [2000]), (4200, [200, 4000], [528, 800, 2872]), (10200, [200, 2000, 2000, 2000, 4000], [528, 565, 800, 1435, 2000, 2000, 2872]), (8200, [200, 2000, 2000, 4000], [528, 800, 2000, 2000, 2872]), (6200, [200, 2000, 4000], [528, 800, 2000, 2872])]
EDIT: Fixed a typo, and also just noticed that holy crap tons of people have already answered this really quick.
Here's an updated algorithm which gives all the sums:
my @a = qw(200 2000 2000 2000 4000);
my @b = qw(528 565 800 1435 2000 2000 2872);
my %a = sums( @a );
my %b = sums( @b );
for my $m ( keys %a ) {
print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n" if exists $b{$m};
}
sub sums {
my( @a ) = @_;
return () unless @a;
my %a;
while( @a ) {
$a = shift @a;
for my $m ( keys %a ) {
$a{$m+$a} = [@{$a{$m}},$a];
}
$a{$a} = [$a];
}
return %a;
}
Al you have to do is find the shortest one(s), but others have covered that already :)
HTH,
Paul
You need to modify the recurrence relation, not just the output. Consider {1,2,20,23,42}
and {45}
. The original algorithm will output {1,2,42},{45} instead of {20,23},{45}. This is because 42 is considered last, and when it sums to 45, it will overwrite the bucket value at 45 previously containing {20,23}
Instead of keeping [set,sum] for each value, you need to keep [ minimum set, sum ] and then take the minimums at the end.
My perl is awful, but something like this
$a{$m+$a} = [min-length(@{$a{$m}},$a{$m+$a}[0]),$a];
where min-length returns the smaller set
I'm not great with perl. But,
for $m ( keys %a ) {
print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n" if exists $b{$m};
}
Modify this line to count the number of elements in set $a and $b for each $m. When you've finished looping through all of them, select the one with the least number of elements.
精彩评论