Algorithm question: best matching subsets
I am building a program to match trades. Below is a decription of the problem I am currently faced with. I need some help with algorithms.
Given two sets of trades A and B with similar attributes(trade date, account, symbol) I need to find the subset of trades a within A and b within B where sum(a) is closest to sum(b). Here sum() is the sum of a specific attribute (net money) for that subset. The开发者_如何学C reason for needing the closest match is that if we do not get a perfect match (ideal case) we want the next closest. note: sum(a) can be greater or less than sum(b).
I obviously want to do this without using the brute force approach of generating all combinations of A and B and comparing.
I feel this can be done with some dynamic programming method but am unable to come up with anything concrete. Would appreciate any help.
This problem is NP-hard.
The proof is a reduction from subset-sum, which is known to be NP-hard. Given any instance of subset sum in which we are given a set S of elements to sum up and some target number k, we can construct an instance of your problem by letting A be the set S and letting B be the singleton set {k}. If we solve your problem and find that the closest match doesn't exactly total k, then we know that there is no way to sum up a subset of S to get k. Otherwise, if there is a way to sum up elements of S to k, then the match will perfectly equal k and we know that some subset does add up to the target.
Because this problem is NP-hard, you should't expect a solution that runs in polynomial time or that does much better than brute force. I think you're going to need to relax the problem a bit in order to get good results.
Ouch, this sounds like subset-sum on steroids. It would be nice to have an idea of your problem size (number of elements of A and B). The problem will certainly be NP-hard, so you may not be able to use an exact solution such as the one below.
One simple DP solution is to solve subset-sum for A and B separately, for every possible sum value. So if each set has up to 10 elements that can be between 0 and 50, then, for both A and B, use DP to answer the question "Is there a subset that sums to X" for X between 0 and 500. Then just go through the two sets and see which values they have in common, or find the minimum distance from some possible sum in A to some possible sum in B.
(Note: I said 'simple', not 'efficient'! But there is no solution that is substantially faster than this in big-O terms, due to NP-hardness etc.)
So for the brute force algorithm, we build the superset of A and B, and build all combinations of them, sum them, build the absolute of the sum and find the minimum?
sa = superset (A) // () (a) (e) (i) (a, e), (a, i) (e, i) (a, e, i)
sb = superset (B)
sas = supersetAsums // 0, a, e, i, a+e, a+i, e+i, a+e+i
sbs = supersetAsums
ssas = sorted (sas)
ssbs = sorted (sbs)
Now you can iterate through both list, if ssas(i) < ssbs(j) you increment j, else increment i, and look whether the abs(diff) is smaller than current min(abs(diff)).
The problem here is the building of subsets, which get really large fast for most known samples of N. :)
精彩评论