Algorithm question: best matching subsets

2023-02-18 20:57 问答作者：

I am building a program to match trades. Below is a decription of the problem I am currently faced with. I need some help with algorithms.

Given two sets of trades A and B with similar attributes(trade date, account, symbol) I need to find the subset of trades a within A and b within B where sum(a) is closest to sum(b). Here sum() is the sum of a specific attribute (net money) for that subset. The开发者_如何学C reason for needing the closest match is that if we do not get a perfect match (ideal case) we want the next closest. note: sum(a) can be greater or less than sum(b).

I obviously want to do this without using the brute force approach of generating all combinations of A and B and comparing.

I feel this can be done with some dynamic programming method but am unable to come up with anything concrete. Would appreciate any help.

This problem is NP-hard.

The proof is a reduction from subset-sum, which is known to be NP-hard. Given any instance of subset sum in which we are given a set S of elements to sum up and some target number k, we can construct an instance of your problem by letting A be the set S and letting B be the singleton set {k}. If we solve your problem and find that the closest match doesn't exactly total k, then we know that there is no way to sum up a subset of S to get k. Otherwise, if there is a way to sum up elements of S to k, then the match will perfectly equal k and we know that some subset does add up to the target.

Because this problem is NP-hard, you should't expect a solution that runs in polynomial time or that does much better than brute force. I think you're going to need to relax the problem a bit in order to get good results.

Ouch, this sounds like subset-sum on steroids. It would be nice to have an idea of your problem size (number of elements of A and B). The problem will certainly be NP-hard, so you may not be able to use an exact solution such as the one below.

One simple DP solution is to solve subset-sum for A and B separately, for every possible sum value. So if each set has up to 10 elements that can be between 0 and 50, then, for both A and B, use DP to answer the question "Is there a subset that sums to X" for X between 0 and 500. Then just go through the two sets and see which values they have in common, or find the minimum distance from some possible sum in A to some possible sum in B.

(Note: I said 'simple', not 'efficient'! But there is no solution that is substantially faster than this in big-O terms, due to NP-hardness etc.)

So for the brute force algorithm, we build the superset of A and B, and build all combinations of them, sum them, build the absolute of the sum and find the minimum?

sa = superset (A) // () (a) (e) (i) (a, e), (a, i) (e, i) (a, e, i)  
sb = superset (B)

sas = supersetAsums // 0, a, e, i, a+e, a+i, e+i, a+e+i
sbs = supersetAsums

ssas = sorted (sas) 
ssbs = sorted (sbs)

Now you can iterate through both list, if ssas(i) < ssbs(j) you increment j, else increment i, and look whether the abs(diff) is smaller than current min(abs(diff)).

The problem here is the building of subsets, which get really large fast for most known samples of N. :)

继续阅读：algorithm

Algorithm question: best matching subsets

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？