Optimization problem - vector mapping
A
and B
are sets of N
dimensional vectors (N=10
), |B|>=|A|
(|A|=10^2
, |B|=10^5
). Similarity measure sim(a,b) is dot product (required). The task is following: for each vector a
in A
find vector b
in B
, such that sum of similarities ss
of all pairs i开发者_如何学Pythons maximal.
My first attempt was greedy algorithm:
- find the pair with the highest similarity and remove that pair from A,B
- repeat (1) until A is empty
But such greedy algorithm is suboptimal in this case:
a_1=[1, 0] a_2=[.5, .4] b_1=[1, 1] b_2=[.9, 0] sim(a_1,b_1)=1 sim(a_1,b_2)=.9 sim(a_2,b_1)=.9 sim(a_2, b_2)=.45
Algorithm returns [a_1,b_1]
and [a_2, b_2]
, ss=1.45
, but optimal solution yields ss=1.8
.
Is there efficient algo to solve this problem? Thanks
This is essentially a matching problem in weighted bipartite graph. Just assume that weight function f
is a dot product (|ab|
).
I don't think the special structure of your weight function will simplify problem a lot, so you're pretty much down to finding a maximum matching.
You can find some basic algorithms for this problem in this wikipedia article. Although at first glance they don't seem viable for your data (V = 10^5
, E = 10^7
), I would still research them: some of them might allow you to take advantage of your 'lame' set of vertixes, with one part orders of magnitude smaller than the other.
This article also seems relevant, although doesn't list any algorithms.
Not exactly a solution, but hope it helps.
I second Nikita here, it is an assignment (or matching) problem. I'm not sure this is computationally feasible for your problem, but you could use the Hungarian algorithm, also known as Munkres' assignment algorithm, where the cost of assignment (i,j)
is the negative of the dot product of ai
and bj
. Unless you happen to know how the elements of A and B are formed, I think this is the most efficient known algorithm for your problem.
精彩评论