How to find a pair (or trio or more) of objects that TOGETHER are most similar to a given criteria?

2022-12-07 18:31 问答作者：

The objects would consist of various data but we'd only be using the numeric data for the algorithm, which should simplify things a little bit.

I'll explain my problem with an example. Let's say we have a list of 200+ basketball players and their average game stats. Let's say I want to find a player with stats most similar to another player. That's doable in a reasonable amount of time.

What I'm struggling to figure out is how I can find two players that when combined have stats more similar to a certain single basketball player than all other pairs of basketball players possible. For example:

I'm looking for a pair of players that when combined have average stats of:

points: 21.1
rebo开发者_C百科unds: 11.1
assists: 17.3
shots made: 44%
etc.

The algorithm should return Player A and Player B that combined are closer to those average stats than any other two players.

Is this doable in a timely manner, especially considering there would be between 9 to 11 different numeric values to consider for each basketball player? And the list of basketball players would be at least 200. And on top of all that, is it possible to scale this problem to not only finding pairs of players, but trios or more?

I'm working on a personal project and would love some insight on how to tackle this issue. Thank you!

Here is a proposed algorithm.

Call M the ideal average that you want to get close to.

For every player P:

Find player Q = Q(P) such that (P+Q)/2 is closest to M;

Note d(P) the distance between (P+Q)/2 and M

Return pair (P, Q(P)) that minimises d(P)

This algorithm will be efficient, provided that given a player P, the operation "find player Q such that (P+Q)/2 is closest to M" is efficient.

Note that the two following operations are equivalent:

Find player Q such that (P+Q)/2 is closest to M;
Find player Q which is closest to M + (M - P).

Geometrically speaking, it is easy to understand this equivalence. The point R = M+(M-P) is the symmetric of P in the central symmetry around point M; in other words, R is the point such that M is exactly the middle of segment PR.

Now, given point R = M + (M - P), the operation "find player Q which is closest to R" can be done efficiently if the points are stored in a data structure specially designed to make this operation efficient. One such data structure is the k-d tree.

Here is a piece of code in python using scikit-learn's sklearn.neighbors.KDTree:

from sklearn.neighbors import KDTree
from numpy import argmin

# X should be numpy array with
#   one row per player
#   one column per feature
def find_bestavg_pair(X, m):
    tree = KDTree(X, leaf_size=5)    # KDTree makes it easy to find nearest neighbour of a point
    # REMOVE leaf_size=5 IF NUMBER OF PLAYERS IS LARGE
    distances, best_indices = tree.query(m + (m-X), k=1)
    i = argmin(distances)
    return i, best_indices[i].item()

Testing on random data:

from numpy.random import random

n_players = 100
n_features = 4

players = random((n_players, n_features))
m = random(n_features)

print('Players:')
print(players)
print('Ideal average:')
print('    ', m)

i_p, i_q = find_bestavg_pair(players, m)
p, q = players[i_p], players[i_q]

print()
print('Found best pair:')
print('    ', p)
print('    ', q)
print('Average:')
print('    ', (p+q)/2)
print('Distance from average to ideal average:')
print('    ', norm(m - (p+q)/2))

Output:

Players:
[[0.92374607 0.17773757 0.86935844 0.23778054]
 [0.66040649 0.24189426 0.28202681 0.29389967]
 [0.5913366  0.67835115 0.68202081 0.34955092]
 ...
 [0.89292432 0.25801491 0.4732747  0.39460182]
 [0.97629877 0.32180046 0.32901506 0.64657499]]
Ideal average:
     [0.8075199  0.41172765 0.94341118 0.87773564]

Found best pair:
     [0.9056168  0.72582579 0.97190164 0.77675535]
     [0.56901592 0.06824099 0.97213458 0.81913477]
Average:
     [0.73731636 0.39703339 0.97201811 0.79794506]
Distance from average to ideal average:
     0.11103762079607322

继续阅读：algorithm search

How to find a pair (or trio or more) of objects that TOGETHER are most similar to a given criteria?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？