How to perform a multidimensional search for the N nearest neighbors?
I am designing automated trading software for the foreign exchange market. In a MySQL database I have years of market data at five-minute intervals. Each row has five different metrics alongside the price and time.
[Time|Price|M1|M2|M3|M4|M5]
x ~400,0000
Time is the primary key, and M1 through M5 are different metrics (such as standard deviation or slope of a moving average).
Given an input of M1, M2, M3, M4, and M5, how can I efficiently locate the nearest 5,000 neighbors? Note that each metric is floating point and that the metrics have different distributions and ranges.
I don't know how you want to define the nearest neighbor. It seems you could take the absolute value of the difference for each metric and sum them up. (Without the absolute value, you could have two metrics that are way off in opposite directions but cancel each other out.)
So, the nearest neighbor would be defined as the row with the lowest value of this expression:
ABS(M1 - @M1) + ABS(M2 - @M2) + ABS(M3 - @M3) + ABS(M4 - @M4) + ABS(M5 - @M5)
If this works, then the query would be:
SELECT *
FROM YourTable
ORDER BY ABS(M1 - @M1) + ABS(M2 - @M2) + ABS(M3 - @M3) + ABS(M4 - @M4) + ABS(M5 - @M5)
LIMIT 5000
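To see what that ordering computes, here is a minimal in-memory sketch in Python of the same sum-of-absolute-differences ranking. The row layout (time, price, then five metrics), the sample values, and the helper names `manhattan` and `nearest` are made up for illustration; they are not part of the question's schema or of any library.

```python
import heapq

def manhattan(metrics, target):
    """Sum of absolute differences between two metric tuples."""
    return sum(abs(m - t) for m, t in zip(metrics, target))

def nearest(rows, target, k):
    """Return the k rows closest to target, nearest first.

    Each row is assumed to be (time, price, m1, m2, m3, m4, m5),
    so row[2:] holds the five metrics.
    """
    return heapq.nsmallest(k, rows, key=lambda row: manhattan(row[2:], target))

# Hypothetical sample data, not real market values.
rows = [
    ("t1", 1.10, 0.0, 0.0, 0.0, 0.0, 0.0),
    ("t2", 1.11, 1.0, 1.0, 1.0, 1.0, 1.0),
    ("t3", 1.12, 0.1, -0.1, 0.2, 0.0, 0.0),
    ("t4", 1.13, 5.0, 5.0, 5.0, 5.0, 5.0),
]
target = (0.0, 0.0, 0.0, 0.0, 0.0)
closest = nearest(rows, target, 2)
```

Like the query, this looks at every row once. `heapq.nsmallest` only keeps the best k candidates around, so it avoids sorting the whole table.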
If you wanted, you could weight each metric differently as well:
SELECT *
FROM YourTable
ORDER BY 2 * ABS(M1 - @M1) + 5 * ABS(M2 - @M2) + ABS(M3 - @M3) + 3 * ABS(M4 - @M4) + ABS(M5 - @M5)
LIMIT 5000
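Since the metrics have different ranges, one possible way to pick the weights (an assumption on my part, not something the question asks for) is to use the inverse of each metric's standard deviation, so that a metric with a large range does not dominate the sum. A rough Python sketch, with hypothetical helper names and made-up sample values:

```python
import statistics

def inverse_std_weights(metric_rows):
    """Return one weight per metric column: 1 / population standard deviation.

    metric_rows is a list of equal-length tuples of metric values.
    A constant column (standard deviation 0) gets weight 0.
    """
    weights = []
    for column in zip(*metric_rows):
        sd = statistics.pstdev(column)
        weights.append(1.0 / sd if sd > 0 else 0.0)
    return weights

def weighted_distance(metrics, target, weights):
    """Weighted sum of absolute differences."""
    return sum(w * abs(m - t) for w, m, t in zip(weights, metrics, target))

# Two hypothetical metrics whose ranges differ by a factor of 100.
metric_rows = [
    (1.0, 100.0),
    (2.0, 200.0),
    (3.0, 300.0),
]
weights = inverse_std_weights(metric_rows)
d = weighted_distance((1.0, 100.0), (2.0, 200.0), weights)
```

With these weights, both metrics contribute equally to `d` even though the second one is 100 times larger in raw units. The resulting numbers could then be used as the constant multipliers in the ORDER BY above.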