How to use KDTree to make top-k query and range query on arbitrary dimensions
I have used KD-tree(libkdtree++) to store开发者_开发技巧 a multi-dimensional data set, and the requirements here is this data set can support top-k/range queries on different dimensions. For example, a KDTree<3, Point> tree: to find the top 100 points whose have highest Point[1](y axis) values.
From the implementation of libkdtree++, what's similar is the "find_within_range" functions, however it is counted based on "Manhattan distance", which equals max(x_dist, max(y_dist, z_dist)) here. How can I just use range query on one dimension?
Looking at the code, it looks like you can't do that in a straightforward way, ridiculously enough. If I were you I'd be tempted to either hack the library or write my own kd-tree. I'd ask on their mailing list to be sure, but it looks like you might have to do something like this:
kdtreetype::_Region_ r(point_with_min_y);
r.set_low_bound(min_x, 0);
r.set_high_bound(max_x, 0);
r.set_low_bound(min_z, 2);
r.set_high_bound(max_z, 2);
r.set_high_bound((min_y + max_y) / 2, 1);
double search_min = min_y, search_max = max_y;
// binary search to get 100 points
int c;
while (c = tree.count_within_range(r) != 100) {
if (c > 100) search_max = (search_min + search_max) / 2;
else search_min = (search_min + search_max) / 2;
r.set_high_bound((search_min + search_max) / 2);
}
tree.visit_within_range(r, process_min_y_point);
This is a horribly inefficient binary search for the Y at which count(points with y <= Y) == 100. I'm not familiar with the library, but that's the best I've got on a cursory inspection.
精彩评论