KD tree, slow tree construction

2023-01-25 06:35 问答作者：

I am trying to build KD Tree (static case). We assume points are sorted on both x and y coordinates.

For even depth of recursion the set is split into two subsets with a vertical line going through median x coordinate.

For odd depth of recursion the set is split into two subsets with a horizontal line going through median y coordinate.

The median can be determined from sorted set according to x / y coordinate. This step I am doing before each splitting of the set. And I think that it causes the slow construction of the tree.

Please could you help me check any and optimize the code?
I can not find the k-th nearest neighbor, could somebody help me with the code?

Thank you very much for your help and patience...

Please see the sample code:

class KDNode
{
private:
Point2D *data;
KDNode *left;
KDNode *right;
    ....
};

void KDTree::createKDTree(Points2DList *pl)
{
//Create list
KDList kd_list;

//Create KD list (all input points)
for (unsigned int i = 0; i < pl->size(); i++)
{
kd_list.push_back((*pl)[i]);
}

//Sort points by x
std::sort(kd_list.begin(), kd_list.end(), sortPoints2DByY());

//Build KD Tree
root = buildKDTree(&kd_list, 1);
}


KDNode * KDTree::buildKDTree(KDList *kd_list, const unsigned int depth)
{
//Build KD tree
const unsigned int n = kd_list->size();

 //No leaf will be built
 if (n == 0)
 {
  return NULL;
 }

 //Only one point: create leaf of KD Tree
 else if (n == 1)
 {
  //Create one leaft
  return new KDNode(new Point2D ((*kd_list)[0]));
 }

 //At least 2 points: create one leaf, split tree into left and right s开发者_StackOverflow社区ubtree
 else
 {
  //New KD node
  KDNode *node = NULL;

  //Get median index
  const unsigned int median_index = n/2;

  //Create new KD Lists
  KDList kd_list1, kd_list2;

  //The depth is even, process by x coordinate
  if (depth%2 == 0)
  {
   //Create new median node
   node = new KDNode(new Point2D( (*kd_list)[median_index]));

   //Split list
   for (unsigned int i = 0; i < n; i++)
   {
    //Geta actual point
    Point2D *p = &(*kd_list)[i];

    //Add point to the first list: x < median.x
    if (p->getX() < (*kd_list)[median_index].getX())
    {
     kd_list1.push_back(*p);
    }

    //Add point to the second list: x > median.x
    else if (p->getX() > (*kd_list)[median_index].getX())
    {
     kd_list2.push_back(*p);
    }
   }

   //Sort points by y for the next recursion step: slow construction of the tree???
   std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByY());
   std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByY());

  }

  //The depth is odd, process by y coordinates
  else
  {

   //Create new median node
   node = new KDNode(new Point2D((*kd_list)[median_index]));

   //Split list
   for (unsigned int i = 0; i < n; i++)
   {
    //Geta actual point
    Point2D *p = &(*kd_list)[i];

    //Add point to the first list: y < median.y
    if (p->getY() < (*kd_list)[median_index].getY())
    {
     kd_list1.push_back(*p);
    }

    //Add point to the second list: y < median.y
    else if (p->getY() >(*kd_list)[median_index].getY())
    {
     kd_list2.push_back(*p);
    }
   }

   //Sort points by x for the next recursion step: slow construction of the tree???
   std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByX());
   std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByX());

  }

  //Build left subtree
  node->setLeft( buildKDTree(&kd_list1, depth +1 ) );

  //Build right subtree
  node->setRight( buildKDTree(&kd_list2, depth + 1 ) );

  //Return new node 
  return node; 
 }
}

The sorting to find the median is probably the worst culprit here, since that is O(nlogn) while the problem is solvable in O(n) time. You should use nth_element instead: http://www.cplusplus.com/reference/algorithm/nth_element/. That'll find the median in linear time on average, after which you can split the vector in linear time.

Memory management in vector is also something that can take a lot of time, especially with large vectors, since every time the vector's size is doubled all the elements have to be moved. You can use the reserve method of vector to reserve exactly enough space for the vectors in the newly created nodes, so they need not increase dynamically as new stuff is added with push_back.

And if you absolutely need the best performance, you should use lower level code, doing away with vector and reserving plain arrays instead. Nth element or 'selection' algorithms are readily available and not too hard to write yourself: http://en.wikipedia.org/wiki/Selection_algorithm

Some hints on optimizing the kd-tree:

Use a linear time median finding algorithm, such as QuickSelect.
Avoid actually using "node" objects. You can store whole tree using the points only, with ZERO additional information. Essentially by just sorting an array of objects. The root node will then be in the middle. A rearrangement that puts the root first, then uses a heap layout will likely be nicer to the CPU memory cache on query time, but more tricky to build.

Not really an answer to your questions, but I would highly recommend the forum at ~~http://ompf.org/forum/~~ They have some great discussions over there for fast kd-tree constructions in various contexts. Perhaps you'll find some inspiration over there.

Edit:
The OMPF forums have since gone down, although a direct replacement is currently available at http://ompf2.com/

Your first culprit is sorting to find the median. This is almost always the bottleneck for K-d tree construction, and using more efficient algorithms here will really pay off.

However, you're also constructing a pair of variable-sized vectors each time you split and transferring elements to them.

Here I recommend the good ol' singly-linked list. The beauty of the linked list is that you can transfer elements from parent to child by simply changing next pointers to point at the child's root pointer instead of the parent's.

That means no heap overhead whatsoever during construction to transfer elements from parent nodes to child nodes, only to aggregate the initial list of elements to insert to the root. That should do wonders as well, but if you want even faster, you can use a fixed allocator to efficiently allocate nodes for the linked list (as well as for the tree) and with better contiguity/cache hits.

Last but not least, if you're involved in intensive computing tasks that call for K-d trees, you need a profiler. Measure your code and you'll see exactly what lies at the culprit, and with exact time distributions.

继续阅读：kdtree nearest-neighbor

KD tree, slow tree construction

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？