How to efficiently split a 2D space into cells such that each cell contains at most K points?

2023-02-21 12:13 Q&A author:

I have a dataset of 2D points (~500k of them) on which I'd like to perform some kind of quadrat count analysis. The basic idea of a quadrat count is to split the 2D space into a regular grid (each cell of size SxS) and count the number of points in each cell.

For some reason, I'd like to do a slight variation of that: instead of using a regular grid, I want to build the grid such that each cell contains at most K points.

What I did is the following: I start with the whole space and divide it into 4 cells (by "cutting" each dimension in half). Then I count the number of points in each cell. Those that contain more than K points I divide again, and so on, until every cell contains at most K points.
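A minimal Python/NumPy sketch of the subdivision just described, written as an iterative loop with an explicit stack. It is a hypothetical reconstruction, not the asker's code: the function name split_naive and the max_depth guard (which stops endless splitting when more than K points share the same coordinates) are assumptions. Each cell is counted with a boolean mask over the full array, which is exactly the cost the question complains about.

```python
import numpy as np

def split_naive(points, k, max_depth=20):
    """Quadtree-style subdivision: split any cell holding more than k points
    into 4 equal quadrants. Every count re-scans the FULL array with a
    boolean mask, which is what makes this slow on ~500k points.
    max_depth is an assumed guard against endless splitting of duplicates."""
    x, y = points[:, 0], points[:, 1]
    # Half-open cells [x1, x2) x [y1, y2); nudge the upper bounds so points
    # lying exactly on the maximum edge are still counted.
    root = (x.min(), np.nextafter(x.max(), np.inf),
            y.min(), np.nextafter(y.max(), np.inf), 0)
    stack, leaves = [root], []
    while stack:
        x1, x2, y1, y2, depth = stack.pop()
        mask = (x >= x1) & (x < x2) & (y >= y1) & (y < y2)  # O(n) per cell
        n = int(mask.sum())
        if n <= k or depth >= max_depth:
            leaves.append(((x1, y1, x2, y2), n))
            continue
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        stack += [(x1, mx, y1, my, depth + 1), (mx, x2, y1, my, depth + 1),
                  (x1, mx, my, y2, depth + 1), (mx, x2, my, y2, depth + 1)]
    return leaves
```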

I tried both recursive and iterative implementations of this simple algorithm, but neither performed well on the whole dataset. The main bottleneck is obviously the counting, so I was wondering what kind of data structure would allow me to do this efficiently?

(For now, I'm just using boolean ("conditional") indexing with NumPy: points = points[(points[:, 0] > x1) & (points[:, 0] <= x2) & (points[:, 1] > y1) & (points[:, 1] <= y2)])

Also, do you have any other ideas for how I could build my grid?

EDIT: In other words, what kind of data structure could I use to quickly count (and retrieve) the points that fall within a given rectangle ((x1, y1), (x2, y2))?

This isn't complete, but it might point you in the right direction.

Instead of starting big and going small, start small and go big.

Divide your space into, say, 100x100 cells and count the points in each cell. This is O(n): each point is visited exactly once to find the cell it belongs to.
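A minimal NumPy sketch of this binning step, assuming points is an (n, 2) array; the name grid_counts is illustrative. Each point's cell index is computed arithmetically and a single np.bincount call tallies all cells, so the work is one pass over the points plus one pass over the cells. Returning the per-point indices ix/iy also lets you retrieve a cell's points later without recounting.

```python
import numpy as np

def grid_counts(points, nx=100, ny=100):
    """Bin points into an nx-by-ny regular grid with a single O(n) pass.

    Returns (counts, ix, iy): counts[i, j] is the number of points in cell
    (i, j), and ix/iy give each point's cell so later steps never rescan."""
    x, y = points[:, 0], points[:, 1]
    # Cell width per axis; guard against a degenerate (zero-width) extent.
    wx = (x.max() - x.min()) / nx or 1.0
    wy = (y.max() - y.min()) / ny or 1.0
    # Integer cell index per point; clip so the max edge falls in the last cell.
    ix = np.minimum(((x - x.min()) / wx).astype(np.int64), nx - 1)
    iy = np.minimum(((y - y.min()) / wy).astype(np.int64), ny - 1)
    # One bincount over flattened cell ids gives every cell's count at once.
    counts = np.bincount(ix * ny + iy, minlength=nx * ny).reshape(nx, ny)
    return counts, ix, iy
```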

From then on you don't need to count points again. Create CellGroups that record which cells they contain (and their total point count), and then use an algorithm to combine cells into CellGroups.

You might consider an approach that repeatedly takes two small, adjacent cell groups, merges them, and recalculates:

while(true) {
    take the smallest cell group
    compare it to each other cell group, starting with the second smallest
    go up the list until you find a cell group adjacent to it
        whose combined point count with it is at most K
    if you find a match
        merge them
        update the cell group size rankings
        repeat the process (continue the while(true) loop)
    otherwise
        break out; you're done merging cells
}
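One way to turn the merge loop above into runnable Python, assuming counts is a 2D NumPy array of per-cell point counts from a regular grid (for example the 100x100 binning described earlier). The names merge_cells and labels are hypothetical. Two deliberate deviations from the pseudocode, both assumptions of this sketch: a heap replaces re-sorting the size ranking after every merge, and when the smallest group has no valid partner it is set aside as final instead of ending the whole loop (safe, because neighbouring groups only ever grow). Groups come out connected but generally not rectangular, and any single cell that already holds more than K points stays over the limit, so the starting grid should be fine enough.

```python
import heapq
import numpy as np

def merge_cells(counts, k):
    """Greedily merge adjacent grid cells into groups of at most k points.

    counts: 2D array of per-cell point counts.
    Returns (labels, group_counts): labels[i, j] is the group id of cell
    (i, j); group_counts maps group id -> total points in that group."""
    nx, ny = counts.shape
    labels = np.arange(nx * ny).reshape(nx, ny)        # every cell starts alone
    group_counts = {g: int(c) for g, c in enumerate(counts.ravel())}
    members = {g: [divmod(g, ny)] for g in group_counts}
    heap = [(c, g) for g, c in group_counts.items()]
    heapq.heapify(heap)                                  # smallest group first

    def neighbours(g):
        """Ids of groups sharing an edge with group g."""
        found = set()
        for i, j in members[g]:
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= a < nx and 0 <= b < ny and labels[a, b] != g:
                    found.add(int(labels[a, b]))
        return found

    while heap:
        c, g = heapq.heappop(heap)
        if group_counts.get(g) != c:
            continue                                    # stale heap entry
        # Smallest adjacent group that keeps the merged total <= k.
        options = [h for h in neighbours(g) if c + group_counts[h] <= k]
        if not options:
            continue    # neighbours only grow, so g can never merge again
        h = min(options, key=group_counts.__getitem__)
        # Absorb the group with fewer cells into the one with more cells.
        keep, gone = (g, h) if len(members[g]) >= len(members[h]) else (h, g)
        for i, j in members[gone]:
            labels[i, j] = keep
        members[keep].extend(members.pop(gone))
        group_counts[keep] += group_counts.pop(gone)
        heapq.heappush(heap, (group_counts[keep], keep))
    return labels, group_counts
```

Because the merging is greedy, the result depends on tie-breaking and is not guaranteed to use the fewest possible groups.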

I'm not familiar enough with Python, but if you run through the entire array for each quadrant, this can be improved:

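After each split, group the points according to the quadrant they fall into. When splitting a quadrant further, analyse only its corresponding subarray. This may speed up counting considerably, because each level of subdivision then touches every point once instead of once per quadrant.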
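A minimal recursive sketch of that partitioning idea in NumPy. The name split_partitioned and the max_depth guard are assumptions. Each call masks only the points of its own cell and hands each quadrant its own subarray, so the leaves also give you the points in each cell (the retrieval part of the question) at no extra cost.

```python
import numpy as np

def split_partitioned(points, k, max_depth=20):
    """Split into quadrants until every cell holds at most k points.

    Each call receives only the points of its own cell, so one level of
    the recursion touches every point once instead of once per quadrant.
    Returns a list of ((x1, y1, x2, y2), points_in_cell) leaves."""
    leaves = []

    def recurse(pts, x1, y1, x2, y2, depth):
        if len(pts) <= k or depth >= max_depth:  # max_depth: assumed guard
            leaves.append(((x1, y1, x2, y2), pts))
            return
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        right = pts[:, 0] >= mx
        top = pts[:, 1] >= my
        # Partition this cell's points (not the whole dataset) by quadrant.
        for sel, box in ((~right & ~top, (x1, y1, mx, my)),
                         (right & ~top, (mx, y1, x2, my)),
                         (~right & top, (x1, my, mx, y2)),
                         (right & top, (mx, my, x2, y2))):
            recurse(pts[sel], *box, depth + 1)

    recurse(points, points[:, 0].min(), points[:, 1].min(),
            points[:, 0].max(), points[:, 1].max(), 0)
    return leaves
```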

Also, since you are OK with an irregular grid, you may consider choosing separation lines that always divide the points into equal groups (horizontal and vertical splits should be done separately for this).
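A sketch of equal-count splitting in the style of a k-d tree, which is one reading of the suggestion above: cut at the median, alternating between x and y, until every cell holds at most K points. The name split_median is illustrative, and np.argpartition is used so each cut is linear rather than a full sort. With duplicate coordinates, tied points can land on both sides of a cut.

```python
import numpy as np

def split_median(points, k, axis=0):
    """k-d-tree-style split: cut at the median, alternating x and y,
    until every cell holds at most k points (k must be >= 1).
    Returns a list of arrays, one per cell."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(points) <= k:
        return [points]
    half = len(points) // 2
    # argpartition puts the indices of the `half` smallest coordinates first.
    order = np.argpartition(points[:, axis], half)
    lower, upper = points[order[:half]], points[order[half:]]
    nxt = 1 - axis   # alternate between splitting on x and on y
    return split_median(lower, k, nxt) + split_median(upper, k, nxt)
```

Since every cut halves a cell, leaves end up with between roughly K/2 and K points, and the cell boundaries can be recovered by recording the median coordinate at each cut.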

Tags: algorithm, data-structures, language-agnostic
