开发者

How to efficiently split a 2D space in cells such that each cell contains at most K points?

I have a dataset of 2D points (~500k of them) on which I'd like to perform some kind of quadrat count analysis. The basics of quadrat count is to split your 2D space into a regular grid (each cell has size SxS) and count the number of points in each cell.

For some reason, I'd like to do a slight variation of that : instead of using a regular grid, I want to build the grid such that each cell contains at most K points.

What I did is the following: I start with the whole space, and divide it in 4 cells (by "cutting" each dimension in half). Then, I count the number of points in each cell.开发者_StackOverflow社区 For those that contain more than K points, I divide them again, etc., until I'm done.

I tried both recursive and iterative implementations of this simple algorithm, but none performed well when applied to the whole dataset. The main bottleneck is the counting part, obviously, so I was wondering what kind of datastructure would allow me to do this efficiently ?

(For now, I'm just using "conditional indexing" in Python : points = points[points[,1] > x1 and points[,1] <= x2 and points[,2] > y1 and points[,2] <= y2,])

Also, do you have maybe another idea on how I could build my grid ?

EDIT: In other words, what kind of data structure could I use to quickly count (and retrieve) the points that fall within a given rectangle ((x1, y1), (x2, y2))?


This isn't complete but it might point you in the right direction.

Instead of starting big and going small, start small and go big.

Divide your space into, say, 100x100 cells. Count the number in each cell (this is exactly O(n), you count each cell once.)

From there on out you don't need to count cells. You can create CellGroups to count what cells it has, and from there I would use an algorithm to combine cells into CellGroups.

You might consider an approach that takes two small cells to merge them and recalculates.

while(true) {
    take the smallest cellgroup
    compare it to each other cellgroup starting with the second smallest
    go up the list until you find two adjecent cell groups
    if you find a match
        merge them
        update the cellgroup size rankings
        repeat the process (continue the while(true)
    otherwise
        break out, you're done merging cells

}


I'm not familiar enough with Python, but if you run through entire array for each quadrant, it can be improved:

After each splitting group points according to quadrant they correspond to. When splitting further a quadrant analyse only corresponding subarray. This may speed up counting.

Also since you are OK with irregular grid, you may consider selecting separation lines always diving points into equal groups (horizontal and vertical splitting should be done separately for this).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜