Cluster points in PostGIS
I'm building an application that pulls lat/long values from a database and plots them on a Google Map. There could be thousands of data points so I "cluster" points close to each other so the user is not overwhelmed with icons. At the moment I perform this clustering in the application, with a simple algorithm like this:
- Get array of all points
- Pop first point off array
- Compare first point to all other points in array looking for ones that fall within x distance
- Create a cluster with the original and close points.
- Remove close points from array
- Repeat
Now I realize this is inefficient, which is why I have been looking into GIS systems. I have set up PostGIS and have my lat & longs stored in a POINT geometry object.
Can someone get me started or point me to some resources on a simple implementation of this clustering algorithm in PostGIS?
I ended up using a combination of ST_SnapToGrid and avg. I realize there are algorithms out there (e.g. k-means, as Denis suggested) that will give me better clusters, but for what I'm doing this is fast and accurate enough.
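For anyone wanting to try the same approach, here is a minimal sketch of what a snap-to-grid plus average query might look like. The actual query isn't shown above, so the table name `places`, the column name `lonlat`, and the grid size of 0.01 degrees are all assumptions for illustration.

```sql
-- Hypothetical table "places" with a point column "lonlat" (names assumed).
-- Points that snap to the same grid cell are treated as one cluster,
-- and the cluster marker is placed at the average of its members.
SELECT
  ST_SnapToGrid(lonlat::geometry, 0.01) AS cell,   -- 0.01 = assumed cell size, in degrees
  COUNT(*)                              AS num_points,
  AVG(ST_X(lonlat::geometry))           AS avg_lon,
  AVG(ST_Y(lonlat::geometry))           AS avg_lat
FROM places
GROUP BY cell;
```

A larger cell size gives fewer, coarser clusters, so it would probably need to change with the map's zoom level.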
If it's enough to have the points clustered in the browser, you could easily make use of OpenLayers' clustering capabilities. There are 3 examples that show clustering.
I've used it with a PostGIS database before, and as long as you don't have ridiculous amounts of data, it works pretty smoothly.
- http://openlayers.org/dev/examples/strategy-cluster-extended.html
- http://openlayers.org/dev/examples/strategy-cluster-threshold.html
- http://openlayers.org/dev/examples/strategy-cluster.html
An example of clustering lonlat points (of st_point type) with PostGIS. The result set will contain (cluster_id, id) pairs. The number of clusters is the argument passed to ST_ClusterKMeans.
WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER () AS count
  FROM places
)
SELECT
  sparse_places.id,
  ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER () AS cid
FROM sparse_places;
We need the Common Table Expression with a COUNT window function to make sure the number of clusters passed to ST_ClusterKMeans never exceeds the number of input rows. LEAST(count, 10) asks for at most 10 clusters, and fewer when the table holds fewer than 10 rows.
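To plot one marker per cluster rather than one per point, the per-row cluster ids can be aggregated afterwards. This is only a sketch reusing the `places` table and `lonlat` column from the query above. The `clustered` CTE and the output column names are made up for illustration.

```sql
WITH sparse_places AS (
  SELECT lonlat, id, COUNT(*) OVER () AS count
  FROM places
),
clustered AS (
  -- Same k-means step as above, keeping the geometry for the next step
  SELECT
    id,
    lonlat::geometry AS geom,
    ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER () AS cid
  FROM sparse_places
)
SELECT
  cid,
  COUNT(*)                                 AS num_points,
  ST_AsText(ST_Centroid(ST_Collect(geom))) AS center   -- centroid of the cluster's points
FROM clustered
GROUP BY cid
ORDER BY cid;
```

Each result row then carries a cluster id, the number of points in it, and a single position that could be used for the map icon.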