How to improve ESRI/ArcGIS database performance while maintaining normalization?
I work with databases containing spatial data. Most of these databases are in a proprietary format created by ESRI for use with their ArcGIS software. We store our data in a normalized data model within these geodatabases.
We have found that the performance of this database is quite slow when dealing with relationships (e.g. relating several thousand records to several thousand other records can take several minutes).
Is there any way to improve performance without completely flattening/denormalizing the database, or is this strictly limited by the database platform we are using?
There is only one way: measure. Obtain a query plan and learn to read it. Isolate a query from the log file, edit it into an executable (non-parameterised) form, and submit it manually (in psql). Then tune it and see where it hurts.
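As a minimal sketch of that workflow in psql, assuming hypothetical parcels/owners tables standing in for your related feature class and table:

```sql
-- Hypothetical table and column names, not from the original post.
-- Run the captured statement under EXPLAIN ANALYZE to see the actual plan
-- and where the time goes (sequential scans, nested loops, missing indexes).
EXPLAIN ANALYZE
SELECT p.parcel_id, o.owner_name
FROM   parcels p
JOIN   owners  o ON o.parcel_id = p.parcel_id
WHERE  p.zipcode = '90210';
```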
Geometry joins can be costly in terms of CPU if many (big) polygons have to be joined and their bounding boxes have a good chance of overlapping. In the extreme case, you will have to do a preselection on other criteria (e.g. zip code, if available) or maintain cache tables of matching records.
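A sketch of both ideas, assuming PostGIS-style spatial functions and made-up table names (an ESRI ST_GEOMETRY storage type would use the corresponding sde.st_* functions instead):

```sql
-- Cheap attribute preselection first, expensive geometry test only on the survivors
SELECT d.district_id, p.parcel_id
FROM   districts d
JOIN   parcels   p
       ON  p.zipcode = d.zipcode
       AND ST_Intersects(d.shape, p.shape)
WHERE  d.zipcode = '90210';

-- Or maintain a cache table of matching pairs, refreshed on data load
CREATE TABLE district_parcel_cache AS
SELECT d.district_id, p.parcel_id
FROM   districts d
JOIN   parcels   p ON ST_Intersects(d.shape, p.shape);

CREATE INDEX district_parcel_cache_idx ON district_parcel_cache (district_id);
```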
EDIT: BTW, do you have statistics and autovacuum enabled? IIRC, ESRI is still tied to postgres-8.3-something, where these were not run by default.
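A quick way to check and, if needed, refresh statistics by hand (standard PostgreSQL, works on 8.3-era servers too):

```sql
SHOW autovacuum;                        -- is the autovacuum daemon enabled?

SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables;             -- when were statistics last gathered?

VACUUM ANALYZE;                         -- refresh stats for the whole database
                                        -- (or name a specific table)
```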
UPDATE 2014-12-11
ESRI does not interfere with non-GIS stuff. It is perfectly OK to add PK/FK relations or additional indexes to your schema. The DBMS will pick them up if appropriate, and ESRI will ignore them. (ESRI only uses its own meta-catalogs, ignoring the system catalogs.)
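For example (hypothetical table and column names, adjust to your geodatabase schema):

```sql
-- These objects live in the normal system catalogs, so ArcGIS ignores them,
-- while the DBMS can still use them to speed up joins and relates.
ALTER TABLE owners
  ADD CONSTRAINT owners_pk PRIMARY KEY (owner_id);

ALTER TABLE parcels
  ADD CONSTRAINT parcels_owner_fk
  FOREIGN KEY (owner_id) REFERENCES owners (owner_id);

-- Index the FK column used by the relationship class
CREATE INDEX parcels_owner_id_idx ON parcels (owner_id);
```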
When I had to deal with spatial data, I tended to precalculate the values and store them. Yes, that makes for a big table, but it is much faster to query when you only do the complex calculation once on data entry. Data entry does take longer, though. I was in a situation where all my spatial data came from a monthly load, so precalculating wasn't too bad.
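A sketch of that precalculate-on-load idea, with made-up column and table names, using a PostGIS-style area function (substitute the matching sde.st_* call for ESRI geometry storage):

```sql
-- Run once after each monthly load instead of computing the value at query time
ALTER TABLE parcels ADD COLUMN area_sq_m double precision;

UPDATE parcels
SET    area_sq_m = ST_Area(shape);

-- Index the stored result so queries can filter on it cheaply
CREATE INDEX parcels_area_idx ON parcels (area_sq_m);
```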