How can I optimize this postgresql query?
Below is a postgres query that seems to be taking far longer than I would expect. The field_instances table is indexed on both form_instance_id and field_id, and the form_instances table is indexed on workflow_state. So I thought it would be a fast query, but it takes forever. Can anybody help me interpret the query plan and what kinds of indexes to add to speed it up? Thanks.
explain analyze
select form_id,form_instance_id,answer,field_id
from form_instances,field_instances
where workflow_state = 'DRqueued'
and form_instance_id = form_instances.id
and field_id = 'Book_EstimatedDueDate';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=8733.85..95692.90 rows=9277 width=29) (actual time=2550.000..15430.000 rows=11431 loops=1)
Hash Cond: (field_instances.form_instance_id = form_instances.id)
-> Bitmap Heap Scan on field_instances (cost=2681.11..89071.72 row开发者_如何转开发s=47567 width=25) (actual time=850.000..13690.000 rows=51726 loops=1)
Recheck Cond: ((field_id)::text = 'Book_EstimatedDueDate'::text)
-> Bitmap Index Scan on index_field_instances_on_field_id (cost=0.00..2669.22 rows=47567 width=0) (actual time=830.000..830.000 rows=51729 loops=1)
Index Cond: ((field_id)::text = 'Book_EstimatedDueDate'::text)
-> Hash (cost=5911.34..5911.34 rows=11312 width=8) (actual time=1590.000..1590.000 rows=11431 loops=1)
-> Bitmap Heap Scan on form_instances (cost=511.94..5911.34 rows=11312 width=8) (actual time=720.000..1570.000 rows=11431 loops=1)
Recheck Cond: ((workflow_state)::text = 'DRqueued'::text)
-> Bitmap Index Scan on index_form_instances_on_workflow_state (cost=0.00..509.11 rows=11312 width=0) (actual time=650.000..650.000 rows=11509 loops=1)
Index Cond: ((workflow_state)::text = 'DRqueued'::text)
Total runtime: 15430.000 ms
(12 rows)
When you say The field_instances table is indexed on both form_instance_id and field_id you mean that there are separate indexes on form_instance_id and field_id on that table?
Try dropping the index on form_instance_id
and put a concatenated index on (form_instance_id, field_id)
.
An index works by giving you a quick lookup that tells you where the rows are that match your index. It then has to fetch through those rows to do what you want. So you always want your index to be as specific as possible. If you put two indexes on the table, you'll have two different ways to do a lookup, but a query will usually only take advantage of one of them. If you put a concatenated index on the table, you'll be able to look up on the first field in the index, the first two fields, etc efficiently. (So a concatenated index on (a, b)
gives you fast lookups on a
, even faster lookups on both a
and b
, but doesn't help you look things up on b
)
Right now it is figuring out all possible things in form_instances
that have the right state. It separately figures out all of the field_instances
that have the right field id. It then does a hash join. For this makes a lookup hash from one result set, and scans the other for matches.
With my suggestion it should figure out all possible form_instances
of interest. It will then go to the index, and figure out all of the field_instances
that match on both the form instance and field id, and then it will find exactly the results of interest. Because the index is more specific, the database will have fewer rows of data to deal with to process your query.
http://explain.depesz.com is a fantastic online tool that helps you identify the hot spots visually. I pasted your results into the tool and got this analysis: http://explain.depesz.com/s/VIk
It's hard to tell anything specifically without seeing your tables and indexes, however.
Need to know the data you have in your table however just from looking at the sql and column names I would recommend
- do you really need an index on workflow_state assuming elements within it can't be very unique - this might not improve select but will insert or update...
- try making
field_id
check the first condition in your where statement
精彩评论