Questions regarding a optimizing a slow query (SQL included)
SELECT DISTINCT "myapp_profile"."us开发者_如何学Goer_id", "myapp_profile"."name",
"myapp_profile"."age", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined"
FROM "myapp_profile"
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id")
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id")
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id")
WHERE ("auth_user"."is_superuser" = false
AND "auth_user"."is_staff" = false
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16)))
ORDER BY "myapp_profile"."user_id"
DESC LIMIT 100
The above query takes about 100 seconds to run with 2 million users/profiles. I'm no DBA and our DBAs are looking at the situation to see what can be done, but since I'll likely never get to see what changes (assuming it happens at the DB level), I'm curious how you could optimized this query. It obviously needs to happen a ton faster than it is happening, like on the order of 5 seconds or less. If there is no way to optimize the SQL, is there an index or indexes you could add/change to make the query it quicker, or is there anything something else I'm overlooking?
Postgres 9 is the DB, and Django's ORM is where this query came from.
Query Plan
Limit (cost=1374.35..1383.10 rows=100 width=106)
-> Unique (cost=1374.35..1391.24 rows=193 width=106)
-> Sort (cost=1374.35..1374.83 rows=193 width=106)
Sort Key: myapp_profile.user_id, myapp_profile.name, myapp_profile.age, auth_user.username, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.is_staff, auth_user.is_active, auth_user.is_superuser, auth_user.last_login, auth_user.date_joined
-> Nested Loop (cost=453.99..1367.02 rows=193 width=106)
-> Hash Left Join (cost=453.99..1302.53 rows=193 width=49)
Hash Cond: (myapp_siterel.site_id = django_site.id)
Filter: ((django_site.id IS NULL) OR (django_site.id = ANY ('{10080,10053}'::integer[])))
-> Hash Left Join (cost=448.50..1053.27 rows=15001 width=53)
Hash Cond: (myapp_profile.user_id = myapp_siterel.profile_id)
-> Seq Scan on myapp_profile (cost=0.00..286.01 rows=15001 width=49)
-> Hash (cost=261.00..261.00 rows=15000 width=8)
-> Seq Scan on myapp_siterel (cost=0.00..261.00 rows=15000 width=8)
-> Hash (cost=3.55..3.55 rows=155 width=4)
-> Seq Scan on django_site (cost=0.00..3.55 rows=155 width=4)
-> Index Scan using auth_user_pkey on auth_user (cost=0.00..0.32 rows=1 width=57)
Index Cond: (auth_user.id = myapp_profile.user_id)
Filter: ((NOT auth_user.is_superuser) AND (NOT auth_user.is_staff))
Thanks
I'm not so familiar with postgres, so I'm not sure how good it's query optimiser is, but it looks like everything you have in the where clause could instead be join conditions, although I'd hope postgres is clever enough to work that out for itself, however if it's not then it's going to fetch all your 2 million users with related records in the other 3 tables and then filter that using your where.
The indexes already mentioned should also work for you if they don't already exist. Again i'm more an MSSQL person but does postgres not have any statistics profile or query plan you can see?
Something along these lines
SELECT DISTINCT
"myapp_profile"."user_id",
"myapp_profile"."name",
"myapp_profile"."age",
"auth_user"."id",
"auth_user"."username",
"auth_user"."first_name",
"auth_user"."last_name",
"auth_user"."email",
"auth_user"."password",
"auth_user"."is_staff",
"auth_user"."is_active",
"auth_user"."is_superuser",
"auth_user"."last_login",
"auth_user"."date_joined"
FROM "myapp_profile"
INNER JOIN "auth_user"
ON ("myapp_profile"."user_id" = "auth_user"."id")
AND "auth_user"."is_superuser" = false
AND "auth_user"."is_staff" = false
LEFT OUTER JOIN "myapp_siterel"
ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id")
LEFT OUTER JOIN "django_site"
ON ("myapp_siterel"."site_id" = "django_site"."id")
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))
ORDER BY "myapp_profile"."user_id" DESC
LIMIT 100
Also, do you need the distinct? That'll also slow it down somewhat.
for basics:
make sure all the user id fields are indexed.
also looks like you would do well with an index on is_supervisor, and is_staff
there's never a straight forward silver-bullet solution for query optimization, however, the obvious steps is to index columns you're searching on, in your case, that's:
"auth_user"."is_superuser"
"auth_user"."is_staff"
"django_site"."id"
"myapp_profile"."user_id"
精彩评论