Optimize a query that is using multiple left joins on the same tables
I've come across a query that is taking "too long". The query has 50+ left joins between 10 or so tables. To give a brief overview of the database model, the tables joined are tables that store data for a particular data type (ex: date_fields, integer_fields, text_fields, etc.) and each has a column开发者_如何转开发 for the value, a "datafield" id, and a ticket id. The query is built programmatically based on an association table between a "ticket" and its "data fields".
The join statements look something like the following:
...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)
When using explain on the query shows the following:
1 SIMPLE t ref idx_dataset_id idx_dataset_id 5 const 2871 Using where; Using temporary; Using filesort
1 SIMPLE t001 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 5
...
1 SIMPLE t056 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 8
What direction can I take to tune this query? All the indexes seem to be in place. Perhaps the t table (tickets) row number (2871) should be reduced. How many left joins is too much? Should the datafield tables be joined only once and then queried each for the data that is required?
You're using a variation of the terrible antipattern called Entity-Attribute-Value. You're storing attributes on separate rows, so if you want to reconstruct something that looks like a conventional row of data, you need to make one join per attribute.
It's not surprising this creates a query with 50 joins. This is far too many for most databases to run efficiently (you haven't identified which database you're using). Eventually you'll want a few more attributes and you might exceed some architectural limit of the database on the number of joins it can do.
The solution is: don't reconstruct the row in SQL.
Instead, query the attributes as multiple rows, instead of trying to combine them onto a single row.
SELECT ... FROM tickets t
INNER JOIN ticket_text_fields f ON t.id=f.ticket_id
WHERE f.textfield_id IN (7, 8, 9, ...)
UNION ALL
SELECT ... FROM tickets t
INNER JOIN ticket_date_fields d ON t.id=d.ticket_id
WHERE d.datafield_id IN (434, 435, 436, ...)
Then you have to write a function in your application to loop over the resulting rowset, and collect the attributes one by one into an object in application space, so then you can use it as if it's a single entity.
for the clearer query i would use something like this:
SELECT ... FROM tickets as t
JOIN ticket_text_fields as txt ON t.id = txt.ticket_id
JOIN ticket_date_fields as dt ON t.id = dt.ticket_id
WHERE txt.textfield_id IN (...)
AND dt.datefield_id IN (...)
Joins would be probably LEFT, but it depends on the structure of your data.
There is no union in the query and there are only two joins
精彩评论