SQL Query Theory Question
I have a large historical transaction table (15-20 million rows MANY columns) and a table with one row one column. The table with one row contains a date (la开发者_运维技巧st processing date) which will be used to pull the data in the trasaction table ('process_date').
Question: Should I inner join the 'process_date' table to the transaction table or the transaction table to the 'process_date' table?
This is how I would do it
SELECT <<list only columns you need>>
FROM large_historical_transaction_table t
WHERE EXISTS (SELECT 1 FROM OneRowTable o
WHERE o.last_processing_date = t.process_date)
An Inner join is a symmetrical, bi-directional relationship, in general it doesn't matter, but in this case I would suggest not joining at all, read the threshold date into a variable and pass it to the other select query as a parameter...
For readability I would inner join from the transaction table to explicitly indicate that the second table with the date just acts like a filter.
When joining tables, the query optimizer takes a quick sniff of both to determine the most appropriate join implementation. Logically the inner join is symmetric, but the implementation may favor one side over the other for improved performance.
精彩评论