Alternative to LEFT JOINs on large datasets
If I need to do a left join between two tables (in order to run some kind of analysis between them), but both datasets are too large for this to be executed in a single query, what's the best practice for accomplishing this?
I saw FETCH in the documentation but wasn't sure if it's conventionally used to loop over entire datasets. Since I figured this task had to be commonplace, I wasn't going to kill myself trying to improperly hodgepodge FETCH or OFFSET in order to accomplish my analysis.
Note: This is a local database, and it will not be altered for the duration of the procedure - so performance considerations and transactions aren't a factor.
I'm using PostgreSQL, but I'm sure the practice is similar across all modern DBMS.
I agree with the comments that a modern DBMS should be able to join any tables it can store. Sometimes you have to tell the database not to try a hash join on gigantic tables; hash joins are very fast, but not when the hash table doesn't fit in memory. For PostgreSQL, you can disable hash joins with:
SET enable_hashjoin TO off;
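A quick way to confirm the setting actually changes the plan is to compare `EXPLAIN` output before and after. A sketch, with hypothetical table names `big_a` and `big_b`:

```sql
-- Check which join strategy the planner picks (table names are hypothetical):
EXPLAIN SELECT * FROM big_a a LEFT JOIN big_b b ON a.id = b.id;

-- Disable hash joins for the current session only, then re-check;
-- the planner should fall back to a merge join or nested loop:
SET enable_hashjoin TO off;
EXPLAIN SELECT * FROM big_a a LEFT JOIN big_b b ON a.id = b.id;

-- Restore the default when done:
RESET enable_hashjoin;
```

Note that `SET` only affects the current session, so there's no risk of permanently changing the planner's behavior.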
Having said that, some databases do perform better if you split a query in smaller batches. You can use subqueries to partition a join in batches:
select *
from (
    select *
    from YourTable1
    where CustomerName like 'A%'
) a
left join (
    select *
    from YourTable2
    where CustomerName like 'A%'
) b
on a.CustomerName = b.CustomerName
This only helps the database if there is an efficient way to filter each batch. In the example, that would be an index on CustomerName in both tables.
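To cover the whole key space rather than just the `'A%'` slice, you could drive the batched join from a loop. A sketch in PL/pgSQL, assuming hypothetical tables `YourTable1`/`YourTable2`, a column `SomeColumn`, and a results table `analysis_results` to collect the output:

```sql
-- Hypothetical supporting indexes so each batch becomes an index range scan:
CREATE INDEX IF NOT EXISTS idx_t1_name ON YourTable1 (CustomerName);
CREATE INDEX IF NOT EXISTS idx_t2_name ON YourTable2 (CustomerName);

DO $$
DECLARE
    prefix text;
BEGIN
    -- One batch per leading letter; pick a batching key that
    -- splits your data into roughly even, index-friendly ranges.
    FOREACH prefix IN ARRAY ARRAY['A', 'B', 'C', 'D', 'E' /* ... through 'Z' */] LOOP
        INSERT INTO analysis_results          -- hypothetical results table
        SELECT a.*, b.SomeColumn              -- hypothetical column
        FROM YourTable1 a
        LEFT JOIN YourTable2 b
          ON a.CustomerName = b.CustomerName
        WHERE a.CustomerName LIKE prefix || '%';
    END LOOP;
END $$;
```

Each iteration is a small, index-assisted join, so no single query has to materialize the full result set at once.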