
Alternative to Left Joins on large datasets

If I needed to do a left join between 2 tables (in order to run some kind of analysis between them), but both datasets are too large for this to be executed in a single query, what's the best practice to accomplish this?

I saw FETCH in the documentation but wasn't sure if this is conventionally used to loop over entire datasets. Since I figured this task had to be commonplace, I wasn't going to kill myself trying to improperly hodgepodge FETCH or OFFSET in order to accomplish my analysis.

Note: This is a local database, and will not be altered through the duration of the procedure - so performance considerations and transactions aren't a factor.

I'm using PostgreSQL, but I'm sure the practice is similar across all modern DBMS.


I agree with the comments that a modern DBMS should be able to join any tables it can store. Sometimes you have to tell the database not to try a hash join on gigantic tables; hash joins are very fast, but not when the hash table doesn't fit in memory. For PostgreSQL, you can disable hash joins for the current session with:

SET enable_hashjoin TO false;

Having said that, some databases do perform better if you split a query into smaller batches. You can use subqueries to partition a join into batches:

select  *
from    (
        select  *
        from    YourTable1
        where   CustomerName like 'A%'
        ) a
left join 
        (
        select  *
        from    YourTable2
        where   CustomerName like 'A%'
        ) b
on      a.CustomerName = b.CustomerName

This only helps the database if there is an efficient way to filter. In the example, that would be an index on CustomerName in both tables.
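
To run every batch rather than just the 'A%' one, the same query can be driven from a small loop. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL, the Amount and Region columns and the sample rows are made up for illustration, and batched_left_join is a hypothetical helper name. It checks that the batches together return the same rows as a single left join.

```python
import sqlite3
import string


def batched_left_join(conn, prefixes):
    """Yield left-join rows one CustomerName-prefix batch at a time."""
    sql = """
        SELECT a.CustomerName, a.Amount, b.Region
        FROM   (SELECT * FROM YourTable1 WHERE CustomerName LIKE ?) a
        LEFT JOIN
               (SELECT * FROM YourTable2 WHERE CustomerName LIKE ?) b
        ON     a.CustomerName = b.CustomerName
    """
    for prefix in prefixes:
        pattern = prefix + "%"  # e.g. 'A%', 'B%', ...
        yield from conn.execute(sql, (pattern, pattern))


# Hypothetical sample data; the columns other than CustomerName are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable1 (CustomerName TEXT, Amount INTEGER);
    CREATE TABLE YourTable2 (CustomerName TEXT, Region TEXT);
    CREATE INDEX idx_t1_name ON YourTable1 (CustomerName);
    CREATE INDEX idx_t2_name ON YourTable2 (CustomerName);
    INSERT INTO YourTable1 VALUES ('Alice', 10), ('Bob', 20), ('Carol', 30);
    INSERT INTO YourTable2 VALUES ('Alice', 'East'), ('Carol', 'West');
""")

# One batch per leading letter A..Z.
batched = sorted(batched_left_join(conn, string.ascii_uppercase),
                 key=lambda row: row[0])

# The same join as a single query, for comparison.
full = sorted(conn.execute("""
    SELECT a.CustomerName, a.Amount, b.Region
    FROM   YourTable1 a
    LEFT JOIN YourTable2 b ON a.CustomerName = b.CustomerName
"""), key=lambda row: row[0])

print(batched == full)
```

The batches have to cover every key value exactly once for the result to match the full join. Names that start with a digit or punctuation, or NULL names, would be missed by the A to Z prefixes used here, and LIKE is case-sensitive in PostgreSQL, so the prefix list needs to reflect the actual data.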
