Explanation needed for missing rows with left join and count()
Can someone please help me understand the following behavior that occurs when I add a WHERE clause to a query that has a LEFT JOIN with COUNT(*)?
I have two tables:
TABLE 1: customers
customer_id | name
------------------
1 | Bob
2 | James
3 | Fred
TABLE 2: orders
order_id | customer_id | order_timestamp
----------------------------------------
1000 | 1 | 2011-01-01 00:00
1001 | 1 | 2011-01-05 00:00
1002 | 2 | 2011-01-10 00:00
Now the following query tells me how many orders each customer placed:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
group by 1
customer_id | count
-------------------
1 | 2
2 | 1
3 | 0
This works great BUT if I add a WHERE clause to the query, the query no longer outputs counts of zeroes for customers who did not place any orders even though I'm doing a LEFT JOIN:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
where o.order_timestamp >= '2011-01-05'
group by 1
customer_id | count
-------------------
1 | 1
2 | 1
Now if I move the WHERE condition as part of the LEFT JOIN like the following, I get back my zero counts for customers who did not place orders:
select c.customer_id, count(o.order_id)
from customers c
left join orders o on (c.customer_id = o.customer_id) and (o.order_timestamp >= '2011-01-05')
group by 1
I'm confused at why the second query does not work, but the third one does? Can someone please provide me with an explanation? Also not sur开发者_JAVA技巧e if this matters, but I'm using postgres. Thanks!
This is because NULL is not greater than or equal to anything; If you change your WHERE clause to where o.order_timestamp is null or o.order_timestamp >= '2011-01-05'
then you will get the same behavior as your join clause limit.
Note though - I would recommend the join clause approach, as it matches more closely what you are trying to do. Also the change to the WHERE clause I mentioned above will only work if the order_timestamp column is not nullable -- if it is then you should use a different column for the null check (eg, where o.primarykey is null or o.order_timestamp >= '2011-01-05'
).
Placement of filter criteria matters when dealing with OUTER joins (RIGHT, LEFT). Criteria in the ON clause of an OUTER JOIN is applied before the JOIN; criteria in the WHERE clause is applied after the JOIN -- applied against the resultset that uses the JOIN.
SELECT c.customer_id,
COUNT(o.order_id)
FROM CUSTOMERS c
LEFT JOIN ORDERS o ON o.customer_id - c.customer_id
AND o.order_timestamp >= '2011-01-05'
GROUP BY c.customer_id
Ordinals
Ordinals, meaning using a numeric value that refers to the numeric position of the columns in the SELECT clause, is not a recommended practice. If anyone changes the query -- say to add a column -- it could drastically affect your query.
Chirs is right, null is not greater than or equal to anything. So when you include your condition in where clause it applies on final view(table) of result generated by left join, in this result your condition removes row which has time stamp null.
However when you apply same condition during doing a join, condition applies only on order table and than left join performed. So it does not remove rows which has time stamp null.
So, in third query condition applied before final table generated and in second query condition applied after final table generated
精彩评论