COUNT and a sub COUNT in same query
I have 2 tables: members, orders.
Members: MemberID, DateCreated
Orders: OrderID, DateCreated, MemberID
I want to find out the number of new members in a given month broken down into number of order groups, eg. 开发者_如何学JAVA5+, 4, 3, 2, 1, 0
I've got the query to work out the number of orders for a member but how can I get these values in one query?
SELECT
COUNT(o.orderid) AS Purchases
FROM
members m
LEFT JOIN orders o ON o.memberid = m.memberid
AND MONTH(o.DateCreated) = 8
WHERE
MONTH(m.DateCreated) = 8
GROUP BY
m.memberid
ORDER BY
COUNT(o.orderid) DESC
You'll need to use sub-queries in the FROM clause, or a series of WITH statements before the main SELECT statement (if your DBMS supports that notation). You'll also need to fix your queries so that you don't report on the people who joined in August 2009 as well as those who joined in August 2010.
Simpler Answer
The 'harder answer' below is the much amended original query, and I've left it because it shows how I developed the answer. The following answer is simpler; it leverages the fact that COUNT(Column) returns 0 if there are no non-null values in the column to be counted.
It uses a table BaseCounts to control which aggregates should appear:
CREATE TEMP TABLE BaseCounts
(
NumOrders CHAR(2) NOT NULL PRIMARY KEY
);
INSERT INTO BaseCounts VALUES("0 ");
INSERT INTO BaseCounts VALUES("1 ");
INSERT INTO BaseCounts VALUES("2 ");
INSERT INTO BaseCounts VALUES("3 ");
INSERT INTO BaseCounts VALUES("4 ");
INSERT INTO BaseCounts VALUES("5+");
SELECT B.NumOrders, COUNT(N.MemberID) AS NumNewMembers
FROM BaseCounts AS B LEFT OUTER JOIN
(SELECT MemberID, CASE WHEN NumOrders < 5
THEN CAST(NumOrders AS CHAR(2))
ELSE "5+" END AS NumOrders
FROM (SELECT M.MemberID, COUNT(O.OrderID) AS NumOrders
FROM Members AS M LEFT OUTER JOIN Orders AS O
ON M.MemberID = O.MemberID AND
YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
GROUP BY M.MemberID
) AS NMO
) AS N
ON B.NumOrders = N.NumOrders
GROUP BY B.NumOrders
ORDER BY B.NumOrders;
The CREATE TEMP TABLE notation is for IBM Informix Dynamic Server (version 11.50 used for testing). The table vanishes at the end of the session (or when explicitly dropped), and it is private to a session. It could be a permanent base table (drop the keyword TEMP) instead.
The sub-query tagged NMO (for new member orders) is very important. The filter condition on O.DateCreated
must appear in the ON clause and not in the WHERE clause; otherwise, you do not get the zero counts that are needed. The COUNT(Column) notation is used twice.
The explanation in the harder answer shown below will help you understand details not explained in this simpler answer. Although 'simpler', I would not regard it as 'simple'. The whole answer shows the importance of iterating your design; I could not readily have produced the simpler answer without having gone through the effort of producing the harder one.
Harder Answer
This was the original development of the answer. I believe it might still have its uses to show how I approached the problem. With this as a basis, it was a relatively simple matter to remove the extra material when developing the simpler answer above.
Counting the zeroes is also surprisingly hard, as is doing '5+'. So, let's tackle it in stages.
New members with 1 or more purchases
SELECT M.MemberID, COUNT(*) AS NumOrders
FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
GROUP BY M.MemberID
Call that list 'NZO' (for 'non-zero orders'). Note that a LEFT OUTER JOIN would assign people to group '1' even if they placed no orders - not the desired result.
New members with 0 purchases
SELECT M.MemberID, 0 AS NumOrders
FROM Members AS M
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND NOT EXISTS (SELECT * FROM Orders AS O
WHERE YEAR(O.DateCreated) = 2010
AND MONTH(O.DateCreated) = 8
AND O.MemberID = M.MemberID
)
That is a nasty query because of the correlated sub-query, but it avoids referencing NZO. An alternative would be to find the list of members who joined in the reference month, and subtract from that the list of members with 1 or more orders (NZO).
Call that list 'WZO' (for 'with zero orders').
Clearly, NZO and WZO have no members in common - the UNION or UNION ALL of these gives a list of new members and the number of orders they placed.
New members in the six categories
SELECT MemberID, CAST(NumOrders AS CHAR(2)) AS NumOrders
FROM WZO
UNION
SELECT MemberID, CASE WHEN NumOrders < 5 THEN CAST(NumOrders AS CHAR(2))
ELSE "5+" END AS NumOrders
FROM NZO
The casts fix a problem with the types here - the NumOrders column is a numeric type and the result needs to be a string.
Call this list NMC (new members in categories).
Summarize the results
SELECT NumOrders, COUNT(*) AS NumNewMembers
FROM NMC
GROUP BY NumOrders
ORDER BY NumOrders;
Penultimate query
Assembling the various bits and pieces above - and getting the right bits in the right places - yields the following query:
SELECT NumOrders, COUNT(*) AS NumNewMembers
FROM (SELECT MemberID, CAST(NumOrders AS CHAR(2)) AS NumOrders
FROM (SELECT M.MemberID, 0 AS NumOrders
FROM Members AS M
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND NOT EXISTS (SELECT * FROM Orders AS O
WHERE YEAR(O.DateCreated) = 2010
AND MONTH(O.DateCreated) = 8
AND O.MemberID = M.MemberID
)
) AS WZO
UNION
SELECT MemberID, CASE WHEN NumOrders < 5
THEN CAST(NumOrders AS CHAR(2))
ELSE "5+" END AS NumOrders
FROM (SELECT M.MemberID, COUNT(*) AS NumOrders
FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
GROUP BY M.MemberID
) AS NZO
) AS NMC
GROUP BY NumOrders
ORDER BY NumOrders;
That complete query has been run successfully against IBM Informix Dynamic Server 11.50. For the sample data I generated (see below), I got the result:
numorders numnewmembers
CHAR(2) DECIMAL(15,0)
0 1
1 1
2 1
3 1
4 1
5+ 2
The general scheme of how to build up the query piecemeal should help you with designing your own queries in future. In particular, you can validate the different segments of the query as you go.
You might find it easier to work the date by creating the first and last days in the month you are interested in, and then running the query for those ranges - which is also more flexible because it could do quarters or half-months or periods straddling two months.
Also note that if there are no new members who place, say, 2 orders in the month they join, then there will be no row in the result. It is possible to fix that problem - it is not easy to fix that problem.
Dealing with 'No new members made N purchases'
There are probably several ways to get a row with a zero count for missing items. The technique I tend to use is to create a table which contains the rows that I want to appear, something like this - where I've created temporary tables to hold the result of each of the named expressions in the main part of the answer. This is a variant of the BaseCounts table shown in the simpler answer; that version did not need the NumNewMembers column whereas this version does.
CREATE TEMP TABLE BaseCounts
(
NumOrders CHAR(2) NOT NULL,
NumNewMembers DECIMAL(15,0) NOT NULL
);
INSERT INTO BaseCounts VALUES("0 ", 0);
INSERT INTO BaseCounts VALUES("1 ", 0);
INSERT INTO BaseCounts VALUES("2 ", 0);
INSERT INTO BaseCounts VALUES("3 ", 0);
INSERT INTO BaseCounts VALUES("4 ", 0);
INSERT INTO BaseCounts VALUES("5+", 0);
SELECT NumOrders, MAX(NumNewMembers) AS NumNewMembers
FROM (SELECT * FROM BaseCounts
UNION
SELECT NumOrders, COUNT(*) AS NumNewMembers
FROM NMC
GROUP BY NumOrders
)
GROUP BY NumOrders
ORDER BY NumOrders;
The second query in the UNION in the FROM clause is the previous 'final' answer, using a temporary table for the intermediate results.
Ultimate query
When written out to avoid the temp table, the query becomes:
SELECT NumOrders, MAX(NumNewMembers)
FROM (SELECT * FROM BaseCounts
UNION
SELECT NumOrders, COUNT(*) AS NumNewMembers
FROM (SELECT MemberID, CAST(NumOrders AS CHAR(2)) AS NumOrders
FROM (SELECT M.MemberID, 0 AS NumOrders
FROM Members AS M
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND NOT EXISTS (SELECT * FROM Orders AS O
WHERE YEAR(O.DateCreated) = 2010
AND MONTH(O.DateCreated) = 8
AND O.MemberID = M.MemberID
)
) AS WZO
UNION
SELECT MemberID, CASE WHEN NumOrders < 5
THEN CAST(NumOrders AS CHAR(2))
ELSE "5+" END AS NumOrders
FROM (SELECT M.MemberID, COUNT(*) AS NumOrders
FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
GROUP BY M.MemberID
) AS NZO
) AS NMC
GROUP BY NumOrders
)
GROUP BY NumOrders
ORDER BY NumOrders;
When using the modified data set, I get the result:
NumOrders NumNewMembers
CHAR(2) DECIMAL(15,0)
0 1
1 1
2 1
3 0
4 2
5+ 2
Some DBMS provide other, possibly more convenient, ways to create table values like the BaseCounts table.
One alternative technique that could be considered is some sort of outer join using 'COUNT(column)' instead of 'COUNT(*)'. When you use 'COUNT(column)', the query only counts the rows with a non-null value for 'column', so an outer join that generates a null in 'column' will yield 'COUNT(column)' of zero for the null. However, you still need a reference list from somewhere of the rows that should appear in the output so that you can determine when something is missing from the data set. This is provided by the BaseCounts table in my exposition.
WITH clause
Also, as noted at the top, the SQL standard and some DBMS provide a WITH clause that allows you to create named intermediate results that can then be used in the final query (or, indeed, later on in the WITH clause):
WITH <name1> AS (<query1>),
<name2>(<named-columns>) AS (<query2>),
...
SELECT ... FROM <name1> JOIN <name2> ON ...
Using this, we could write the following (untested) SQL:
WITH NZO AS (
SELECT M.MemberID, COUNT(*) AS NumOrders
FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
GROUP BY M.MemberID),
WZO AS (
SELECT M.MemberID, 0 AS NumOrders
FROM Members AS M
WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
AND NOT EXISTS (SELECT * FROM Orders AS O
WHERE YEAR(O.DateCreated) = 2010
AND MONTH(O.DateCreated) = 8
AND O.MemberID = M.MemberID
)),
NMC AS (
SELECT MemberID, CAST(NumOrders AS CHAR(2))
FROM WZO
UNION
SELECT MemberID, CASE WHEN NumOrders < 5
THEN CAST(NumOrders AS CHAR(2))
ELSE "5+" END AS NumOrders
FROM NZO),
NZC AS (
SELECT NumOrders, COUNT(*) AS NumNewMembers
FROM NMC
GROUP BY NumOrders)
SELECT NumOrders, MAX(NumNewMembers)
FROM (SELECT * FROM NZC
UNION
SELECT * FROM BaseCounts
)
GROUP BY NumOrders
ORDER BY NumOrders;
Sample Data
Tables
CREATE TABLE Members
(
MemberID INTEGER NOT NULL PRIMARY KEY,
DateCreated DATE NOT NULL
);
CREATE TABLE Orders
(
OrderID INTEGER NOT NULL PRIMARY KEY,
DateCreated DATE NOT NULL,
MemberID INTEGER NOT NULL REFERENCES Members
);
Members
INSERT INTO Members VALUES(1, '2009-08-03');
INSERT INTO Members VALUES(2, '2010-08-03');
INSERT INTO Members VALUES(3, '2010-08-05');
INSERT INTO Members VALUES(4, '2010-08-13');
INSERT INTO Members VALUES(5, '2010-08-15');
INSERT INTO Members VALUES(6, '2010-08-23');
INSERT INTO Members VALUES(7, '2010-08-23');
INSERT INTO Members VALUES(8, '2010-08-23');
INSERT INTO Members VALUES(9, '2010-09-03');
Orders
INSERT INTO Orders VALUES(11, '2010-08-03', 1);
INSERT INTO Orders VALUES(33, '2010-08-03', 3);
INSERT INTO Orders VALUES(44, '2010-08-05', 4);
INSERT INTO Orders VALUES(45, '2010-08-06', 4);
INSERT INTO Orders VALUES(56, '2010-08-11', 5);
INSERT INTO Orders VALUES(57, '2010-08-13', 5);
INSERT INTO Orders VALUES(58, '2010-08-23', 5);
--For testing 0 members with 3 orders (and 2 with 4 orders), add:
--INSERT INTO Orders VALUES(51, '2010-08-09', 5);
INSERT INTO Orders VALUES(61, '2010-08-05', 6);
INSERT INTO Orders VALUES(62, '2010-08-15', 6);
INSERT INTO Orders VALUES(63, '2010-08-15', 6);
INSERT INTO Orders VALUES(64, '2010-08-25', 6);
INSERT INTO Orders VALUES(71, '2010-08-03', 7);
INSERT INTO Orders VALUES(72, '2010-08-03', 7);
INSERT INTO Orders VALUES(73, '2010-08-03', 7);
INSERT INTO Orders VALUES(74, '2010-08-03', 7);
INSERT INTO Orders VALUES(75, '2010-08-03', 7);
INSERT INTO Orders VALUES(81, '2010-08-03', 8);
INSERT INTO Orders VALUES(82, '2010-08-03', 8);
INSERT INTO Orders VALUES(83, '2010-08-03', 8);
INSERT INTO Orders VALUES(84, '2010-08-03', 8);
INSERT INTO Orders VALUES(85, '2010-08-03', 8);
INSERT INTO Orders VALUES(86, '2010-08-03', 8);
INSERT INTO Orders VALUES(91, '2010-09-03', 9);
There's a couple ways you can do this, some of which could be fairly complicated.
This is the way I would do, focusing on the new member part rather than the count part:
SELECT COUNT(M.MemberID),
(SELECT COUNT(*) FROM Orders O WHERE O.MemberId = M.MemberId AND O.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)) AS num_orders
FROM Members M
WHERE M.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)
GROUP BY num_orders
I did the lookup with dates because it would be faster (it could make use of an index whereas MONTH(M.DateCreated)
would always do a full table scan, but you can change it back if really do need all orders/members from a given month).
EDIT: I forgot to handle the 5+ part of the question so here's an option for that:
SELECT COUNT(M.MemberID),
(SELECT IF(COUNT(*) >= 5, '5+', COUNT(*)) FROM Orders O WHERE O.MemberId = M.MemberId AND O.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)) AS num_orders
FROM Members M
WHERE M.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)
GROUP BY num_orders
精彩评论