Why can't you mix Aggregate values and Non-Aggregate values in a single SELECT?

I know that if you have one aggregate function in a SELECT statement, then all the other values in the statement must be either aggregate functions, or listed in a GROUP BY clause. I don't understand why that's the case.

If I do:

SELECT Name, 'Jones' AS Surname FROM People

I get:

NAME    SURNAME
Dave    Jones
Susan   Jones
Amy     Jones

So, the DBMS has taken a value from each row, and appended a single value to it in the result set. That's fine. But if that works, why can't I do:

SELECT Name, COUNT(Name) AS Surname FROM People

It seems like the same idea, take a value from each row and append a single value. But instead of:

NAME    SURNAME
Dave    3
Susan   3
Amy     3    

I get:

You tried to execute a query that does not include the specified expression 'ContactName' as part of an aggregate function.

I know it's not allowed, but the two circumstances seem so similar that I don't understand why. Is it to make the DBMS easier to implement? If anyone can explain to me why it doesn't work like I think it should, I'd be very grateful.


Aggregates don't work on a complete result; they only work on a group within a result.

Consider a table containing:

Person   Pet
-------- --------
Amy      Cat
Amy      Dog
Amy      Canary
Dave     Dog
Susan    Snake
Susan    Spider

If you use a query that groups on Person, it will divide the data into these groups:

Amy:
  Amy    Cat
  Amy    Dog
  Amy    Canary
Dave:
  Dave   Dog
Susan:
  Susan  Snake
  Susan  Spider

If you use an aggregate, for example the count aggregate, it will produce one result for each group:

Amy:
  Amy    Cat
  Amy    Dog
  Amy    Canary    count(*) = 3
Dave:
  Dave   Dog       count(*) = 1
Susan:
  Susan  Snake
  Susan  Spider    count(*) = 2

So, the query select Person, count(*) from People group by Person gives you one record for each group:

Amy    3
Dave   1
Susan  2

If you try to get the Pet field in the result also, that doesn't work because there may be multiple values for that field in each group.

(Some databases do allow that anyway, for example MySQL when its ONLY_FULL_GROUP_BY mode is turned off, and just return an arbitrary value from within the group; it's your responsibility to know whether the result is sensible.)
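As a rough illustration of that permissive behaviour, here is a small sketch using Python's built-in sqlite3 module. SQLite is a stand-in here, not one of the databases discussed in this thread; it also happens to accept a bare column in a grouped query. The table name People and the sample rows are assumptions taken from the example table above.

```python
import sqlite3

# In-memory database holding the Person/Pet example table (assumed name: People).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Person TEXT, Pet TEXT)")
pets = [("Amy", "Cat"), ("Amy", "Dog"), ("Amy", "Canary"),
        ("Dave", "Dog"), ("Susan", "Snake"), ("Susan", "Spider")]
conn.executemany("INSERT INTO People VALUES (?, ?)", pets)

# Pet is neither aggregated nor grouped. SQLite accepts this and picks the
# Pet value from some row of each group; which row is not something to rely on.
rows = conn.execute(
    "SELECT Person, Pet, COUNT(*) FROM People GROUP BY Person ORDER BY Person"
).fetchall()

for person, pet, count in rows:
    print(person, pet, count)
```

The counts are well defined (3, 1 and 2), but the Pet shown for Amy or Susan could be any of their pets, which is exactly why stricter databases reject the query.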

If you use an aggregate but don't specify any grouping, the query will still be grouped, and the entire result is a single group. So the query select count(*) from People will create a single group containing all records, and the aggregate can count the records in that group. The result contains one row for each group, and as there is only one group, there will be one row in the result.
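The grouping behaviour described in this answer can be checked with a short sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run the SQL; the table name People and the sample rows are assumed from the example table above.

```python
import sqlite3

# In-memory database holding the Person/Pet example table (assumed name: People).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Person TEXT, Pet TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("Amy", "Cat"), ("Amy", "Dog"), ("Amy", "Canary"),
     ("Dave", "Dog"), ("Susan", "Snake"), ("Susan", "Spider")],
)

# With GROUP BY: one result row per group.
per_person = conn.execute(
    "SELECT Person, COUNT(*) FROM People GROUP BY Person ORDER BY Person"
).fetchall()

# Without GROUP BY: the whole table is one group, so exactly one row comes back.
whole_table = conn.execute("SELECT COUNT(*) FROM People").fetchall()

print(per_person)   # [('Amy', 3), ('Dave', 1), ('Susan', 2)]
print(whole_table)  # [(6,)]
```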


Think about it this way: when you call COUNT without grouping, it "collapses" the table into a single group, making it impossible to access the individual rows within that group in the select clause.

You can still get your result using a subquery or a cross join:

    SELECT p1.Name, COUNT(p2.Name) AS Surname FROM People p1 CROSS JOIN People p2 GROUP BY p1.Name

    SELECT Name, (SELECT COUNT(Name) FROM People) AS Surname FROM People
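Here is a minimal sketch that runs both queries against the three names from the question, using Python's built-in sqlite3 module as a test bed (an assumption; the question appears to use a different DBMS). The ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

# In-memory People table with the three names from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT)")
conn.executemany("INSERT INTO People VALUES (?)",
                 [("Dave",), ("Susan",), ("Amy",)])

# Cross join: every row is paired with every row, then counted per name.
cross_join_rows = conn.execute(
    "SELECT p1.Name, COUNT(p2.Name) AS Surname "
    "FROM People p1 CROSS JOIN People p2 "
    "GROUP BY p1.Name ORDER BY p1.Name"
).fetchall()

# Scalar subquery: the total is computed once and attached to each row.
subquery_rows = conn.execute(
    "SELECT Name, (SELECT COUNT(Name) FROM People) AS Surname "
    "FROM People ORDER BY Name"
).fetchall()

print(cross_join_rows)  # [('Amy', 3), ('Dave', 3), ('Susan', 3)]
print(subquery_rows)    # [('Amy', 3), ('Dave', 3), ('Susan', 3)]
```

The two forms agree here only because every Name is unique. If a name appeared twice, the cross-join version would merge those rows into one group and inflate its count, while the subquery version would still return one row per person with the table total.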


As others explained, when you have a GROUP BY or you are using an aggregate function like COUNT() in the SELECT list, you are doing a grouping of rows and therefore collapsing matching rows into one for every group.

When you only use aggregate functions in the SELECT list, without GROUP BY, think of it as an implicit grouping on a constant, so all rows fall into one group and are collapsed into one row. So, if you have a hundred rows, the database can't really show you a name, as there are a hundred of them.

However, for RDBMSs that have "windowing" functions, what you want is feasible: you can use aggregate functions without a GROUP BY.

Example for SQL-Server, where all rows (names) in the table are counted:

SELECT Name
     , COUNT(*) OVER() AS cnt
FROM People

How does the above work?

  • It shows the Name as if the COUNT(*) OVER() AS cnt were not there, and

  • It shows the COUNT(*) as if it were computed over one group containing the whole table.


Another example: if you have a Surname field in the table, you can use something like this to show every row, partitioned by Surname, together with a count of how many people share that Surname:

SELECT Name
     , Surname
     , COUNT(*) OVER(PARTITION BY Surname) AS cnt
FROM People
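Both window queries can be tried out with a short sketch. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or newer (the first release with window functions) rather than SQL Server, and the three sample rows, including their surnames, are made up for illustration.

```python
import sqlite3

# In-memory People table; the surnames are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("Dave", "Jones"), ("Susan", "Jones"), ("Amy", "Smith")],
)

# Empty OVER (): the window is the whole table, yet every row is kept.
total_rows = conn.execute(
    "SELECT Name, COUNT(*) OVER () AS cnt FROM People ORDER BY Name"
).fetchall()

# PARTITION BY Surname: the count is taken per surname, again keeping every row.
by_surname = conn.execute(
    "SELECT Name, Surname, COUNT(*) OVER (PARTITION BY Surname) AS cnt "
    "FROM People ORDER BY Name"
).fetchall()

print(total_rows)  # [('Amy', 3), ('Dave', 3), ('Susan', 3)]
print(by_surname)  # [('Amy', 'Smith', 1), ('Dave', 'Jones', 2), ('Susan', 'Jones', 2)]
```

Unlike GROUP BY, neither query collapses any rows: three rows go in and three rows come out.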


Your query implicitly asks for different types of rows in your result set, and that is not allowed. All rows returned should be of the same type and have the same kind of columns.

'SELECT name, surname' wants to return a row for every row in the table.

'SELECT COUNT(*)' wants to return a single row combining the results of all the rows in the table.

I think you're correct that in this case the database could plausibly just do both queries and then copy the result of 'SELECT COUNT(*)' into every result. One reason for not doing this is that it would be a stealth performance hit: you'd effectively be doing an extra self-join without declaring it anywhere.

Other answers have explained how to write a working version of this query, so I won't go into that.


The aggregate function and the group by clause aren't separate things, they're parts of the same thing that appear in different places in the query. If you wish to aggregate on a column, you must say what function to use for aggregation; if you wish to have an aggregation function, it has to be applied over some column.


The aggregate function takes values from multiple rows that share a specific condition and combines them into one value. This condition is defined by the GROUP BY in your statement. So you can't combine an aggregate function with a plain column unless that column appears in a GROUP BY.

With

SELECT Name, 'Jones' AS Surname FROM People  

you simply select an additional column with a fixed value... but with

SELECT Name, COUNT(Name) AS Surname FROM People GROUP BY Name

you tell the DBMS to select the Names, remember how often every Name occurred in the table, and collapse the rows for each Name into one row. So if you omit the GROUP BY, the DBMS can't tell how to collapse the records.
