Is it safe to include extra columns in the SELECT list of a SQLite GROUP BY query?

Posted 2022-12-11 13:27

I have a simple SQLite table called "message":

sequence INTEGER PRIMARY KEY
type TEXT
content TEXT

I want to get the content of the last message of each type (as determined by its sequence). To my surprise, the following simple query works:

SELECT MAX(sequence), type, content
FROM message
GROUP BY type

This surprised me, because I know that MSSQL or Postgres would refuse to include a column in the SELECT list that is neither part of the GROUP BY clause nor inside an aggregate function, and I'd have to do a join, like this:

SELECT m.sequence, m.type, m.content
FROM
(
    SELECT MAX(sequence) as sequence, type
    FROM message
    GROUP BY type
) g
JOIN message m
ON g.sequence = m.sequence

My question is: is it safe to use the first, much simpler, form of the query in SQLite? It intuitively makes sense that it selects the "content" value that matches the "MAX(sequence)" value, but the documentation doesn't seem to talk about this at all. Of course, if sequence was not unique then the result would be undefined. But if sequence is unique, as in my case, is this guaranteed or is it simply a lucky implementation detail that's subject to change?

You can use these queries "safely," that is, without getting ambiguous results, if the extra columns are functionally dependent on the column(s) you group by:

SELECT c.parent_id, COUNT(*), p.any_column
FROM child_table c 
JOIN parent_table p USING (parent_id)
GROUP BY c.parent_id;

The example above works in SQLite and produces an unambiguous result, because p.any_column cannot have more than one value per group. However, the query violates the SQL-92 rule that every selected column must be either grouped or aggregated, and many brands of RDBMS would raise an error.
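
Here is a minimal sketch of the functionally dependent case, run through Python's built-in sqlite3 module against an in-memory database. The table and column names come from the query above; the child_id column and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory database; child_id and the sample rows are assumptions,
# only the table/column names used in the query come from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_table (parent_id INTEGER PRIMARY KEY, any_column TEXT);
    CREATE TABLE child_table (child_id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parent_table VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO child_table (parent_id) VALUES (1), (1), (1), (2);
""")

# Each group has exactly one parent row, so p.any_column has one
# possible value per group and the result is unambiguous.
parent_counts = conn.execute("""
    SELECT c.parent_id, COUNT(*), p.any_column
    FROM child_table c
    JOIN parent_table p USING (parent_id)
    GROUP BY c.parent_id
    ORDER BY c.parent_id
""").fetchall()

print(parent_counts)  # [(1, 3, 'alpha'), (2, 1, 'beta')]
```

The ORDER BY is only there to make the printed order predictable; it does not affect which value any_column takes.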

It's too easy to write a query that produces ambiguous results, though. When you name a column that has multiple values per group, you can't control which value is returned in your result set.

In practice, MySQL returns the value from the first row with respect to physical storage, and SQLite returns the value from the last row. But it's totally implementation-dependent and not reliable. If the next version of either software changes its internals, you could get different query results after you upgrade. So it's best not to rely on this behavior.

Regarding your example, you say content should "intuitively" take its value from the row where sequence is MAX. But is this really intuitive? Consider these other cases:

SELECT MAX(sequence), MIN(sequence), type, content
FROM message
GROUP BY type

So which row now supplies the value for content? The row where sequence is MAX, or the row where sequence is MIN?
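
You can see the ambiguity for yourself with a quick experiment. The sketch below uses Python's sqlite3 module and three invented rows of a single type. SQLite's SELECT documentation has a section on bare columns in aggregate queries; as far as I understand it, with two min()/max() aggregates the bare column comes from one of the two extreme rows, and which one is arbitrary. That is why the code prints the value instead of promising one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (sequence INTEGER PRIMARY KEY, type TEXT, content TEXT);
    -- Sample rows are invented for illustration.
    INSERT INTO message VALUES
        (1, 'info', 'first'),
        (2, 'info', 'middle'),
        (3, 'info', 'last');
""")

max_seq, min_seq, msg_type, picked_content = conn.execute("""
    SELECT MAX(sequence), MIN(sequence), type, content
    FROM message
    GROUP BY type
""").fetchone()

# The aggregates are well defined; the bare column is not.
# Whether 'first' or 'last' shows up is up to the SQLite build.
print(max_seq, min_seq, msg_type, picked_content)
```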

What if you use a non-unique column (e.g. date), and there are multiple rows with the same MAX value for date, but different values for content?

SELECT MAX(date), type, content
FROM message
GROUP BY type
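
A small sketch of the tie case, again with Python's sqlite3 module. The date column is an assumption (the table in the question has no such column) and the rows are invented. The bare-column query returns a single content value even though two rows share the latest date, while an explicit join on type and date surfaces both tied rows so you can decide what to do with them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The date column is hypothetical; the question's table does not have it.
    CREATE TABLE message (
        sequence INTEGER PRIMARY KEY, type TEXT, content TEXT, date TEXT
    );
    INSERT INTO message VALUES
        (1, 'info', 'older',  '2022-12-10'),
        (2, 'info', 'tied A', '2022-12-11'),
        (3, 'info', 'tied B', '2022-12-11');
""")

# One row comes back, so one of the tied contents is silently dropped.
latest_date, msg_type, tied_content = conn.execute("""
    SELECT MAX(date), type, content
    FROM message
    GROUP BY type
""").fetchone()
print(latest_date, tied_content)

# The explicit join returns every row that shares the latest date.
tied_rows = conn.execute("""
    SELECT m.sequence, m.content
    FROM (SELECT MAX(date) AS date, type FROM message GROUP BY type) g
    JOIN message m ON g.type = m.type AND g.date = m.date
    ORDER BY m.sequence
""").fetchall()
print(tied_rows)  # [(2, 'tied A'), (3, 'tied B')]
```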

What about other aggregate functions like AVG() or SUM()? It could be that the value of the aggregate corresponds to no individual row in the table. So now which row should supply the value for content?

SELECT AVG(sequence), type, content
FROM message
GROUP BY type

I don't know of any database which will "intuitively" solve this sort of problem, where you want to get related row values for a group based upon the result of an aggregate for a specific column. For SQLite, I think you had better stick with your second query.
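
For reference, here is the join form from the question run against a few invented rows with Python's sqlite3 module. It joins on the sequence column that the table actually defines. Treat it as a sketch under those assumptions, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (sequence INTEGER PRIMARY KEY, type TEXT, content TEXT);
    -- Sample rows are invented for illustration.
    INSERT INTO message VALUES
        (1, 'info',  'starting'),
        (2, 'error', 'disk full'),
        (3, 'info',  'running'),
        (4, 'error', 'timeout'),
        (5, 'info',  'done');
""")

# The subquery finds the highest sequence per type; the join then fetches
# the full row for each of those sequence values.
last_per_type = conn.execute("""
    SELECT m.sequence, m.type, m.content
    FROM (
        SELECT MAX(sequence) AS sequence, type
        FROM message
        GROUP BY type
    ) g
    JOIN message m ON g.sequence = m.sequence
    ORDER BY m.type
""").fetchall()

print(last_per_type)  # [(4, 'error', 'timeout'), (5, 'info', 'done')]
```

Because sequence is the primary key, the join matches exactly one row per type, so every column in the result comes from a known row.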

Since you mentioned PostgreSQL, it's worth noting that it supports some non-standard syntax which accomplishes this, in the form of DISTINCT ON:

select distinct on (type) sequence, type, content
from message
order by type, sequence desc

(There could be some errors in that, as I don't have a psql prompt in front of me, but that's the gist of it.)

See http://www.postgresql.org/docs/8.4/interactive/queries-select-lists.html
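
SQLite has no DISTINCT ON, but a rough analogue can be sketched with the ROW_NUMBER() window function. This is a swapped-in technique, not something from the answers above, and it assumes a SQLite build of 3.25 or later (the first release with window functions). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (sequence INTEGER PRIMARY KEY, type TEXT, content TEXT);
    -- Sample rows are invented for illustration.
    INSERT INTO message VALUES
        (1, 'info',  'starting'),
        (2, 'error', 'disk full'),
        (3, 'info',  'running'),
        (4, 'error', 'timeout'),
        (5, 'info',  'done');
""")

# Number the rows of each type from newest to oldest, then keep row 1.
newest_per_type = conn.execute("""
    SELECT sequence, type, content
    FROM (
        SELECT sequence, type, content,
               ROW_NUMBER() OVER (PARTITION BY type ORDER BY sequence DESC) AS rn
        FROM message
    )
    WHERE rn = 1
    ORDER BY type
""").fetchall()

print(newest_per_type)  # [(4, 'error', 'timeout'), (5, 'info', 'done')]
```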

I bet it just picks the value from an arbitrary row for the extra field. The MySQL docs, for instance, explicitly say so.
