How does MySQL define DISTINCT() in its reference documentation?
EDIT: This question is about finding definitive reference to MySQL syntax on SELECT modifying keywords and functions. /EDIT
AFAIK SQL defines two uses of the DISTINCT keyword: SELECT DISTINCT field ... and SELECT COUNT(DISTINCT field) ... However, in one of the web applications that I administer I've noticed performance issues on queries like
SELECT DISTINCT(field1), field2, field3 ...
DISTINCT() on a single column makes no sense and I am almost sure it is interpreted as
SELECT DISTINCT field1, field2, field3 ...
but how can I prove this?
I've searched the MySQL site for a reference on this particular syntax but could not find any. Does anyone have a link to a definition of DISTINCT() in MySQL, or know of another authoritative source on this?
Best
EDIT After asking the same question on the MySQL forums I learned that, while parsing SQL, MySQL does not care about whitespace between a function name and the opening parenthesis (but I am still missing a reference).
It seems you can put whitespace between a function name and its parenthesis, as in
SELECT LEFT (field1,1), field2...
and MySQL will understand it as SELECT LEFT(field1,1)
Similarly, SELECT DISTINCT(field1), field2... seems to get decomposed to SELECT DISTINCT (field1), field2... DISTINCT is then taken not as some undefined (or undocumented) function but as the SELECT modifying keyword, and the parentheses around field1 are evaluated as part of the field expression.
It would be great if someone had a pointer to documentation stating that whitespace between a function name and its parenthesis is not significant, or could link to the appropriate MySQL forums or mailing lists where I could ask for this to be added to the reference.
EDIT I have found a reference to the IGNORE_SPACE SQL mode. The manual states that "The IGNORE_SPACE SQL mode can be used to modify how the parser treats function names that are whitespace-sensitive", and later that recent versions of MySQL have reduced the number of such names from about 200 to about 30.
One of the remaining 30 is COUNT, for example. With IGNORE_SPACE enabled both
SELECT COUNT(*) FROM mytable;
SELECT COUNT (*) FROM mytable;
are legal.
So if this is an exception, I am left to conclude that functions normally ignore the space by default.
If functions ignore the space by default, then in an ambiguous context, such as the first item of the select list, a function call cannot be told apart from a keyword followed by a parenthesized expression; no error can be raised, and MySQL must accept the word as a keyword.
Still, my conclusions rest on a lot of assumptions. I would be grateful for any pointers on where to follow up on this, and would accept such an answer.
For completeness' sake I am answering my own question and linking to another question of mine. It seems that this behaviour is a direct consequence of the SQL standard allowing whitespace between a function name and its parenthesis.
Since it is (generally) allowed to write FUNCTION_NAME (x), when this function is applied to the first term of a select
SELECT FUNCTION_NAME (x)
the parser is going to have a hard time establishing whether this is the context of a function name or of a SELECT modifying keyword.
So in the above case the FUNCTION_NAME is actually FUNCTION_NAME_OR_KEYWORD to the parser.
But it goes further: since the space between function name and parenthesis IS allowed, the parser actually can NOT distinguish between
SELECT FUNCTION_NAME_OR_KEYWORD (x)
and
SELECT FUNCTION_NAME_OR_KEYWORD(x)
(it must test the keywords to see if they are functions), and since (x) will be parsed to x it follows that for FUNCTION_NAME_OR_KEYWORD -> DISTINCT (and all other SELECT modifying keywords) there is no difference between
SELECT DISTINCT x, y, z, ...
and
SELECT DISTINCT(x), y, z, ...
QED, but without hard references (the assumption that the standard does not care about whitespace between function names and parentheses is, I believe, justified, but I was unable to follow the BNF grammar to the point where I could quote the exact rule).
NOTE: MySQL has a certain number of functions for which it does care about whitespace between the function name and the parenthesis, but I believe these are exceptions (hence the server option to ignore it).
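The whitespace argument can be illustrated with a toy tokenizer. This is only a sketch in Python under the assumption that the lexer discards whitespace between tokens; it is not MySQL's actual lexer (which, as noted, treats some function names as whitespace-sensitive), and the tokenize name and TOKEN pattern are made up for the illustration.

```python
import re

# Hypothetical token pattern: identifiers/keywords, or one of ( ) , *
# Leading whitespace is skipped, so it never reaches the token stream.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),*])")


def tokenize(sql):
    """Split a very small subset of SQL into tokens, dropping whitespace."""
    sql = sql.strip()
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN.match(sql, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    # Both spellings produce the same token stream, so a parser that
    # works on tokens cannot tell them apart.
    print(tokenize("SELECT DISTINCT(x), y"))
    print(tokenize("SELECT DISTINCT (x), y"))
```

Once the two spellings yield identical tokens, any difference in meaning would have to come from the grammar, and there DISTINCT is a keyword followed by an ordinary parenthesized expression.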
Interesting scenario.
As you found,
SELECT DISTINCT(a), b, c
is equivalent to:
SELECT DISTINCT (a), b, c
is equivalent to:
SELECT DISTINCT a, b, c
i.e. the parentheses are treated as expression grouping parentheses.
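If a reference cannot be found, the equivalence can at least be checked empirically. Below is a minimal sketch that does this with Python's built-in sqlite3 module. Note the assumption: SQLite is only a stand-in here, so the result shows how one SQL parser handles the three spellings and is not proof of MySQL's behaviour; the same queries would need to be run against a MySQL server to confirm it. The table t, its sample rows and the run_variants helper are hypothetical.

```python
import sqlite3


def run_variants():
    """Run the three spellings against the same data and return their rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 2, 2)],
    )
    queries = [
        "SELECT DISTINCT(a), b, c FROM t",
        "SELECT DISTINCT (a), b, c FROM t",
        "SELECT DISTINCT a, b, c FROM t",
    ]
    # Sort each result so that row order does not affect the comparison.
    results = [sorted(conn.execute(q).fetchall()) for q in queries]
    conn.close()
    return results


if __name__ == "__main__":
    first, second, third = run_variants()
    print(first == second == third)
    print(first)
```

If DISTINCT applied to column a alone, only two rows could come back (one per value of a). Getting the rows (1, 1, 1), (1, 2, 1) and (2, 2, 2) from all three queries indicates that DISTINCT covers the whole select list in each spelling.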
Interestingly, a very similar issue occurs in VB6/VBScript, where (for a Function declared with byref x) Function(x), Function x and Call Function(x) all differ slightly in what they pass by reference (Function(x) passes a reference to the result of the expression (x), not to x itself).