An aggregate function that only allows one unique input
I often find myself adding expressions to the group by clause that I am sure are unique. It sometimes turns out I am wrong, because of an error in my SQL or a mistaken assumption, and the expression is not really unique.
There are many cases where I would much rather this generated a SQL error than silently, and sometimes very subtly, expanded my result set.
I would love to be able to do something like:
select product_id, unique description from product group by product_id
but obviously I can't implement that myself. Something nearly as concise can, however, be implemented with user-defined aggregates on some databases.
Would a special aggregate that only allows one unique input value be generally helpful in all versions of SQL? If so, could such a thing be implemented now on most databases? null values should be considered just like any other value, unlike the way the built-in aggregate avg typically works. (I have added answers with ways of implementing this for postgres and Oracle.)
The following example is intended to show how the aggregate would be used, but it is a simple case where it is obvious which expressions should be unique. Real usage would more likely be in larger queries, where it is easier to make mistaken assumptions about uniqueness.
tables:
product_id | description
------------+-------------
1 | anvil
2 | brick
3 | clay
4 | door
sale_id | product_id | cost
---------+------------+---------
1 | 1 | £100.00
2 | 1 | £101.00
3 | 1 | £102.00
4 | 2 | £3.00
5 | 2 | £3.00
6 | 2 | £3.00
7 | 3 | £24.00
8 | 3 | £25.00
queries:
> select * from product join sale using (product_id);
product_id | description | sale_id | cost
------------+-------------+---------+---------
1 | anvil | 1 | £100.00
1 | anvil | 2 | £101.00
1 | anvil | 3 | £102.00
2 | brick | 4 | £3.00
2 | brick | 5 | £3.00
2 | brick | 6 | £3.00
3 | clay | 7 | £24.00
3 | clay | 8 | £25.00
> select product_id, description, sum(cost)
from product join sale using (product_id)
group by product_id, description;
product_id | description | sum
------------+-------------+---------
2 | brick | £9.00
1 | anvil | £303.00
3 | clay | £49.00
> select product_id, solo(description), sum(cost)
from product join sale using (product_id)
group by product_id;
product_id | solo | sum
------------+-------+---------
1 | anvil | £303.00
3 | clay | £49.00
2 | brick | £9.00
error case:
> select solo(description) from product;
ERROR: This aggregate only allows one unique input
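To make the intended semantics concrete, here is a minimal sketch using Python's built-in sqlite3 module, which lets you register a user-defined aggregate with create_aggregate. SQLite is not one of the databases discussed here, so treat this as an illustration of the behaviour rather than a recommended implementation. The class name Solo, the helper make_connection and the integer cost column (whole pounds instead of a money type) are assumptions made for the example.

```python
import sqlite3


class Solo:
    """Aggregate that accepts only one distinct input value per group."""

    _UNSET = object()  # sentinel: "no rows seen yet", distinct from a NULL input

    def __init__(self):
        self.value = Solo._UNSET

    def step(self, value):
        if self.value is Solo._UNSET:
            self.value = value
        elif self.value != value:
            # None != 'x' is True and None != None is False,
            # so NULL is compared like any other value.
            raise ValueError("This aggregate only allows one unique input")

    def finalize(self):
        return None if self.value is Solo._UNSET else self.value


def make_connection():
    """In-memory database with the sample tables and solo() registered."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("solo", 1, Solo)
    conn.executescript("""
        create table product(product_id integer primary key, description text);
        insert into product values (1,'anvil'),(2,'brick'),(3,'clay'),(4,'door');
        create table sale(sale_id integer primary key,
                          product_id integer not null references product,
                          cost integer not null);
        insert into sale(product_id, cost)
        values (1,100),(1,101),(1,102),(2,3),(2,3),(2,3),(3,24),(3,25);
    """)
    return conn


if __name__ == "__main__":
    conn = make_connection()
    query = """select product_id, solo(description), sum(cost)
               from product join sale using (product_id)
               group by product_id order by product_id"""
    for row in conn.execute(query):
        print(row)
    try:
        conn.execute("select solo(description) from product").fetchall()
    except sqlite3.OperationalError as exc:
        # sqlite3 reports an exception raised inside step() as OperationalError
        print("error:", exc)
```

The sentinel object is what allows a null first value to be remembered as a real value instead of being mistaken for "nothing seen yet", which is the same job the flag column does in a stateful implementation.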
An ORACLE solution is
select product_id,
case when min(description) != max(description) then to_char(1/0)
else min(description) end description,
sum(cost)
from product join sale using (product_id)
group by product_id;
Rather than the to_char(1/0) (which raises ORA-01476, a divide-by-zero error), you can use a simple function that performs the check:
CREATE OR REPLACE FUNCTION solo (i_min IN VARCHAR2, i_max IN VARCHAR2)
RETURN VARCHAR2 IS
BEGIN
IF i_min != i_max THEN
RAISE_APPLICATION_ERROR(-20001, 'Non-unique value specified');
ELSE
RETURN i_min;
END IF;
END;
/
select product_id,
solo(min(description), max(description)) description,
sum(cost)
from product join sale using (product_id)
group by product_id;
You can use a user defined aggregate, but I'd be worried about the performance impact of switching between SQL and PL/SQL.
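One caveat with the min/max comparison above: min and max skip nulls, so a group that mixes a null with a single non-null value passes the check, which does not match the requirement that null be treated like any other value. The sketch below illustrates this with Python's sqlite3 module; SQLite is an assumption for the sake of a runnable example (the answer targets Oracle), and the helper name minmax_check is hypothetical. Comparing count(*) with count(v) is one way to detect the stray null.

```python
import sqlite3


def minmax_check(values):
    """Return (min, max, count(*), count(v)) for the given column values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t(v text)")
    conn.executemany("insert into t(v) values (?)", [(v,) for v in values])
    row = conn.execute(
        "select min(v), max(v), count(*), count(v) from t"
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    # min and max agree even though the inputs are 'anvil' and NULL;
    # count(*) and count(v) differ, which reveals the null.
    print(minmax_check(["anvil", None]))
```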
Here is my implementation for postgres (edited to treat null as a unique value too):
create function solo_sfunc(inout anyarray, anyelement)
language plpgsql immutable as $$
begin
if $1 is null then
$1[1] := $2;
else
if ($1[1] is not null and $2 is null)
or ($1[1] is null and $2 is not null)
or ($1[1]!=$2) then
raise exception 'This aggregate only allows one unique input';
end if;
end if;
return;
end;$$;
create function solo_ffunc(anyarray) returns anyelement
language plpgsql immutable as $$
begin
return $1[1];
end;$$;
create aggregate solo(anyelement)
(sfunc=solo_sfunc, stype=anyarray, ffunc=solo_ffunc);
example tables for testing:
create table product(product_id integer primary key, description text);
insert into product(product_id, description)
values (1, 'anvil'), (2, 'brick'), (3, 'clay'), (4, 'door');
create table sale( sale_id serial primary key,
product_id integer not null references product,
cost money not null );
insert into sale(product_id, cost)
values (1, '100'::money), (1, '101'::money), (1, '102'::money),
(2, '3'::money), (2, '3'::money), (2, '3'::money),
(3, '24'::money), (3, '25'::money);
You should define a UNIQUE constraint on (product_id, description), then you never have to worry about there being two descriptions for one product.
And here is my implementation for Oracle - unfortunately I think you need one implementation for each base type:
create type SoloNumberImpl as object
(
val number,
flag char(1),
static function ODCIAggregateInitialize(sctx in out SoloNumberImpl)
return number,
member function ODCIAggregateIterate( self in out SoloNumberImpl,
value in number )
return number,
member function ODCIAggregateTerminate( self in SoloNumberImpl,
returnValue out number,
flags in number )
return number,
member function ODCIAggregateMerge( self in out SoloNumberImpl,
ctx2 in SoloNumberImpl )
return number
);
/
create or replace type body SoloNumberImpl is
static function ODCIAggregateInitialize(sctx in out SoloNumberImpl)
return number is
begin
sctx := SoloNumberImpl(null, 'N');
return ODCIConst.Success;
end;
member function ODCIAggregateIterate( self in out SoloNumberImpl,
value in number )
return number is
begin
if self.flag='N' then
self.val:=value;
self.flag:='Y';
else
if (self.val is null and value is not null)
or (self.val is not null and value is null)
or (self.val!=value) then
raise_application_error( -20001,
'This aggregate only allows one unique input' );
end if;
end if;
return ODCIConst.Success;
end;
member function ODCIAggregateTerminate( self in SoloNumberImpl,
returnValue out number,
flags in number )
return number is
begin
returnValue := self.val;
return ODCIConst.Success;
end;
member function ODCIAggregateMerge( self in out SoloNumberImpl,
ctx2 in SoloNumberImpl )
return number is
begin
if self.flag='N' then
self.val:=ctx2.val;
self.flag:=ctx2.flag;
elsif ctx2.flag='Y' then
if (self.val is null and ctx2.val is not null)
or (self.val is not null and ctx2.val is null)
or (self.val!=ctx2.val) then
raise_application_error( -20001,
'This aggregate only allows one unique input' );
end if;
end if;
return ODCIConst.Success;
end;
end;
/
create function SoloNumber (input number)
return number aggregate using SoloNumberImpl;
/