correct way to create a pivot table in postgresql using CASE WHEN
I am trying to create a pivot table type view in postgresql and am nearly there! Here is the basic query:
select
acc2tax_node.acc, tax_node.name, tax_node.rank
from
tax_node, acc2tax_node
where
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';
And the data:
acc 开发者_高级运维 | name | rank
----------+-------------------------+--------------
AJ012531 | Paromalostomum fusculum | species
AJ012531 | Paromalostomum | genus
AJ012531 | Macrostomidae | family
AJ012531 | Macrostomida | order
AJ012531 | Macrostomorpha | no rank
AJ012531 | Turbellaria | class
AJ012531 | Platyhelminthes | phylum
AJ012531 | Acoelomata | no rank
AJ012531 | Bilateria | no rank
AJ012531 | Eumetazoa | no rank
AJ012531 | Metazoa | kingdom
AJ012531 | Fungi/Metazoa group | no rank
AJ012531 | Eukaryota | superkingdom
AJ012531 | cellular organisms | no rank
What I am trying to get is the following:
acc | species | phylum
AJ012531 | Paromalostomum fusculum | Platyhelminthes
I am trying to do this with CASE WHEN, so I've got as far as the following:
select
acc2tax_node.acc,
CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as species,
CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as phylum
from
tax_node, acc2tax_node
where
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';
Which gives me the output:
acc | species | phylum
----------+-------------------------+-----------------
AJ012531 | Paromalostomum fusculum |
AJ012531 | |
AJ012531 | |
AJ012531 | |
AJ012531 | |
AJ012531 | |
AJ012531 | | Platyhelminthes
AJ012531 | |
AJ012531 | |
AJ012531 | |
AJ012531 | |
AJ012531 | |
AJ012531 | |
AJ012531 | |
Now I know that I have to group by acc at some point, so I try
select
acc2tax_node.acc,
CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as sp,
CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as ph
from
tax_node, acc2tax_node
where
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531'
group by acc2tax_node.acc;
But I get the dreaded
ERROR: column "tax_node.rank" must appear in the GROUP BY clause or be used in an aggregate function
All the previous examples I've been able to find use something like SUM() around the CASE statements, so I guess that is the aggregate function. I have tried using FIRST():
select
acc2tax_node.acc,
FIRST(CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END) as sp,
FIRST(CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END) as ph
from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' group by acc2tax_node.acc;
but get the error:
ERROR: function first(character varying) does not exist
Can anyone offer any hints?
Use MAX() or MIN(), not FIRST(). In this scenario, you will have all NULLs in the column per each group value except for, at most, one with a not null value. By definition, this is both the MIN and the MAX of that set of values (all nulls are excluded).
PostgreSQL does have a couple of functions for pivot queries, see this article at Postgresonline. You can find these functions in the contrib.
As Matthew Wood pointed out, use MIN() or MAX(), not FIRST():
SELECT
an.acc,
MAX(
CASE tn.rank
WHEN 'species' THEN tn.name
ELSE NULL
END
) AS species,
MAX(
CASE tn.rank
WHEN 'phylum' THEN tn.name
ELSE NULL
END
) AS phylum
FROM tax_node tn,
acc2tax_node an
WHERE tn.taxid = an.taxid
and an.acc = 'AJ012531'
GROUP by an.acc;
SELECT atn.acc, ts.name AS species, tp.name AS phylum
FROM acc2tax_node atn
LEFT JOIN
tax_node ts
ON ts.taxid = atn.taxid
AND ts.rank = 'species'
LEFT JOIN
tax_node tp
ON tp.taxid = atn.taxid
AND tp.rank = 'phylum'
WHERE atn.acc = 'AJ012531 '
Further info as requested (in a reply rather than a comment for nice formatting):
SELECT * FROM acc2tax_node WHERE acc = 'AJ012531';
acc | taxid
----------+--------
AJ012531 | 66400
AJ012531 | 66399
AJ012531 | 39216
AJ012531 | 39215
AJ012531 | 166235
AJ012531 | 166384
AJ012531 | 6157
AJ012531 | 33214
AJ012531 | 33213
AJ012531 | 6072
AJ012531 | 33208
AJ012531 | 33154
AJ012531 | 2759
AJ012531 | 131567
Execute:
SELECT report.* FROM crosstab(
select
acc2tax_node.acc, tax_node.name, tax_node.rank
from
tax_node, acc2tax_node
where
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';
) AS report(species text, enus text, family text, ...)
精彩评论