How would I write this SQL query?
I have the following tables:
PERSON_T DISEASE_T DRUG_T
========= ========== ========
PERSON_ID DISEASE_ID DRUG_ID
GENDER PERSON_ID PERSON_ID
NAME DISEASE_START_DATE DRUG_START_DATE
DISEASE_END_DATE DRUG_END_DATE
I want to write a query that takes an input of a disease id and returns one row for each person in the database with a column for the gender, a column for whether or not they have ever had the disease, and a column for each drug which specifies if they took the drug before contracting the disease. I.E. true would mean drug_start_date < disease_start_date. False would mean drug_start_date>disease_start_date or the person never took that particular drug.
We currently pull all of the data from the database and use Java to create a 2D array with all of these values. We are investigating moving this logic into the database. Is it possible开发者_运维知识库 to create a query that will return the result set as I want it or would I have to create a stored procedure? We are using Postgres, but I assume an SQL answer for another database will easily translate to Postgres.
Based on the info provided:
SELECT p.name,
p.gender,
CASE WHEN d.disease_id IS NULL THEN 'N' ELSE 'Y' END AS had_disease,
dt.drug_id
FROM PERSON p
LEFT JOIN DISEASE d ON d.person_id = p.person_id
AND d.disease_id = ?
LEFT JOIN DRUG_T dt ON dt.person_id = p.person_id
AND dt.drug_start_date < d.disease_start_date
..but there's going to be a lot of rows that will look duplicate except for the drug_id
column.
You're essentially looking to create a cross-tab query with the drugs. While there are plenty of OLAP tools out there that can do this sort of thing (among all sorts of other slicing and dicing of the data), doing something like this in traditional SQL is not easy (and, in general, impossible to do without some sort of procedural syntax in all but the simplest scenarios).
You essentially have two options when doing this with SQL (well, more accurately, you have one option, and another more complicated but flexible option that derives from it):
- Use a series of
CASE
statements in your query to produce columns that are representative of each individual drug. This requires knowing the list of variable values (i.e. drugs) ahead of time - Use a procedural SQL language, such as T-SQL, to dynamically construct a query that uses case statements as described above, but along with obtaining that list of values from the data itself.
The two options essentially do the same thing, you're just trading simplicity and ease of maintenance for flexibility in the second option.
For example, using option 1:
select
p.NAME,
p.GENDER,
(case when d.DISEASE_ID is null then 0 else 1 end) as HAD_DISEASE,
(case when sum(case when dr.DRUG_ID = 1 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_1,
(case when sum(case when dr.DRUG_ID = 2 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_2,
(case when sum(case when dr.DRUG_ID = 3 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_3
from PERSON_T p
left join DISEASE_T d on d.PERSON_ID = p.PERSON_ID and d.DISEASE_ID = @DiseaseId
left join DRUG_T dr on dr.PERSON_ID = p.PERSON_ID and dr.DRUG_START_DATE < d.DISEASE_START_DATE
group by p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID
As you can tell, this gets a little laborious as you get outside of just a few potential values.
The other option is to construct this query dynamically. I don't know PostgreSQL and what, if any, procedural capabilities it has, but the overall procedure would be this:
- Gather list of potential
DRUG_ID
values along with names for the columns - Prepare three string values: the SQL prefix (everything before the first drug-related
CASE
statement, the SQL stuffix (everything after the last drug-relatedCASE
statement), and the dynamic portion - Construct the dynamic portion by combining drug
CASE
statements based upon the previously retrieved list - Combine them into a single (hopefully valid) SQL statement and execute
精彩评论