开发者

How would I write this SQL query?

I have the following tables:

PERSON_T              DISEASE_T               DRUG_T
=========             ==========              ========
PERSON_ID             DISEASE_ID              DRUG_ID
GENDER                PERSON_ID               PERSON_ID
NAME                  DISEASE_START_DATE      DRUG_START_DATE
                      DISEASE_END_DATE        DRUG_END_DATE

I want to write a query that takes an input of a disease id and returns one row for each person in the database with a column for the gender, a column for whether or not they have ever had the disease, and a column for each drug which specifies if they took the drug before contracting the disease. I.E. true would mean drug_start_date < disease_start_date. False would mean drug_start_date>disease_start_date or the person never took that particular drug.

We currently pull all of the data from the database and use Java to create a 2D array with all of these values. We are investigating moving this logic into the database. Is it possible开发者_运维知识库 to create a query that will return the result set as I want it or would I have to create a stored procedure? We are using Postgres, but I assume an SQL answer for another database will easily translate to Postgres.


Based on the info provided:

   SELECT p.name,
          p.gender,
          CASE WHEN d.disease_id IS NULL THEN 'N' ELSE 'Y' END AS had_disease,
          dt.drug_id
     FROM PERSON p
LEFT JOIN DISEASE d ON d.person_id = p.person_id
                   AND d.disease_id = ?
LEFT JOIN DRUG_T dt ON dt.person_id = p.person_id
                   AND dt.drug_start_date < d.disease_start_date

..but there's going to be a lot of rows that will look duplicate except for the drug_id column.


You're essentially looking to create a cross-tab query with the drugs. While there are plenty of OLAP tools out there that can do this sort of thing (among all sorts of other slicing and dicing of the data), doing something like this in traditional SQL is not easy (and, in general, impossible to do without some sort of procedural syntax in all but the simplest scenarios).

You essentially have two options when doing this with SQL (well, more accurately, you have one option, and another more complicated but flexible option that derives from it):

  1. Use a series of CASE statements in your query to produce columns that are representative of each individual drug. This requires knowing the list of variable values (i.e. drugs) ahead of time
  2. Use a procedural SQL language, such as T-SQL, to dynamically construct a query that uses case statements as described above, but along with obtaining that list of values from the data itself.

The two options essentially do the same thing, you're just trading simplicity and ease of maintenance for flexibility in the second option.

For example, using option 1:

select
    p.NAME,
    p.GENDER,
    (case when d.DISEASE_ID is null then 0 else 1 end) as HAD_DISEASE,
    (case when sum(case when dr.DRUG_ID = 1 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_1,
    (case when sum(case when dr.DRUG_ID = 2 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_2,
    (case when sum(case when dr.DRUG_ID = 3 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_3

from PERSON_T p

left join DISEASE_T d on d.PERSON_ID = p.PERSON_ID and d.DISEASE_ID = @DiseaseId
left join DRUG_T dr on dr.PERSON_ID = p.PERSON_ID and dr.DRUG_START_DATE < d.DISEASE_START_DATE

group by p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID

As you can tell, this gets a little laborious as you get outside of just a few potential values.

The other option is to construct this query dynamically. I don't know PostgreSQL and what, if any, procedural capabilities it has, but the overall procedure would be this:

  1. Gather list of potential DRUG_ID values along with names for the columns
  2. Prepare three string values: the SQL prefix (everything before the first drug-related CASE statement, the SQL stuffix (everything after the last drug-related CASE statement), and the dynamic portion
  3. Construct the dynamic portion by combining drug CASE statements based upon the previously retrieved list
  4. Combine them into a single (hopefully valid) SQL statement and execute
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜