Is there a better way to organize and make this SQL query more readable?
I have inherited a piece of SQL that I am working on for a reporting system. The system is centered around Purchase Orders, and when they are Created, Transmitted, Acknowledged, and Confirmed. These states are a progressive state. The piece of SQL that I am reviewing is shown below and is from the WHERE clause:
OR (CreateJob.endtime is NULL and DATEDIFF(hour, buy_date, getdate()) > Vendor.expected_create_hours)
OR (TransmitJob.endtime is NULL and DATEDIFF(hour, CreateJob.endtime, getdate()) > Vendor.expected_transmit_hours)
OR (AcknowledgeJob.endtime is NULL and DATEDIFF(hour, TransmitJob.endtime, getdate()) > Vendor.expected_acknowledge_hours)
OR (ConfirmJob.endtime is NULL and DATEDIFF(hour, AcknowledgeJob.endtime, getdate()) > Vendor.expected_confirm_hours)
What I have found is that it is possible for a CreateJob to not have an end time because the job failed. However a job can execute more then one Purchase Order, so there are times when a given PO actually was created, but the job does not receive an end time because a sibling later failed. This creates a scenario where the states still progress for a PO, but it still shows up on this problem report because the CreateJob.endtime is NULL.
So first there are some obvious bugs with the system and architecture above, but that is a separate issue from what I am tackling at the moment.
It seems that I could fix the report though, by basically adding the progressive checks as AND statements. So the first check for the CreateJob changes too:
( (CreateJob.endtime is NULL and DATEDIFF(hour, buy_date, getdate()) > Vendor.expected_create_hours)
AND (TransmitJob.endtime is NULL and DATEDIFF(hour, CreateJo开发者_Python百科b.endtime, getdate()) > Vendor.expected_transmit_hours)
AND (AcknowledgeJob.endtime is NULL and DATEDIFF(hour, TransmitJob.endtime, getdate()) > Vendor.expected_acknowledge_hours)
AND (ConfirmJob.endtime is NULL and DATEDIFF(hour, AcknowledgeJob.endtime, getdate()) > Vendor.expected_confirm_hours))
Which is ugly and begins to obfuscate things into oblivion. Is there some way I can do the equivalent of the following in SQL? Basically some kind of macro or #define system?
CreateFailed = (CreateJob.endtime is NULL and DATEDIFF(hour, buy_date, getdate()) > Vendor.expected_create_hours)
TransmitFailed = (TransmitJob.endtime is NULL and DATEDIFF(hour, CreateJob.endtime, getdate()) > Vendor.expected_transmit_hours)
AcknowledgeFailed = (AcknowledgeJob.endtime is NULL and DATEDIFF(hour, TransmitJob.endtime, getdate()) > Vendor.expected_acknowledge_hours)
ConfirmFailed = (ConfirmJob.endtime is NULL and DATEDIFF(hour, AcknowledgeJob.endtime, getdate()) > Vendor.expected_confirm_hours)
OR (CreateFailed AND TransmitFailed AND AcknowledgeFailed AND ConfirmFailed)
OR (TransmitFailed AND AcknowledgeFailed AND ConfirmFailed)
OR (AcknowledgeFailed AND ConfirmFailed)
OR (ConfirmFailed)
If there is a common term or name for what I am trying to do maybe someone could add it to the title?
A side point - Is the code snippet you've provided an extract from your production code? If so, you can replace the whole block with just the ConfirmFailed
condition, since it's common to all the WHERE
conditions you've shown and can satisfy it on its own.
Whilst it doesn't directly answer the question you asked, an approach to simplifying your code would be to calculate the values for each Failed
indicator once in a pre-processing step - depending on your environment this might be a CTE, a temporary table or a view. That way, all the logic is in one place - this is particularly useful if the value is needed more than once in the query - in the SELECT
and WHERE
clauses for example.
[is] there is a common term or name for what I am trying to do...?
Abstraction?
You could hide each of your CreateFailed = ..., TransmitFailed = ...
, etc logic into its own VIEW
or CTE, then your main query could merely see if a Job exists in one of these tables e.g. (pseudo SQL):
SELECT Jobs.job_ID, ...
FROM Jobs
WHERE EXISTS (
SELECT *
FROM CreatedFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
)
OR EXISTS (
SELECT *
FROM TransmitFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
)
OR EXISTS (
SELECT *
FROM AcknowledgeFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
)
OR EXISTS (
SELECT *
FROM ConfirmFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
);
Here's the CTE idea expanded (pseudo SQL):
WITH CreateFailedJobs (job_ID, ...)
AS
(
SELECT Jobs.job_ID, ...
FROM ...
WHERE CreateJob.endtime is NULL
AND DATEDIFF(hour, .SomeTable.buy_date, getdate())
> Vendor.expected_create_hours
),
TransmitFailedJobs (job_ID, ...)
AS
(
SELECT Jobs.job_ID, ...
FROM ...
WHERE TransmitJob.endtime is NULL
AND DATEDIFF(hour, CreateJob.endtime, getdate())
> Vendor.expected_transmit_hours
),
AcknowledgeFailedJobs (job_ID, ...)
AS
(
SELECT Jobs.job_ID, ...
FROM ...
WHERE AcknowledgeJob.endtime is NULL
AND DATEDIFF(hour, TransmitJob.endtime, getdate())
> Vendor.expected_acknowledge_hours
),
ConfirmFailedJobs (job_ID, ...)
AS
(
SELECT Jobs.job_ID, ...
FROM ...
WHERE ConfirmJob.endtime is NULL
AND DATEDIFF(hour, AcknowledgeJob.endtime, getdate())
> Vendor.expected_confirm_hours)
)
SELECT Jobs.job_ID, ...
FROM Jobs
WHERE EXISTS (
SELECT *
FROM CreatedFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
)
OR EXISTS (
SELECT *
FROM TransmitFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
)
OR EXISTS (
SELECT *
FROM AcknowledgeFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
)
OR EXISTS (
SELECT *
FROM ConfirmFailedJobs AS T1
WHERE Jobs.job_ID = T1.job_ID
);
I'm not sure that this would be good for performance but readability is the aim and anyhow this report can be run offline.
精彩评论