Identify data inconsistency in a Tuple-like table
I have a table that holds availability status for workers. Here is the structure:
CREATE TABLE [dbo].[Availability]
(
[OID] BIGINT IDENTITY (1, 1) NOT NULL,
[LocumID] BIGINT NOT NULL,
[AvailableDate] SMALLDATETIME NOT NULL,
[AvailabilityStatusID] INT NOT NULL,
[LastModifiedAt] TIMESTAMP NOT NULL,
CONSTRAINT [PK_Availability] PRIMARY KEY CLUSTERED ([OID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
And here is a result:
OID LocumID AvailableDate AvailabilityStatusID LastModifiedAt
-------------------- -------------------- ----------------------- -------------------- ------------------
1 1 2009-03-02 00:00:00 1 0x0000000000201A8C
2 2 2009-03-04 00:00:00 1 0x0000000000201A8D
3 1 2009-03-05 00:00:00 1 0x0000000000201A8E
4 1 2009-03-06 00:00:00 1 0x0000000000201A8F
5 2 2009-03-07 00:00:00 1 开发者_C百科 0x0000000000201A90
6 7 2009-03-09 00:00:00 1 0x0000000000201A91
7 1 2009-03-11 00:00:00 1 0x0000000000201A92
8 1 2009-03-12 00:00:00 2 0x0000000000201A93
9 1 2009-03-14 00:00:00 1 0x0000000000201A94
10 1 2009-03-16 00:00:00 1 0x0000000000201A95
Now, the table has over 3mil record and I noticed that there are inconsistencies in my data. I need to somehow find rows where for any [AvailableDate]
, the [LocumID]
(regardless of how many,) must be unique. So, basically, a worker can have one of these [AvailabilityStatusID] = 1, 2, 3, or 4
on one date. However, in this table, there are errors where a worker is entered twice or more against a [AvailableDate]
with same [AvailabilityStatusID]
or different [AvailabilityStatusID]
How can I detect these records?
Regards.
WITH x AS
(
SELECT LocumID, dt = AvailableDate
FROM dbo.Availability
GROUP BY LocumID, AvailableDate
HAVING COUNT(*) > 1
)
SELECT a.OID, a.LocumID, a.AvailableDate,
a.AvailabilityStatusID, a.LastModifiedAt
FROM x
INNER JOIN dbo.Availability AS a
ON x.LocumID = a.LocumID
AND x.dt = a.AvailableDate
ORDER BY a.LocumID, a.AvailableDate;
Once you clean this data up (not sure what your rule will be regarding which rows to keep), you should consider a unique constraint on (LocumID, AvailableDate). Here is how you would create the constraint (though you will not be able to create it until you have removed the duplicates):
ALTER TABLE dbo.Availability
ADD CONSTRAINT uq_l_ad
UNIQUE (LocumID, AvailableDate);
Of course now you will have new errors returned to your application (Msg 2627), since your current code clearly doesn't already check if a LocumID/AvailabilityDate combination already exists before adding a new one.
精彩评论