开发者

Identify data inconsistency in a Tuple-like table

I have a table that holds availability status for workers. Here is the structure:

CREATE TABLE [dbo].[Availability]
(
    [OID]                  BIGINT        IDENTITY (1, 1) NOT NULL,
    [LocumID]              BIGINT        NOT NULL,
    [AvailableDate]        SMALLDATETIME NOT NULL,
    [AvailabilityStatusID] INT           NOT NULL,
    [LastModifiedAt]       TIMESTAMP     NOT NULL,
    CONSTRAINT [PK_Availability] PRIMARY KEY CLUSTERED ([OID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];

And here is a result:

OID                  LocumID              AvailableDate           AvailabilityStatusID LastModifiedAt
-------------------- -------------------- ----------------------- -------------------- ------------------
1                    1                    2009-03-02 00:00:00     1                    0x0000000000201A8C
2                    2                    2009-03-04 00:00:00     1                    0x0000000000201A8D
3                    1                    2009-03-05 00:00:00     1                    0x0000000000201A8E
4                    1                    2009-03-06 00:00:00     1                    0x0000000000201A8F
5                    2                    2009-03-07 00:00:00     1               开发者_C百科     0x0000000000201A90
6                    7                    2009-03-09 00:00:00     1                    0x0000000000201A91
7                    1                    2009-03-11 00:00:00     1                    0x0000000000201A92
8                    1                    2009-03-12 00:00:00     2                    0x0000000000201A93
9                    1                    2009-03-14 00:00:00     1                    0x0000000000201A94
10                   1                    2009-03-16 00:00:00     1                    0x0000000000201A95

Now, the table has over 3mil record and I noticed that there are inconsistencies in my data. I need to somehow find rows where for any [AvailableDate], the [LocumID] (regardless of how many,) must be unique. So, basically, a worker can have one of these [AvailabilityStatusID] = 1, 2, 3, or 4 on one date. However, in this table, there are errors where a worker is entered twice or more against a [AvailableDate] with same [AvailabilityStatusID] or different [AvailabilityStatusID]

How can I detect these records?

Regards.


WITH x AS 
(
  SELECT LocumID, dt = AvailableDate
     FROM dbo.Availability
     GROUP BY LocumID, AvailableDate 
     HAVING COUNT(*) > 1
)
SELECT a.OID, a.LocumID, a.AvailableDate, 
    a.AvailabilityStatusID, a.LastModifiedAt
 FROM x
 INNER JOIN dbo.Availability AS a
 ON x.LocumID = a.LocumID
 AND x.dt = a.AvailableDate
 ORDER BY a.LocumID, a.AvailableDate;

Once you clean this data up (not sure what your rule will be regarding which rows to keep), you should consider a unique constraint on (LocumID, AvailableDate). Here is how you would create the constraint (though you will not be able to create it until you have removed the duplicates):

ALTER TABLE dbo.Availability 
    ADD CONSTRAINT uq_l_ad 
    UNIQUE (LocumID, AvailableDate);

Of course now you will have new errors returned to your application (Msg 2627), since your current code clearly doesn't already check if a LocumID/AvailabilityDate combination already exists before adding a new one.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜