SQL Server Query Returning Multiple Rows

2022-12-20 01:41 问答作者：

I am currently working on a SSIS package that is extracting a table from one database to another. The tables in both of the databases use the same column as the primary开发者_开发问答 key. My select statement to extract the data is a simple select statement. When I ran the package I was receiving a error that there where duplicate primary key values.

I reviewed my select statement and verified that my select statement was not returning duplicate rows. So to test this I removed the primary key from the table that I am inserting the data into and reran the SSIS package. After it ran I looked at the table to see what rows where being duplicated. What I found was that rows that where being edited while the extract was running where being duplicated, there was a record of it before the edit, and a record of it after the edit. I could easily tell this because the table has a last modified field that gets updated every time a record is updated.

I added a NOLOCK hint to my select statement, and it stopped returning duplicate rows.

So my question is why? I would have expected that a select statement with a NOLOCK table hint would have a higher chance of returning duplicate rows because it is not using locking, and that a select statement without the NOLOCK hint should use locking to make sure that it does not return duplicate rows.

Here is the select statement that I am using to select the data. I did verify that the joins are not causing it to duplicate rows:

SELECT pe.enc_id,
       pe.enc_nbr,
       pe.billable_ind,
       pe.clinical_ind AS clinical_ind,
       pe.budget_ind,
       pe.print_stmt_ind,
       pe.send_coll_letter_ind,
       pe.outsource_exempt_ind,
       cb.First_name + ' ' + cb.last_name AS CreatedBy,
       pe.create_timestamp AS create_timestamp,
       mb.first_name + ' ' + mb.last_name AS ModifiedBy,
       pe.modify_timestamp AS modify_timestamp
FROM   patient_encounter pe WITH(NOLOCK) 
       LEFT OUTER JOIN user_mstr cb WITH(NOLOCK) ON
           pe.created_by = cb.user_id
       LEFT OUTER JOIN user_mstr mb WITH(NOLOCK) ON
           pe.modified_by = mb.user_id

NOLOCK hint causes dirty read anomallies, and one such anomaly is a duplicate read. Such reads are frequent if an update changes the position of the row in the index scanned by the query:

say you have 2 rows in the table, with an ID key, rows with key values 1 and 2
one request (T1) runs UPDATE table SET key=3 WHERE key=1;
second request (T2) runs SELECT ... FROM table WITH(NOLOCK);
T1 locks the row with key value 1
T2 ignores the lock T1 has and reads the row with key value 1
T2 continue and reads row with key value 2
T1 update the row, and the row is moved in the index int he new position for key value 3
T2 continues to scan and reads the row with key value 3

So the SELECT has read a row twice, once while it had the key value 1 and once while it had a key value 3. This is just a trivial example of what can happen. In reality more complicated queries can run complex plans and use other indexes, all subject to such anomalies.

In short: NOLOCK hint is evil. If you want to avoid contention, use snapshot isolation.

The WITH NOLOCK hint just tells the database server to ignore the locks, and just select the current values from the database - hence it simply selects all the current row value at the time it hits that row.

Note that you will NOT get the updates in the new table of the rows that were being updated.

Without seeing your SQL, I would guess that the way it was constructed, it grabbed the current row, waited for a lock to clear, then selected the new row as well.

Locking the entire table would prevent the changes/duplicates, but you risk locking everyone out of the table while you do your select.

EDIT: FYI ALTERNATIVES: use the READPAST—Rows locked by other processes are skipped and TABLOCK—Lock at the table level which of course will block other processes and might not be desired.

NOTE: UPDLOCK gets converted to XLOCK during the transaction write.

There are two categories for the hints: granularity and isolation-level. Granularity includes PAGLOCK, NOLOCK, ROWLOCK, and TABLOCK. Isolation-level hints include HOLDLOCK, NOLOCK, READCOMMITTED, REPEATABLEREAD, and SERIALIZABLE. A maximum of one from each group may be used.

EDIT2: Just for completeness: READCOMMITTED—Read only data from transactions that have been committed. This is the default behavior.

EDIT3: More info: The NOLOCK WILL read the rows, but you risk reading "dirty" data that will change or having data that will not exist if a ROLLBACK happens to a transaction which may impact accuracy of the selected set.

The other important information is to discover what kinds of locks the transactions are using so you can plan accordingly.

继续阅读：sql-server ssis

SQL Server Query Returning Multiple Rows

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？