Duplicate results returned from query when distinct is used
On a current project at I am needing to do some pagination of results returned from SQL. I have hit a corner case in which the query can accept identifiers as part of the where clause, normally this isn't an issu开发者_StackOverflow中文版e but in one case we have a single identifier being passed up that has a one to many relationship with one of the tables that the query joins on and it is returning multiple rows in the results. That issue was fixed by introducing a distinct to the query. The following is the query which returns the correct result of one row (all table/field names have been changed of course):
select distinct [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
However when I introduce pagination into the query I am finding that it is now returning multiple rows when it should be only be returning one. The method I am using for pagination is as follows:
select [item_id]
from (
select distinct [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
) as [items]
where [items].[row_num] between 0 and 100
I worry that adding a distinct to the outer query will cause an incorrect number of results to be returned and I am unsure of how else to fix this issue. The database I am querying is MS SQL Server 2008.
About 5 minutes after posting the question a possible solution hit me, if I group by the item_id (and any sort criteria) which should only be one instance of it should solve the issue. After testing this was the query that I was left with:
select [item_id]
from (
select [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
group by [item_table].[item_id], [item_table].[pub_date]
) as [items]
where [items].[row_num] between 0 and 100
I don't see where the DISTINCT is adding any value in your first query. The results are [item_table].[item_id] and [row_num]. Because the value of [row_num] is already distinct, the combination of [item_table].[item_id] and [row_num] will be distinct. When adding the DISTINCT keyword to the query, no rows are excluded.
In the second query, your results will return [item_id] from the sub query where [row_num] meets the criteria. If there where duplicate [item_id] values in the sub-query, there will be duplicates in the final results, but now you don't display [row_num] to distinguish the duplicates.
精彩评论