Semi-complex aggregate select statement confusion
Alright, this problem is a little complicated, so bear with me.
I have a table full of data. One of the table columns is an EntryDate. There can be multiple entries per day. However, I want to select all rows that are the latest entry on their respective days, and I want to select all the columns of said table.
One of the columns is a unique identifier column, but it is not the primary key (I have no idea why it's there; this is a pretty old system). For purposes of demonstration, say the table looks like this:
create table ExampleTable (
ID int identity(1,1) not null,
PersonID int not null,
StoreID int not null,
Data1 int not null,
Data2 int not null,
EntryDate datetime not null
)
The primary key is on PersonID and StoreID, which logically defines uniqueness.
Now, like I said, I want to select all the rows that are the latest entries on that particular day (for each Person-Store combination). This is pretty easy:
--Figure 1
select PersonID, StoreID, max(EntryDate)
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)
Where dbo.dayof() is a simple function that strips the time component from a datetime. However, doing this loses the rest of the columns! I can't simply include the other columns, because then I'd have to group by
them, which would produce the wrong results (especially since ID is unique).
I have found a dirty hack that will do what I want, but there must be a better way -- here's my current solution:
select
cast(null as int) as ID,
PersonID,
StoreID,
cast(null as int) as Data1,
cast(null as int) as Data2,
max(EntryDate) as EntryDate
into #StagingTable
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)
update Target set
ID = Source.ID,
Data1 = Source.Data1,
Data2 = Source.Data2,
from #StagingTable as Target
inner join ExampleTable as Source
on Source.PersonID = Target.PersonID
and Source.StoreID = Target.StoreID
and Source.EntryDate = Target.EntryDate
This gets me the correct data in #StagingTable
but, well, look at it! Creating a table with null values, then doing an update to get the values back -- surely there's a better way to do this? A single statement that will get me all the values the first time?
It is my belief that the correct join on that or开发者_如何学Ciginal select
(Figure 1) would do the trick, like a self-join or something... but how do you do that with the group by
clause? I cannot find the right syntax to make the query execute.
I am pretty new with SQL, so it's likely that I'm missing something obvious. Any suggestions?
(Working in T-SQL, if it makes any difference)
There is no really "elegant" way. When you have group Group By queries like this, you're going to have either sub-queries or temporary tables.
This will work:
Select ID, A.PersonID, A.StoreID, Data1, Data2, A.EntryDate
From ExampleTable As A
Inner Join
(select PersonID, StoreID, max(EntryDate) As EntryDate
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)) As B
On ExampleTable.PersonID = B.PersonID
And ExampleTable.StoreID = B.StoreID
And ExampleTable.EntryDate = B.EntryDate
You should not be too down on the solution you came up with though. Using temporary tables never looks elegant, but it is efficient; I would not be surprised if your original two-step solution is actually faster than my one-step solution. (you'll have to test to know for sure.)
精彩评论