开发者

Semi-complex aggregate select statement confusion

Alright, this problem is a little complicated, so bear with me.

I have a table full of data. One of the table columns is an EntryDate. There can be multiple entries per day. However, I want to select all rows that are the latest entry on their respective days, and I want to select all the columns of said table.

One of the columns is a unique identifier column, but it is not the primary key (I have no idea why it's there; this is a pretty old system). For purposes of demonstration, say the table looks like this:

create table ExampleTable (
    ID int identity(1,1) not null,
    PersonID int not null,
    StoreID int not null,
    Data1 int not null,
    Data2 int not null,
    EntryDate datetime not null
)

The primary key is on PersonID and StoreID, which logically defines uniqueness.

Now, like I said, I want to select all the rows that are the latest entries on that particular day (for each Person-Store combination). This is pretty easy:

--Figure 1
select PersonID, StoreID, max(EntryDate)
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)

Where dbo.dayof() is a simple function that strips the time component from a datetime. However, doing this loses the rest of the columns! I can't simply include the other columns, because then I'd have to group by them, which would produce the wrong results (especially since ID is unique).

I have found a dirty hack that will do what I want, but there must be a better way -- here's my current solution:

select
    cast(null as int) as ID,
    PersonID,
    StoreID,
    cast(null as int) as Data1,
    cast(null as int) as Data2,
    max(EntryDate) as EntryDate
into #StagingTable
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)

update Target set
    ID = Source.ID,
    Data1 = Source.Data1,
    Data2 = Source.Data2,
from #StagingTable as Target
inner join ExampleTable as Source
    on Source.PersonID = Target.PersonID
   and Source.StoreID = Target.StoreID
   and Source.EntryDate = Target.EntryDate

This gets me the correct data in #StagingTable but, well, look at it! Creating a table with null values, then doing an update to get the values back -- surely there's a better way to do this? A single statement that will get me all the values the first time?

It is my belief that the correct join on that or开发者_如何学Ciginal select (Figure 1) would do the trick, like a self-join or something... but how do you do that with the group by clause? I cannot find the right syntax to make the query execute.

I am pretty new with SQL, so it's likely that I'm missing something obvious. Any suggestions?

(Working in T-SQL, if it makes any difference)


There is no really "elegant" way. When you have group Group By queries like this, you're going to have either sub-queries or temporary tables.

This will work:

Select ID, A.PersonID, A.StoreID, Data1, Data2, A.EntryDate
From ExampleTable As A
Inner Join
    (select PersonID, StoreID, max(EntryDate) As EntryDate
    from ExampleTable
    group by PersonID, StoreID, dbo.dayof(EntryDate)) As B
  On ExampleTable.PersonID = B.PersonID 
    And ExampleTable.StoreID = B.StoreID 
    And ExampleTable.EntryDate = B.EntryDate

You should not be too down on the solution you came up with though. Using temporary tables never looks elegant, but it is efficient; I would not be surprised if your original two-step solution is actually faster than my one-step solution. (you'll have to test to know for sure.)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜