SQL Query - one column must be distinct, restricted column must be most recent
I have a CATEGORIES table that links to an ITEMS table (one to many). Each item table could have multiple barcodes in the BARCODES table (one to many).
The idea is that a new barcode may be added against an item at some point, but the old data is stored in the BARCODES table so that if a search is done with an order with last year's data, the item and category of the old barcode can still be found quickly.
Hope that makes sense - it's a bit of an oversimplification.
Example of my tables:
CATEGORIES TABLE
ID NAME CODE ACCOUNTING_REFERENCE
1 Beverages BEV Stock_Beverages
2 Pies PIE Stock_Pies
3 Chips CHP Stock_Chips
ITEMS TABLE
ID CATEGORY_ID DESCRIPTION BASE_COST
1 1 Red Bull (single) $4.50
2 2 Ponsonby Pie - Mince Cheese $2.99
3 1 Coke Can (single) $3.50
4 2 Big Ben - Steak Pepper $1.99
BARCODES TABLE
ID ITEM_ID BARCODE ACTIVE_FROM
1 1 XSD123 2009/10/11
2 2 AXF123 2009/10/12
3 3 XYZ234 2009/10/11
4 1 NEW001 2010/01/05
5 1 NEW002 2010/01/05*
- I know it doesn't make sense in this scenario to have two barcodes entered on the same day. Ignore that this looks like bad data. This example is just an analogy for what I'm actually doing - in the real world application that mirrors this structure, it makes sense to have two 'BARCODES' entered on the same day. I'd like to discuss the real data openly, but I'm not free to do so - I'm sorry if it's confusing.
I am attempting to display item information to the user. I want to return from SQL one record per item, for all items. The returned information should include category name, code, and accounting references for the lines - duplicates of these columns are fine. The query needs to return item description and base cost.
This much I can do.
I also want the query to return the most recent barcode for the item. In the cases where there are two barcodes entered for an item on the same day, it isn't hugely important which one I display - although displaying the higher of the two (or more) BARCODE.ID fields would be a nice touch.
Also, it is possible that items may exist without any barcodes against them, in which case I want to return a null entry.
My desired result set would look something like this:
ITEM_ID DESCRIPTION BASE_COST CATEGORY_NAME CATEGORY_CODE ACCOUNTING_REFERENCE BARCODE
1 Red Bull (single) $4.50 Beverages BEV Stock_Beverages NEW002
2 Ponsonby Pie - Mince Cheese $2.99 Pies PIE Stock_Pies AXF123
3 Coke Can (single) $3.50 Beverages BEV Stock_Beverages XYZ234
4 Big Ben - Steak Pepper $1.99 Pies PIE Stock_Pies <null>
I can't work out how to add the BARCODE column to this result set, given the tables above.
In case I haven't been clear: I need to know how to structure a SQL query that will give me exactly the result above, given the data I've presented.
It's a requirement that the ITEM_ID column in my result set be distinct. I can't just restrict the stale entries in memory, and if I bubble up multiple ITEM_ID's it breaks a PK/FK relationship used elsewhere in the application I'm touching up.
For the record, I've had a look around the site on anything related to distinct sql columns to answer this question. I found a lot of entries, but co开发者_C百科uldn't seem to get any of the suggested solutions to work for me. Perhaps I'm just dense - it's getting late, and I may not be thinking straight. Apologies if I've missed something obvious.
EDIT
Gabe gave a good answer below, and I realized I should have been clearer.
Gabe's answer doesn't work for me, because occasionally in my data I have two barcodes against the same item with the same timestamp. When I tried his code, I wound up with multiple items being returned whenever an item had two barcodes placed against it on the same day.
This is really counter-intuitive, so it's my fault for not communicating properly. Essentially, the scenario described above isn't actually the data I'm working from. I'm not at liberty to discuss the database I'm working on publicly, so I have to rename everything in the examples I use.
I've adjusted the problem above. I know it seems ridiculous to have two barcodes entered with the same timestamp - rest assured that it makes sense that this could happen in the actual database.
EDIT (again)
The answers of both Gabe and simon work. I chose Gabe's as the answer, simply because I found his statement the most legible.
That said, I also like simon's because it shows a syntax for using SQL that I'm not familiar with - it's good to see an unfamiliar syntax in action.
At some point I need to benchmark the two methods to see which is faster. With my sample data, they're currently equal - although I expect I could change that with a bit of work populating my database with a few thousand more records and revising my indexes.
Thanks for the help.
It sounds like you want a correlated subquery. Roughly, this:
select *
from ITEMS
join BARCODES as B on ITEMS.ID = B.ITEM_ID
join CATEGORIES on CATEGORIES.ID = ITEMS.CATEGORY_ID
where B.ACTIVE_FROM =
(select max(ACTIVE_FROM)
from BARCODES as B2
where B2.ITEM_ID = B.ITEM_ID)
To handle the situation where there is no barcode for a given item you need an outer join and to give only one barcode when there are many you need an additional subquery. In ANSI SQL, it might look something like this:
select *
from ITEMS
LEFT OUTER join BARCODES as B on ITEMS.ID = B.ITEM_ID
join CATEGORIES on CATEGORIES.ID = ITEMS.CATEGORY_ID
where B.ACTIVE_FROM =
(select max(ACTIVE_FROM)
from BARCODES as B2
where B2.ITEM_ID = B.ITEM_ID)
AND B.ID =
(SELECT MAX(ID)
FROM BARCODES AS B3
WHERE B3.ITEM_ID = B.ITEM_ID)
OR ACTIVE_FROM IS NULL
Try something like this:
use tempdb
go
if exists (select 1 from sys.objects where name = 'barcodes')
drop table barcodes
if exists (select 1 from sys.objects where name = 'items')
drop table items
if exists (select 1 from sys.objects where name = 'categories')
drop table categories
go
create table categories (
id int primary key,
[name] nvarchar(30),
code char(3),
accounting_reference nvarchar(30)
)
create table items (
id int primary key,
category_id int foreign key references categories (id),
description nvarchar(50),
base_cost money
)
create table barcodes (
id int primary key,
item_id int foreign key references items (id),
barcode varchar(10),
active_from datetime
)
go
insert into categories (id, [name], code, accounting_reference)
select 1, 'Beverages', 'BEV', 'Stock_Beverages' union all
select 2, 'Pies', 'PIE', 'Stock_Pies' union all
select 3, 'Chips', 'CHP', 'Stock_Chips'
insert into items (id, category_id, description, base_cost)
select 1, 1, 'Red Bull (single)', 4.5 union all
select 2, 2, 'Ponsonby Pie - Mince Cheese', 2.99 union all
select 3, 1, 'Coke Can (single)', 3.50 union all
select 4, 2, 'Big Ben - Steak Pepper', 1.99
insert into barcodes (id, item_id, barcode, active_from)
select 1, 1, 'XSD123', '2009/10/11' union all
select 2, 2, 'AXF123', '2009/10/12' union all
select 3, 3, 'XYZ234', '2009/10/11' union all
select 4, 1, 'NEW001', '2010/01/05' union all
select 5, 1, 'NEW002', '2010/01/05'
;with x as (
select item_id, max(active_from) active_from, max(id) id
from barcodes
group by item_id
),
y as (
select item_id, barcode
from barcodes
where exists (select 1 from x where item_id = barcodes.item_id and id = barcodes.id and active_from = barcodes.active_from)
)
select t1.id item_id, t1.description, t1.base_cost, t2.name category_name, t2.code category_code, t2.accounting_reference, t3.barcode
from items t1 left join categories t2 on (t1.category_id = t2.id)
left join y t3 on (t1.id = t3.item_id)
精彩评论