Matching and ranking query with dynamic columns and percentage match
I'm trying to write a ranked match/searching system for a client that will look at the materials requested (MaterialRequest table) and find the providers (where userprofile.usertype_id = 1) who can provide the material(s) and rank the results that can provide the most, or all of the, materials. Here's the database schema i have:
Userprofile Table
userprofile_id int identity
userprofile_dt datetime
first_nm varchar(50)
last_nm varchar(50)
usertype_id int (provider = 1, requestor = 2)
Request Table
request_id int identity
request_dt datetime
title varchar(50)
description varchar(100)
userprofile_id int (where usertype = 2)
MaterialRequest Table
material_req_id int identity
request_id int
material_id int
MaterialProvider Table
material_pro_id int identity
userprofile_id int (where usertype = 1)
material_id int
Material Table
material_id int identity
material_desc varchar(50)
So, for example, if I have this request:
request_id = 1
request_dt = 3/28/2011
title = 'test request'
desc = null
userprofile_id = 100 (where usertype_id = 2)
and these materials were requested
material_req_id request_id material_id
1 1 10 (steel)
2 1 11 (copper)
3 1 12 (titanium)
4 1 13 (nickel)
and the MaterialProvider was populated like
material_pro_id userprofile_id material_id
1 2 10 (steel)
2 2 11 (copper)
3 2 13 (nickel)
4 3 11 (copper)
5 3 13 (nickel)
6开发者_JS百科 3 12 (titanium)
I would expect my output to look like
userprofile_id steel copper nickel titanium pct_match
2 Y Y Y N 75
3 N Y Y Y 75
where the column names are derived from the materials in the request. Then be able to find the providers that can provide more than a given percentage of the materials requested.
I had started with a temporary table and a cursor to
- add the columns to the temporary table
- then iterate through the 3000+ providers and add those providers that can provide the specified materials.
Is there a better way to do this? The process takes way too long and would like to get better/best practices on how to write something like this.
;WITH NormalOutput AS (
/* normal output: one material per row */
SELECT
p.userprofile_id,
m.material_desc,
value = CASE WHEN mp.material_pro_id IS NULL THEN 'N' ELSE 'Y' END
FROM Request r
INNER JOIN MaterialRequest mr ON r.request_id = mr.request_id
INNER JOIN Material m ON mr.material_id = m.material_id
CROSS JOIN (SELECT DISTINCT userprofile_id FROM MaterialProvider) p
LEFT JOIN MaterialProvider mp
ON p.userprofile_id = mp.userprofile_id AND mr.material_id = mp.material_id
WHERE r.request_id = 1
)
SELECT p.*, t.pct_match
FROM (
/* pivoting the normal output */
SELECT userprofile_id, steel, copper, titanium, nickel
FROM NormalOutput n
PIVOT (MAX(value) FOR material_desc IN (steel, copper, titanium, nickel)) p
) p
INNER JOIN (
/* aggregating the normal output (calculating percents) */
SELECT
userprofile_id,
pct_match = COUNT(CASE value WHEN 'Y' THEN value END) * 100 / COUNT(*)
FROM NormalOutput
GROUP BY userprofile_id
) t
/* joining the two modified outputs */
ON t.userprofile_id = p.userprofile_id
Do the pivot on the materials name last, after you've identified a set of target providers. Do all the math first, then the pretty formatting.
精彩评论