How to JOIN to a table that has multiple values in the column?
I've got a person table that contains an error code field that can contain multiple error codes (001, 002, 003...). I know that's a schema problem but this is a vendor application and I have no control over the schema, so I have to work with what I've got.
There is also a Error table that contains ErrorCode (char(3)) and Descript (char(1000)). In my query the Person.ErrorCode is joined to the Error.ErrorCode to get the value of the corresponding description.
For person records where there is only one error code, I can get the corresponding Descript with no problem. What I'm trying to do is somehow concat the Descript values for records where there are multiple errors.
For example, here's some sample data from Error table:
ErrorCode Descript
001 Problem with person file
002 Problem with address file
003 Problem with grade
Here are the columns resulting from my SELECT on Person with a JOIN on E开发者_StackOverflow中文版rror:
Person.RecID Person.ErrorCode Error.Descript
12345 001 Problem with person file
12346 003 Problem with grade
12347 002,003
What I'm trying to get is this:
Person.RecID Person.ErrorCode Error.Descript
12345 001 Problem with person file
12346 003 Problem with grade
12347 002,003 Problem with address file, Problem with grade
Suggestions appreciated!
You should see: "Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog, then there are many ways to split string in SQL Server. This article covers the PROs and CONs of just about every method. in general, you need to create a split function. This is how a split function can be used to join rows:
SELECT
*
FROM dbo.yourSplitFunction(@Parameter) b
INNER JOIN YourCodesTable c ON b.ListValue=c.CodeValue
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers
that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = @SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
DECLARE @ErrorCode table (ErrorCode varchar(20), Description varchar(30))
INSERT @ErrorCode VALUES ('001','Problem with person file')
INSERT @ErrorCode VALUES ('002','Problem with address file')
INSERT @ErrorCode VALUES ('003','Problem with grade')
DECLARE @Person table (RecID int, ErrorCode varchar(20))
INSERT @Person VALUES (12345 ,'001' )
INSERT @Person VALUES (12346 ,'003' )
INSERT @Person VALUES (12347 ,'002,003')
SELECT
p.RecID,c.ListValue,e.Description
FROM @Person p
CROSS APPLY dbo.FN_ListToTable(',',p.ErrorCode) c
INNER JOIN @ErrorCode e ON c.ListValue=e.ErrorCode
OUTPUT:
RecID ListValue Description
----------- ------------- -------------------------
12345 001 Problem with person file
12346 003 Problem with grade
12347 002 Problem with address file
12347 003 Problem with grade
(4 row(s) affected)
you can use the XML trick to concatenate the rows back together:
SELECT
t1.RecID,t1.ErrorCode
,STUFF(
(SELECT
', ' + e.Description
FROM @Person p
CROSS APPLY dbo.FN_ListToTable(',',p.ErrorCode) c
INNER JOIN @ErrorCode e ON c.ListValue=e.ErrorCode
WHERE t1.RecID=p.RecID
ORDER BY p.ErrorCode
FOR XML PATH(''), TYPE
).value('.','varchar(max)')
,1,2, ''
) AS ChildValues
FROM @Person t1
GROUP BY t1.RecID,t1.ErrorCode
OUTPUT:
RecID ErrorCode ChildValues
----------- -------------------- -----------------------------------------------
12345 001 Problem with person file
12346 003 Problem with grade
12347 002,003 Problem with address file, Problem with grade
(3 row(s) affected)
This returns the same result set as above, but may perform better:
SELECT
t1.RecID,t1.ErrorCode
,STUFF(
(SELECT
', ' + e.Description
FROM (SELECT ListValue FROM dbo.FN_ListToTable(',',t1.ErrorCode)) c
INNER JOIN @ErrorCode e ON c.ListValue=e.ErrorCode
ORDER BY c.ListValue
FOR XML PATH(''), TYPE
).value('.','varchar(max)')
,1,2, ''
) AS ChildValues
FROM @Person t1
GROUP BY t1.RecID,t1.ErrorCode
Denormalize person.errorcode before the join with error.errorcode
I don't mean denormalize on the table level, I mean with a view or sql code.
You can use a Common Table Expression to pretend that the person table is normal:
;WITH PersonPrime as (
SELECT RecID,ErrorCode,CAST(null as varchar(100)) as Remain from Person where Value not like '%,%'
UNION ALL
SELECT RecID,SUBSTRING(ErrorCode,1,CHARINDEX(',',ErrorCode)-1),SUBSTRING(ErrorCode,CHARINDEX(',',ErrorCode)+1,100) from Person where Value like '%,%'
UNION ALL
SELECT RecID,Remain,null FROM PersonPrime where Remain not like '%,%'
UNION ALL
SELECT RecID,SUBSTRING(Remain,1,CHARINDEX(',',Remain)-1),SUBSTRING(Remain,CHARINDEX(',',Remain)+1,100) from PersonPrime where Remain like '%,%'
)
SELECT RecID,ErrorCode from PersonPrime
And now use PersonPrime where you'd have used Person in your original query. You'll want to CAST that null to a varchar column as wide as ErrorCode in the Person table
Concatenating the error descriptions might not be the way to go. It adds unnecessary complexity to the SQL statement that will be extremely problematic to debug. Any future additions or changes to the SQL will also be difficult. Your best bet is to generate a resultset that is normalized, even though your schema is not.
SELECT Person.RecID, Person.ErrorCode, Error.ErrorCode, Error.Descript
FROM Person INNER JOIN Error
ON REPLACE(Person.ErrorCode, ' ', '') LIKE '%,' + CONVERT(VARCHAR,Error.ErrorCode) + ',%'
If a person has multiple error codes set, then this will return one row for each error specified (ignoring duplicates). Using your example, it will return this.
Person.RecID Person.ErrorCode Error.ErrorCode Error.Descript
12345 001 001 Problem with person file
12346 003 003 Problem with grade
12347 002,003 002 Problem with address file
12347 002,003 003 Problem with grade
By grouping the errors together and concatenating them is an option:
SELECT *, GROUP_CONCAT(Person.ErrorCode) FROM Person
GROUP BY Person.RecID
精彩评论