Make a usable Join relationship with LINQ on top of a database CSV design error
I'm looking for a way to fix and/or abstract away a comma-separated values (CSV) list in a database field in order to reconstruct a usable relationship such that I can properly join the two tables below and query them using C# LINQ and its .Join method.
Following is a sample showing the Person table and CsvArticleIds field having a CSV value to represent a one-to-many association with Article records.
TABLE [dbo].[Person]
Id  Name   CsvArticleIds
--  -----  -------------
1   Joe    "15,22"
5   Ed     "22"
10  Arnie  "8,15,22"
^^^(Of course a link table should have been created; nonetheless the relationship with articles is trapped inside that list of CSV values.)
TABLE [dbo].[Article]
Id  Title
--  ----------
8   Beginning C#
15  A Historic look at Programming in the 90s
22  Gardening in January
Additional Info
- the fix can be at any level: C#.NET or SQL Server
- something easy because I will be repeating the solution for many other CSV values in other tables.
- Elegant is nice too.
- not looking for efficiency because this is part of a one-time data migration task and can take as long as it wants to run.
I would fix this at the table level using SQL. I'd create a new table with the person Id and an article Id in it. After populating this new table, I'd drop the Person.CsvArticleIds column. You will then have a normalized table structure to store articles for people.
You'll need to split that CsvArticleIds string. There are many ways to split a string in SQL Server. This article covers the pros and cons of just about every method:
"Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog
You need to create a split function. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID=s.Value
I prefer the Numbers table approach to splitting a string in T-SQL, but there are numerous ways to split strings in SQL Server; see the article linked above, which explains the pros and cons of each.
For the Numbers table method to work, you need to do this one-time table setup, which creates a table Numbers
that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = @SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
ListValue
-----------------------
1
2
3
4
5
6777
(6 row(s) affected)
To make what you need work, use CROSS APPLY:
DECLARE @YourTable table (Id int, Name varchar(10), CsvArticleIds varchar(500))
INSERT @YourTable VALUES (1 ,'Joe' ,'15,22')
INSERT @YourTable VALUES (5 ,'Ed' ,'22')
INSERT @YourTable VALUES (10 ,'Arnie' ,'8,15,22')
DECLARE @YourTableNormalized table (Id int, ArticleId int)
INSERT INTO @YourTableNormalized
(Id, ArticleId)
SELECT
y.Id, st.ListValue
FROM @YourTable y
CROSS APPLY dbo.FN_ListToTable(',',y.CsvArticleIds) AS st
ORDER BY st.ListValue
SELECT * FROM @YourTableNormalized ORDER BY Id,ArticleId
OUTPUT:
Id ArticleId
----------- -----------
1 15
1 22
5 22
10 8
10 15
10 22
(6 row(s) affected)
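Transform the Person table into something more useful first, like:
var newpersons =
    data.Persons
        .AsEnumerable() // LINQ providers generally cannot translate Split to SQL, so split on the client
        .Select(p => new
        {
            Id = p.Id,
            Name = p.Name,
            // Trim('"') only matters if the quotes are really stored in the column
            ArticleIds = p.CsvArticleIds.Trim('"')
                .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(s => int.Parse(s.Trim()))
                .ToList()
        });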
Now you can join against each person's ArticleIds collection.
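As a rough sketch of what that join could look like, the snippet below flattens each person's list with SelectMany and then uses .Join against the articles. The Person and Article classes, the CsvJoinSketch class, and the ParseIds helper are hypothetical names made up for illustration, and in-memory collections stand in for the real tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes mirroring the two tables in the question.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string CsvArticleIds { get; set; }
}

public class Article
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class CsvJoinSketch
{
    // Turns "8,15,22" into 8, 15, 22. Surrounding quotes and blank entries are ignored.
    public static IEnumerable<int> ParseIds(string csv)
    {
        return (csv ?? string.Empty)
            .Trim('"')
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .Select(s => int.Parse(s));
    }

    // One output row per (person, article) pair, like the rows of a link table.
    public static List<(string Name, string Title)> JoinPeopleToArticles(
        IEnumerable<Person> people, IEnumerable<Article> articles)
    {
        return people
            // Flatten: one element per article id found in the CSV field.
            .SelectMany(p => ParseIds(p.CsvArticleIds),
                        (p, articleId) => new { p.Name, ArticleId = articleId })
            // Ordinary LINQ join on the recovered key.
            .Join(articles,
                  link => link.ArticleId,
                  a => a.Id,
                  (link, a) => (link.Name, a.Title))
            .ToList();
    }
}
```

With the sample data from the question, this should yield six pairs, one per person/article link. Ids that have no matching article are silently dropped, because .Join is an inner join.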
If holding the entire transformed Person table in memory isn't feasible, use the same .Select on groups of records instead, pulling Person objects out of the DB, say, 100 at a time with Skip() and Take(). Order the query first (for example by Id) so that the pages are stable.
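A minimal sketch of that paging loop, under the assumption that the source is an IQueryable you can order by a key. BatchSketch and InBatches are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class BatchSketch
{
    // Pulls rows from the source one page at a time using OrderBy + Skip + Take.
    // The ordering key keeps the pages stable between round trips.
    public static IEnumerable<List<T>> InBatches<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> orderBy,
        int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int skip = 0; ; skip += batchSize)
        {
            // Each iteration issues one query for at most batchSize rows.
            List<T> batch = source.OrderBy(orderBy).Skip(skip).Take(batchSize).ToList();

            if (batch.Count == 0)
                yield break;

            yield return batch;

            // A short page means the source is exhausted.
            if (batch.Count < batchSize)
                yield break;
        }
    }
}
```

Usage would be along the lines of `foreach (var page in BatchSketch.InBatches(data.Persons, p => p.Id, 100)) { ... }`, applying the same split-and-select to each page. Because the method is an iterator, the argument check only runs once enumeration starts.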