Normalize data with LINQ
Assume we have some denormalized data, like this:
using System.Collections.Generic;
List<string[]> dataSource = new List<string[]>();
string[] row1 = {"grandParentTitle1", "parentTitle1", "childTitle1"};
string[] row2 = {"grandParentTitle1", "parentTitle1", "childTitle2"};
string[] row3 = {"grandParentTitle1", "parentTitle2", "childTitle3"};
string[] row4 = {"grandParentTitle1", "parentTitle2", "childTitle4"};
dataSource.Add(row1);
dataSource.Add(row2);
dataSource.Add(row3);
dataSource.Add(row4);
I need to normalize it, for example to get an IEnumerable<Child> with Child.Parent and Child.Parent.GrandParent filled in.
The imperative way is more or less clear. Would it be shorter with LINQ?
Ideally in one query, and it should be extensible to more entities.
I tried separately creating an IEnumerable<GrandParent>, then an IEnumerable<Parent> with the assignments, etc.
Could you give me a hint on whether this can be achieved in a functional way?
You can do exactly what you want using group by. Unfortunately, my knowledge of C# LINQ query syntax is limited, so I can only show you how to do it by calling the GroupBy extension method.
var normalized = dataSource
    .GroupBy(source => source[0], (grandParent, grandParentChilds) => new
    {
        GrandParent = grandParent,
        Parents = grandParentChilds
            .GroupBy(source => source[1], (parent, parentChilds) => new
            {
                Parent = parent,
                Children = from source in parentChilds select source[2]
            })
    });
foreach (var grandParent in normalized)
{
    Console.WriteLine("GrandParent: {0}", grandParent.GrandParent);
    foreach (var parent in grandParent.Parents)
    {
        Console.WriteLine("\tParent: {0}", parent.Parent);
        foreach (string child in parent.Children)
            Console.WriteLine("\t\tChild: {0}", child);
    }
}
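The query above builds a tree from the top down, whereas the question asks for a flat sequence of children with Child.Parent and Child.Parent.GrandParent filled in. A minimal sketch of that, assuming only the question's sample rows, flattens the same two GroupBy levels with SelectMany. Anonymous types stand in for the question's Child, Parent and GrandParent classes (which are not shown), so the property names Title, Parent and GrandParent are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same sample rows as in the question.
var dataSource = new List<string[]>
{
    new[] { "grandParentTitle1", "parentTitle1", "childTitle1" },
    new[] { "grandParentTitle1", "parentTitle1", "childTitle2" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle3" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle4" },
};

// One grandparent object per distinct row[0], one parent object per distinct
// row[1] within it; every child points at its shared parent instance.
var children = dataSource
    .GroupBy(row => row[0])
    .SelectMany(gGroup =>
    {
        var grandParent = new { Title = gGroup.Key };
        return gGroup
            .GroupBy(row => row[1])
            .SelectMany(pGroup =>
            {
                var parent = new { Title = pGroup.Key, GrandParent = grandParent };
                return pGroup.Select(row => new { Title = row[2], Parent = parent });
            });
    })
    .ToList(); // materialize once so the shared instances stay stable

foreach (var child in children)
    Console.WriteLine($"{child.Parent.GrandParent.Title} / {child.Parent.Title} / {child.Title}");
```

Because each parent object is created once per group, children of the same parent reference the same instance. The ToList call matters here: enumerating the deferred query twice would create a fresh set of objects each time.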
LINQ really does the opposite of this, i.e. if you had the data normalized, you could easily say:
from g in grandParents
from p in g.Parents
from c in p.Children
select new { GrandParentName = g.Name, ParentName = p.Name, ChildName = c.Name };
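To make the contrast concrete, here is a small round trip, assuming the question's sample rows: it first builds a nested structure with GroupBy, then flattens it again with the three from clauses shown above. The Name properties on the anonymous types are stand-ins chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataSource = new List<string[]>
{
    new[] { "grandParentTitle1", "parentTitle1", "childTitle1" },
    new[] { "grandParentTitle1", "parentTitle1", "childTitle2" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle3" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle4" },
};

// Build a normalized (nested) shape: grandparents -> parents -> children.
var grandParents = dataSource
    .GroupBy(r => r[0])
    .Select(g => new
    {
        Name = g.Key,
        Parents = g.GroupBy(r => r[1])
                   .Select(p => new
                   {
                       Name = p.Key,
                       Children = p.Select(r => new { Name = r[2] }).ToList()
                   })
                   .ToList()
    })
    .ToList();

// Denormalizing is just a chain of from clauses (SelectMany under the hood).
var rows = (from g in grandParents
            from p in g.Parents
            from c in p.Children
            select new { GrandParentName = g.Name, ParentName = p.Name, ChildName = c.Name })
           .ToList();

foreach (var row in rows)
    Console.WriteLine($"{row.GrandParentName}, {row.ParentName}, {row.ChildName}");
```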
What you're asking for is trickier. Something like this:
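// ParentTitleComparer and GrandParentTitleComparer are assumed to be
// IEqualityComparer<Parent> / IEqualityComparer<GrandParent> implementations
// that compare by Title (they are not shown here).
var grandparents = (from g in dataSource
                    select new GrandParent
                    {
                        Title = g[0],
                        Parents = (from p in dataSource
                                   where p[0] == g[0]
                                   select new Parent
                                   {
                                       Title = p[1],
                                       // Match on both columns so a parent title reused
                                       // under another grandparent is not mixed in.
                                       Children = from c in dataSource
                                                  where c[0] == p[0] && c[1] == p[1]
                                                  select new Child
                                                  {
                                                      Title = c[2]
                                                  }
                                   }).Distinct(new ParentTitleComparer())
                    }).Distinct(new GrandParentTitleComparer());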
I'm not convinced this reads better than the imperative version would.
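The comparers above are needed only because Distinct runs after the objects are projected. A variation that avoids them, sketched here with anonymous types standing in for the typed classes, is to take the distinct key strings first and project afterwards. It keeps the same nested-subquery shape, so it still rescans dataSource for every key, which the group by versions avoid.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataSource = new List<string[]>
{
    new[] { "grandParentTitle1", "parentTitle1", "childTitle1" },
    new[] { "grandParentTitle1", "parentTitle1", "childTitle2" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle3" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle4" },
};

// Distinct on the key strings first, so no custom IEqualityComparer is needed.
var grandparents =
    (from gTitle in dataSource.Select(r => r[0]).Distinct()
     select new
     {
         Title = gTitle,
         Parents = (from pTitle in dataSource.Where(r => r[0] == gTitle)
                                             .Select(r => r[1])
                                             .Distinct()
                    select new
                    {
                        Title = pTitle,
                        // Filter on both columns, as in the corrected query above.
                        Children = (from c in dataSource
                                    where c[0] == gTitle && c[1] == pTitle
                                    select new { Title = c[2] }).ToList()
                    }).ToList()
     }).ToList();

foreach (var g in grandparents)
    foreach (var p in g.Parents)
        Console.WriteLine($"{g.Title}: {p.Title} -> {string.Join(", ", p.Children.Select(c => c.Title))}");
```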
The most basic way of doing this would be with anonymous types:
var normalized =
    from ds0 in dataSource
    group ds0 by ds0[0] into grandparents
    select new
    {
        Grandparent = grandparents.Key,
        Parents =
            from ds1 in grandparents
            group ds1 by ds1[1] into parents
            select new
            {
                Parent = parents.Key,
                Children = from ds2 in parents select ds2[2]
            }
    };
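This query hard-codes one group clause per level. Since the question asks for something that extends to more entities, one option is to recurse on the column index instead. The sketch below uses a hypothetical local function, Outline, that groups by column `level`, then by `level + 1` within each group, until the rows run out of columns. To stay free of type declarations it emits an indented outline rather than typed objects; the same recursion could build node objects instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataSource = new List<string[]>
{
    new[] { "grandParentTitle1", "parentTitle1", "childTitle1" },
    new[] { "grandParentTitle1", "parentTitle1", "childTitle2" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle3" },
    new[] { "grandParentTitle1", "parentTitle2", "childTitle4" },
};

// Hypothetical helper: groups rows by column `level`, emits one indented line
// per distinct key, then recurses into each group with the next column.
// Recursion stops once no row has a column at `level`, so any depth works.
IEnumerable<string> Outline(IEnumerable<string[]> rows, int level) =>
    rows.Where(r => r.Length > level)
        .GroupBy(r => r[level])
        .SelectMany(g => new[] { new string(' ', 2 * level) + g.Key }
                            .Concat(Outline(g, level + 1)));

var lines = Outline(dataSource, 0).ToList();
lines.ForEach(Console.WriteLine);
```

Adding a fourth column to every row would add a fourth indentation level without changing Outline.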
If you wanted to do this with concrete classes, I would suggest creating a Person class with a constructor that takes an IEnumerable<Person> representing the children of the Person being constructed. Then you could do this:
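var people = from ds0 in dataSource
             group ds0 by ds0[0] into grandparents
             select new Person(grandparents.Key,
                 from ds1 in grandparents
                 group ds1 by ds1[1] into parents
                 select new Person(parents.Key,
                     from ds2 in parents
                     select new Person(ds2[2])));

// Minimal Person type assumed for this example (the answer only describes it):
// a name plus an optional sequence of children.
record Person(string Name, IEnumerable<Person> Children = null);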
Do either of these solutions work for you?
If you want separate GrandParent, Parent and Child types, you should be able to modify the last example to suit.