
Normalize data with LINQ

Assume we have some denormalized data, like this:

List<string[]> dataSource = new List<string[]>();
string[] row1 = {"grandParentTitle1", "parentTitle1", "childTitle1"};
string[] row2 = {"grandParentTitle1", "parentTitle1", "childTitle2"};
string[] row3 = {"grandParentTitle1", "parentTitle2", "childTitle3"};
string[] row4 = {"grandParentTitle1", "parentTitle2", "childTitle4"};
dataSource.Add(row1);
dataSource.Add(row2);
dataSource.Add(row3);
dataSource.Add(row4);

I need to normalize it, e.g. to get an IEnumerable<Child> with Child.Parent and Child.Parent.GrandParent filled in.

The imperative way is more or less clear. Would it be shorter with LINQ?

Ideally it should be a single query, and it should extend to more entity levels.

I tried creating an IEnumerable<GrandParent> separately, then an IEnumerable<Parent> and assigning the references, and so on.

Please give me a hint: can this be achieved in a functional way?


You can do exactly what you want using group by. Unfortunately my knowledge of C# LINQ query syntax is limited, so I can only show you how to do it by calling the GroupBy extension method.

var normalized = dataSource
    .GroupBy(source => source[0], (grandParent, grandParentChildren) => new
    {
        GrandParent = grandParent,
        Parents = grandParentChildren.GroupBy(source => source[1], (parent, parentChildren) => new
        {
            Parent = parent,
            Children = from source in parentChildren select source[2]
        })
    });

foreach (var grandParent in normalized)
{
    Console.WriteLine("GrandParent: {0}", grandParent.GrandParent);
    foreach (var parent in grandParent.Parents)
    {
        Console.WriteLine("\tParent: {0}", parent.Parent);
        foreach (string child in parent.Children)
            Console.WriteLine("\t\tChild: {0}", child);
    }
}


LINQ really does the opposite of this. I.e., if you had it normalised, you could easily write

from g in grandParents
from p in g.Parents
from c in p.Children
select new { GrandParentName = g.Name, ParentName = p.Name, ChildName = c.Name };
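For comparison, the same flattening can be written in method syntax with nested SelectMany calls (the compiler's own translation uses a flatter chain with transparent identifiers, but the result is the same). A minimal sketch, assuming hypothetical GrandParentNode/ParentNode/ChildNode record types, since the answer doesn't define its types:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical normalized types, assumed for illustration only.
public record ChildNode(string Name);
public record ParentNode(string Name, IReadOnlyList<ChildNode> Children);
public record GrandParentNode(string Name, IReadOnlyList<ParentNode> Parents);

public static class Flattener
{
    // Equivalent to: from g in grandParents from p in g.Parents from c in p.Children select ...
    // Each nested SelectMany descends one level while keeping g and p in scope.
    public static IEnumerable<(string GrandParentName, string ParentName, string ChildName)> Flatten(
        IEnumerable<GrandParentNode> grandParents) =>
        grandParents.SelectMany(g =>
            g.Parents.SelectMany(p =>
                p.Children.Select(c =>
                    (GrandParentName: g.Name, ParentName: p.Name, ChildName: c.Name))));
}
```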

Going the other way, as you're asking, is trickier. Something like this:

var grandparents = (from g in dataSource
                    select new GrandParent
                    {
                        Title = g[0],
                        Parents = (from p in dataSource
                                   where p[0] == g[0]
                                   select new Parent
                                   {
                                       Title = p[1],
                                       Children = from c in dataSource
                                                  where c[0] == p[0] && c[1] == p[1]
                                                  select new Child { Title = c[2] }
                                   }).Distinct(new ParentTitleComparer())
                    }).Distinct(new GrandParentTitleComparer());
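The query relies on ParentTitleComparer and GrandParentTitleComparer, which the answer doesn't show. A minimal sketch of what they could look like, assuming simple GrandParent/Parent/Child classes with the property names used in the query (the classes themselves are assumptions):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity classes matching the property names used in the query.
public class Child
{
    public string Title { get; set; } = "";
}

public class Parent
{
    public string Title { get; set; } = "";
    public IEnumerable<Child> Children { get; set; } = new List<Child>();
}

public class GrandParent
{
    public string Title { get; set; } = "";
    public IEnumerable<Parent> Parents { get; set; } = new List<Parent>();
}

// Two Parent objects are considered equal when their titles match.
public class ParentTitleComparer : IEqualityComparer<Parent>
{
    public bool Equals(Parent x, Parent y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Title == y.Title);

    // Must agree with Equals: equal titles give equal hash codes.
    public int GetHashCode(Parent obj) => obj.Title?.GetHashCode() ?? 0;
}

// Same idea one level up.
public class GrandParentTitleComparer : IEqualityComparer<GrandParent>
{
    public bool Equals(GrandParent x, GrandParent y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Title == y.Title);

    public int GetHashCode(GrandParent obj) => obj.Title?.GetHashCode() ?? 0;
}
```

Because equality is by title only, Distinct keeps one object per title and drops the duplicates that the row-by-row projection creates.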

I'm not convinced this reads better than the imperative version would.


The most basic way of doing this would be with anonymous types:

from ds0 in dataSource group ds0 by ds0[0] into grandparents
select new
{
    Grandparent = grandparents.Key,
    Parents =
        from ds1 in grandparents group ds1 by ds1[1] into parents
        select new
        {
            Parent = parents.Key, 
            Children = from ds2 in parents select ds2[2]
        }
};

If you wanted to do this with concrete classes, I would suggest creating a Person class with a constructor that takes an IEnumerable<Person> representing the children of the Person being constructed. Then you could do this:

from ds0 in dataSource
group ds0 by ds0[0] into grandparents
select new Person(grandparents.Key,
    from ds1 in grandparents
    group ds1 by ds1[1] into parents
    select new Person(parents.Key,
        from ds2 in parents
        select new Person(ds2[2])));
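The answer doesn't show the Person class. One possible shape, inferred from the two constructor calls the query makes (new Person(name) and new Person(name, children)); the PersonTree.Build wrapper is hypothetical and only hosts the query so it can be called:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person class matching the constructor calls in the query.
public class Person
{
    public string Name { get; }
    public IReadOnlyList<Person> Children { get; }

    // Leaf node: no children.
    public Person(string name) : this(name, Enumerable.Empty<Person>()) { }

    public Person(string name, IEnumerable<Person> children)
    {
        Name = name;
        // Materialize the lazy sub-query once so Children is not re-evaluated on every read.
        Children = children.ToList();
    }
}

public static class PersonTree
{
    // The query from the answer, wrapped in a method.
    public static IEnumerable<Person> Build(IEnumerable<string[]> dataSource) =>
        from ds0 in dataSource
        group ds0 by ds0[0] into grandparents
        select new Person(grandparents.Key,
            from ds1 in grandparents
            group ds1 by ds1[1] into parents
            select new Person(parents.Key,
                from ds2 in parents
                select new Person(ds2[2])));
}
```

Since group by yields groups in order of first appearance, the tree keeps the row order of the source data.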

Do either of these solutions work for you?

If you want different GrandParent, Parent & Child types then you should be able to modify the last example to suit.
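A sketch of such a modification for what the question actually asks (an IEnumerable<Child> with Child.Parent and Child.Parent.GrandParent filled in), assuming hypothetical GrandParent, Parent and Child classes with settable back-reference properties, and hypothetical helpers AddParent/AddChild that wire the references in both directions:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity types named after the question.
public class GrandParent
{
    public string Title { get; set; } = "";
    public List<Parent> Parents { get; } = new List<Parent>();
}

public class Parent
{
    public string Title { get; set; } = "";
    public GrandParent GrandParent { get; set; }
    public List<Child> Children { get; } = new List<Child>();
}

public class Child
{
    public string Title { get; set; } = "";
    public Parent Parent { get; set; }
}

public static class Normalizer
{
    // Groups by column 0, then column 1; each GrandParent and Parent is created once per group.
    // The helpers add to the parent lists, so the query is materialized with ToList()
    // to guarantee it runs exactly once.
    public static List<Child> Normalize(IEnumerable<string[]> rows) =>
        (from gGroup in rows.GroupBy(r => r[0])
         let grandParent = new GrandParent { Title = gGroup.Key }
         from pGroup in gGroup.GroupBy(r => r[1])
         let parent = AddParent(grandParent, pGroup.Key)
         from row in pGroup
         select AddChild(parent, row[2])).ToList();

    private static Parent AddParent(GrandParent grandParent, string title)
    {
        var parent = new Parent { Title = title, GrandParent = grandParent };
        grandParent.Parents.Add(parent);
        return parent;
    }

    private static Child AddChild(Parent parent, string title)
    {
        var child = new Child { Title = title, Parent = parent };
        parent.Children.Add(child);
        return child;
    }
}
```

The helpers make the query side-effecting, which is why it is materialized immediately: enumerating a lazy version twice would create duplicate objects. Another level of entities would need one more GroupBy/let pair.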
