
Linq duplicate removal with a twist

I have a list that contains all the status items of each order. My problem is that I need to remove every item whose status/log-date combination is not the highest (i.e., not the most recent).

For example:

        var inputs = new List<StatusItem>();
        //note that the third argument is simply a modifier that adds that many seconds
        //to the current DateTime, to make testing easier
        inputs.Add(new StatusItem(123, 30, 1));
        inputs.Add(new StatusItem(123, 40, 2));
        inputs.Add(new StatusItem(123, 50, 3));
        inputs.Add(new StatusItem(123, 40, 4));
        inputs.Add(new StatusItem(123, 50, 5));

        inputs.Add(new StatusItem(100, 20, 6));
        inputs.Add(new StatusItem(100, 30, 7));
        inputs.Add(new StatusItem(100, 20, 8));
        inputs.Add(new StatusItem(100, 30, 9));
        inputs.Add(new StatusItem(100, 40, 10));
        inputs.Add(new StatusItem(100, 50, 11));
        inputs.Add(new StatusItem(100, 40, 12));

        var l = from i in inputs
                group i by i.internalId
                    into cg
                    select
                             from s in cg
                             group s by s.statusId
                                 into sg
                                 select sg.OrderByDescending(n => n.date).First()
                    ;

Edit: for convenience, I'm adding the class definition as well.

        public class StatusItem
        {
            public int internalId;
            public int statusId;
            public DateTime date;

            public StatusItem(int internalId, int statusId, int secMod)
            {
                this.internalId = internalId;
                this.statusId = statusId;
                date = DateTime.Now.AddSeconds(secMod);
            }
        }

This query returns the following:

order 123 status 30 date 4/9/2010 6:44:21 PM

order 123 status 40 date 4/9/2010 6:44:24 PM

order 123 status 50 date 4/9/2010 6:44:25 PM

order 100 status 20 date 4/9/2010 6:44:28 PM

order 100 status 30 date 4/9/2010 6:44:29 PM

order 100 status 40 date 4/9/2010 6:44:32 PM

order 100 status 50 date 4/9/2010 6:44:31 PM

This is ALMOST correct. However, the last line (status 50) also needs to be filtered out, because status 40 overruled it in the history list. You can tell because its date is earlier than the date of the "last" status item with status 40.

I was hoping someone could give me some pointers, because I'm stuck.

Edit: Final complete solution:

  var k = from sg in
                    from i in inputs
                     group i by i.internalId
                         into cg
                         select
                                  from s in cg
                                  group s by s.statusId
                                      into sg
                                      select sg.OrderByDescending(n => n.date).First()
                from s in sg
                where s.date >= sg.Where(n => n.statusId <= s.statusId).Max(n => n.date)
                group s by s.internalId
                    into si
                    from x in si
                    select x;
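For anyone who prefers method syntax, here is a sketch of the same two-step idea. First it keeps the latest entry per status within each order. Then it drops any status whose date is earlier than the latest date of a lower (or equal) status. The sketch assumes a hypothetical `StatusEntry` record that takes an explicit `DateTime` instead of the question's seconds offset, so results are repeatable. `StatusFilter` and `CurrentStatuses` are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's StatusItem: the date is passed in
// explicitly instead of being computed from DateTime.Now.
public record StatusEntry(int internalId, int statusId, DateTime date);

public static class StatusFilter
{
    public static List<StatusEntry> CurrentStatuses(IEnumerable<StatusEntry> inputs)
    {
        return inputs
            // One group per order.
            .GroupBy(i => i.internalId)
            .SelectMany(order =>
            {
                // Step 1: keep only the most recent entry for each status.
                var latestPerStatus = order
                    .GroupBy(s => s.statusId)
                    .Select(sg => sg.OrderByDescending(n => n.date).First())
                    .ToList();

                // Step 2: drop a status if a lower (or equal) status was logged later.
                // Because of "<=", the inner set always contains s itself, so Max
                // never runs over an empty sequence.
                return latestPerStatus.Where(s =>
                    s.date >= latestPerStatus
                        .Where(n => n.statusId <= s.statusId)
                        .Max(n => n.date));
            })
            .ToList();
    }
}
```

Suppose you feed it the twelve sample items from the question, with fixed dates offset by the same seconds values. It should keep statuses 30, 40 and 50 for order 123 and statuses 20, 30 and 40 for order 100.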


It looks like nothing in your query currently does the date filtering you need, so you'll have to add that.

Offhand, something like this would do the additional filtering:

        var k = from sg in l
                from s in sg
                where s.date >= sg.Where(n => n.statusId <= s.statusId).Max(n => n.date)
                group s by s.internalId;

I haven't tested it, so the grouping may not be what you want and the comparisons may be reversed, but something like that should do the filtering. Using >= and <= instead of > and < means each status is always compared against at least itself. That way Max never has to aggregate an empty set, which for a non-nullable DateTime throws an InvalidOperationException.
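To illustrate the empty-set point, here is a small sketch built on the latest entry per status for order 100. The dates use seconds offsets 8, 9, 12 and 11 on an arbitrary fixed base time, which is an assumption standing in for `DateTime.Now`. `EmptyAggregateDemo` and its members are hypothetical names. The inclusive filter keeps statuses 20, 30 and 40. The strict `<` variant makes `Max` run over an empty sequence for the lowest status and throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmptyAggregateDemo
{
    // Hypothetical fixed base time standing in for DateTime.Now.
    static readonly DateTime T0 = new DateTime(2010, 4, 9, 18, 44, 20);

    // Latest entry per status for order 100 from the question
    // (seconds offsets 8, 9, 12 and 11 respectively).
    public static readonly List<(int statusId, DateTime date)> Latest =
        new List<(int statusId, DateTime date)>
        {
            (20, T0.AddSeconds(8)),
            (30, T0.AddSeconds(9)),
            (40, T0.AddSeconds(12)),
            (50, T0.AddSeconds(11)),
        };

    // Inclusive version (<=): the inner set always contains s itself,
    // so Max never sees an empty sequence.
    public static List<int> KeepInclusive() =>
        Latest
            .Where(s => s.date >= Latest.Where(n => n.statusId <= s.statusId).Max(n => n.date))
            .Select(s => s.statusId)
            .ToList();

    // Strict version (<): for the lowest status the inner set is empty,
    // and Max over an empty DateTime sequence throws InvalidOperationException.
    public static bool StrictVersionThrows()
    {
        try
        {
            Latest
                .Where(s => s.date > Latest.Where(n => n.statusId < s.statusId).Max(n => n.date))
                .ToList();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```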


It's not exactly the same form as yours, but it does give the correct result. I made a status item class with i, j, and k properties; I'm not sure what names you used for them.

        var keys = inputs.Select(
            input =>
                new { i = input.i, j = input.j })
        .Distinct();

        var maxes = keys.Select(
            ints =>
                inputs.First(
                    input =>
                        input.i == ints.i
                     && input.j == ints.j
                     && input.k == inputs.Where(
                                       i =>
                                           i.i == ints.i
                                        && i.j == ints.j
                                    ).Select(i => i.k).Max()));
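A possibly simpler variant of this approach groups on the composite (i, j) key directly. That avoids building distinct keys and rescanning `inputs` once per key. The sketch assumes a hypothetical `Item` record that uses this answer's i/j/k names (order id, status id, and the value to maximise). `MaxPerKey` is likewise an illustrative name. Like the code above, it only picks the entry with the largest k per (order, status) pair. It does not apply the question's "overruled by a later, lower status" filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type using the answer's naming:
// i = order id, j = status id, k = value to maximise (e.g. a seconds offset).
public record Item(int i, int j, int k);

public static class MaxPerKey
{
    // Group on the anonymous (i, j) key, which compares by value, and keep the
    // element with the largest k in each group. OrderByDescending is stable, so
    // ties resolve to the first such element in input order, as First() does above.
    public static List<Item> Maxes(IEnumerable<Item> inputs) =>
        inputs
            .GroupBy(x => new { x.i, x.j })
            .Select(g => g.OrderByDescending(x => x.k).First())
            .ToList();
}
```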