Why is this LINQ query not returning the correct dates?

The following LINQ query reads a delimited file and returns the most-recent record for each recordId. The problem is, the most-recent record is not always returned. What am I doing wrong? What do I need to change to ensure the most-recent date is always returned? Is there a better way than using .Max()?

I've also attached some sample data so you can see the issue. In the sample data, the rows marked with an asterisk (*) are the rows I want returned (the most recent date). The rows marked with an x are the ones that are returned instead, which I believe is incorrect.

In cases where the same recordId appears multiple times (#162337, for example) and has multiple dates, I want one record returned with the most-recent date.

var recipients = File.ReadAllLines(path)
    .Select (record => record.Split('|'))
    .Select (tokens => new 
        {
        FirstName = tokens[2],
        LastName = tokens[4],
        recordId = Convert.ToInt32(tokens[13]),
        date = Convert.ToDateTime(tokens[17])
        }
    )
    .GroupBy (m => m.recordId)
    .OrderByDescending (m => m.Max (x => x.date ) )
    .Select (m => m.First () )
    .OrderBy (m => m.recordId )

    .Dump();


FirstName   LastName    recordId    date    
fname   lname   137308  2/15/1991 0:00  
fname   lname   138011  6/16/1983 0:00  *
fname   lname   138011  11/9/1981 0:00  x
fname   lname   158680  9/4/1986 0:00   
fname   lname   161775  4/23/1991 0:00  
fname   lname   162337  12/1/1998 0:00  *
fname   lname   162337  12/1/1998 0:00  *
fname   lname   162337  9/1/1994 0:00   x
fname   lname   162337  9/1/1994 0:00   x
fname   lname   163254  2/12/1969 0:00  
fname   lname   173816  9/26/1997 0:00  
fname   lname   178063  1/16/1980 0:00  *
fname   lname   178063  3/3/1976 0:00   x
fname   lname   180725  7/1/2007 0:00   *
fname   lname   180725  1/14/1992 0:00  x
fname   lname   181153  5/1/2001 0:00   


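You're ordering the entire sequence of groups by the maximum date within each group. That leaves the order of the items inside each group unchanged, so First() simply returns whichever row for that recordId appears first in the file. What you need to do is order within each individual group so that only the item with the maximum date is selected.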

var recipients = File.ReadAllLines(path)
                     .Select(record => record.Split('|'))
                     .Select(tokens => new
                         {
                             FirstName = tokens[2],
                             LastName = tokens[4],
                             recordId = Convert.ToInt32(tokens[13]),
                             date = Convert.ToDateTime(tokens[17])
                         })
                     .GroupBy(m => m.recordId,
                              (k, g) => g.OrderByDescending(m => m.date).First())
                     .OrderBy(m => m.recordId);
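
To see the difference without the file, here is a minimal sketch that runs both query shapes over a small in-memory list. The Row class, the sample values, and their order are assumptions made for illustration (the order mimics a file in which an older row precedes a newer one for the same recordId); only the two query shapes come from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LatestPerRecordDemo
{
    // Hypothetical stand-in for one parsed line of the file.
    public sealed class Row
    {
        public int RecordId;
        public DateTime Date;
        public Row(int recordId, DateTime date) { RecordId = recordId; Date = date; }
    }

    // Assumed file order: for 138011 and 178063 the older row comes first.
    public static readonly Row[] Sample =
    {
        new Row(137308, new DateTime(1991, 2, 15)),
        new Row(138011, new DateTime(1981, 11, 9)),
        new Row(138011, new DateTime(1983, 6, 16)),
        new Row(178063, new DateTime(1976, 3, 3)),
        new Row(178063, new DateTime(1980, 1, 16)),
    };

    // Shape of the query in the question: sorts the groups, not their items.
    public static List<Row> Original(IEnumerable<Row> rows)
    {
        return rows.GroupBy(r => r.RecordId)
                   .OrderByDescending(g => g.Max(r => r.Date))
                   .Select(g => g.First())   // first row in source order
                   .OrderBy(r => r.RecordId)
                   .ToList();
    }

    // Shape of the corrected query: sorts inside each group, then takes the first.
    public static List<Row> Corrected(IEnumerable<Row> rows)
    {
        return rows.GroupBy(r => r.RecordId,
                            (k, g) => g.OrderByDescending(r => r.Date).First())
                   .OrderBy(r => r.RecordId)
                   .ToList();
    }

    public static void Main()
    {
        var original = Original(Sample);
        var corrected = Corrected(Sample);
        // Both lists are sorted by RecordId and contain the same keys.
        for (int i = 0; i < original.Count; i++)
        {
            Console.WriteLine("{0} original={1} corrected={2}",
                original[i].RecordId,
                original[i].Date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture),
                corrected[i].Date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
        // 137308 original=1991-02-15 corrected=1991-02-15
        // 138011 original=1981-11-09 corrected=1983-06-16
        // 178063 original=1976-03-03 corrected=1980-01-16
    }
}
```

The original shape only returns the right row when the newest row happens to come first in the source, which would explain why the results looked correct for some recordIds and wrong for others.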

If performance is important and each group could potentially contain many items, you might see a slight improvement by using Aggregate to find the max record in each group in a single pass, rather than sorting the whole group with the OrderByDescending/First combo:

// ...
.GroupBy(m => m.recordId,
         (k, g) => g.Aggregate((a, m) => (m.date > a.date) ? m : a))
// ...
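
As a rough check that the two approaches pick the same row, including when a recordId has duplicate dates (as #162337 does in the sample data), the sketch below runs both over an in-memory list. The Row class, its Tag field, and the sample order are assumptions added for illustration; Tag exists only to tell otherwise identical rows apart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateMaxDemo
{
    // Hypothetical stand-in for one parsed line; Tag marks the row's position.
    public sealed class Row
    {
        public int RecordId;
        public DateTime Date;
        public string Tag;
        public Row(int recordId, DateTime date, string tag)
        {
            RecordId = recordId; Date = date; Tag = tag;
        }
    }

    // Assumed input order, with duplicate dates for 162337.
    public static readonly Row[] Sample =
    {
        new Row(162337, new DateTime(1994, 9, 1),  "a"),
        new Row(162337, new DateTime(1998, 12, 1), "b"),
        new Row(162337, new DateTime(1998, 12, 1), "c"),
        new Row(162337, new DateTime(1994, 9, 1),  "d"),
        new Row(180725, new DateTime(1992, 1, 14), "e"),
        new Row(180725, new DateTime(2007, 7, 1),  "f"),
    };

    // Single pass per group: keep the current best unless a strictly later date appears.
    public static List<Row> LatestByAggregate(IEnumerable<Row> rows)
    {
        return rows.GroupBy(r => r.RecordId,
                            (k, g) => g.Aggregate((a, m) => (m.Date > a.Date) ? m : a))
                   .OrderBy(r => r.RecordId)
                   .ToList();
    }

    // Sorts each group (LINQ's OrderByDescending is a stable sort), then takes the first item.
    public static List<Row> LatestBySort(IEnumerable<Row> rows)
    {
        return rows.GroupBy(r => r.RecordId,
                            (k, g) => g.OrderByDescending(r => r.Date).First())
                   .OrderBy(r => r.RecordId)
                   .ToList();
    }

    public static void Main()
    {
        foreach (var row in LatestByAggregate(Sample))
            Console.WriteLine("aggregate: {0} {1}", row.RecordId, row.Tag);
        foreach (var row in LatestBySort(Sample))
            Console.WriteLine("sort:      {0} {1}", row.RecordId, row.Tag);
        // aggregate: 162337 b
        // aggregate: 180725 f
        // sort:      162337 b
        // sort:      180725 f
    }
}
```

Because the comparison is a strict >, Aggregate keeps the first of several rows that share the latest date, which matches what the stable sort followed by First() returns. On .NET 6 or later, Enumerable.MaxBy(r => r.Date) is another way to write the same per-group selection.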


Is it possible that this line:

.OrderByDescending (m => m.Max (x => x.date ) )

is sorting the groups by what their max date is, rather than the items in each group?

This trimmed-down code segment appears to produce the results you're looking for (though you'd have to work it in with your file-processing, obviously):

        List<Customer> Customers = new List<Customer>() {
            new Customer(){ RecordId = 12, Birthday = new DateTime(1970, 1, 1)},
            new Customer(){ RecordId = 12, Birthday = new DateTime(1982, 3, 22)},
            new Customer(){ RecordId = 12, Birthday = new DateTime(1990, 1, 1)},

            new Customer(){ RecordId = 14, Birthday = new DateTime(1960, 1, 1)},
            new Customer(){ RecordId = 14, Birthday = new DateTime(1990, 5, 15)},
        };

        var groups = Customers.GroupBy(c => c.RecordId);
        IEnumerable<Customer> itemsFromGroupWithMaxDate = groups.Select(g => g.OrderByDescending(c => c.Birthday).First());

        foreach(Customer C in itemsFromGroupWithMaxDate)
            Console.WriteLine(String.Format("{0} {1}", C.RecordId, C.Birthday));

Or better yet:

IEnumerable<Customer> itemsFromGroupWithMaxDate = Customers.GroupBy(c => c.RecordId).Select(g => g.OrderByDescending(c => c.Birthday).First());

Taking a blind stab at your code, I believe this might work:

var recipients = File.ReadAllLines(path)
    .Select (record => record.Split('|'))
    .Select (tokens => new 
        {
        FirstName = tokens[2],
        LastName = tokens[4],
        recordId = Convert.ToInt32(tokens[13]),
        date = Convert.ToDateTime(tokens[17])
        }
    )
    .GroupBy (m => m.recordId)
    .Select(m => m.OrderByDescending(x => x.date).First())
    .OrderBy (m => m.recordId )

    .Dump();