Why is this LINQ query not returning the correct dates?
The following LINQ query reads a delimited file and returns the most-recent record for each recordId. The problem is, the most-recent record is not always returned. What am I doing wrong? What do I need to change to ensure the most-recent date is always returned? Is there a better way than using .Max()?
I've also attached some sample data so you can see the issue. In the sample data, the rows marked with an asterisk (*) are the rows I want returned (the most recent date). The rows marked with an x are the ones that actually get returned, incorrectly in my opinion.
In cases where the same recordId appears multiple times (#162337, for example) and has multiple dates, I want one record returned with the most-recent date.
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = Convert.ToInt32(tokens[13]),
date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy (m => m.recordId)
.OrderByDescending (m => m.Max (x => x.date ) )
.Select (m => m.First () )
.OrderBy (m => m.recordId )
.Dump();
FirstName LastName recordId date
fname lname 137308 2/15/1991 0:00
fname lname 138011 6/16/1983 0:00 *
fname lname 138011 11/9/1981 0:00 x
fname lname 158680 9/4/1986 0:00
fname lname 161775 4/23/1991 0:00
fname lname 162337 12/1/1998 0:00 *
fname lname 162337 12/1/1998 0:00 *
fname lname 162337 9/1/1994 0:00 x
fname lname 162337 9/1/1994 0:00 x
fname lname 163254 2/12/1969 0:00
fname lname 173816 9/26/1997 0:00
fname lname 178063 1/16/1980 0:00 *
fname lname 178063 3/3/1976 0:00 x
fname lname 180725 7/1/2007 0:00 *
fname lname 180725 1/14/1992 0:00 x
fname lname 181153 5/1/2001 0:00
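You're ordering the entire sequence of groups by the maximum date within each group. That only changes the order in which the groups come out; m.First() then returns whichever row of each group appeared first in the file, not the one with the latest date. What you need to do is order within each individual group so that only the item with the maximum date is selected.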
var recipients = File.ReadAllLines(path)
.Select(record => record.Split('|'))
.Select(tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = Convert.ToInt32(tokens[13]),
date = Convert.ToDateTime(tokens[17])
})
.GroupBy(m => m.recordId,
(k, g) => g.OrderByDescending(m => m.date).First())
.OrderBy(m => m.recordId);
If performance is important and each group could potentially contain many items, you might see a slight improvement if you use Aggregate to determine the max record in the group rather than the OrderByDescending/First combo, since Aggregate makes a single pass over the group instead of sorting it:
// ...
.GroupBy(m => m.recordId,
(k, g) => g.Aggregate((a, m) => (m.date > a.date) ? m : a))
// ...
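For anyone who wants to try the Aggregate version without the file, here is a minimal self-contained sketch. The Recipient class, the LatestPerId.Latest helper and the hard-coded rows are made up for illustration (the rows reuse a few ids and dates from the sample data in the question); they stand in for the anonymous type and the File.ReadAllLines pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the anonymous type built from the file tokens.
public class Recipient
{
    public int RecordId { get; set; }
    public DateTime Date { get; set; }
}

public static class LatestPerId
{
    // Returns one record per RecordId: the one with the latest Date.
    // Aggregate walks each group once and keeps the later of each pair;
    // on a tie the record seen first is kept.
    public static List<Recipient> Latest(IEnumerable<Recipient> rows)
    {
        return rows
            .GroupBy(r => r.RecordId,
                     (id, g) => g.Aggregate((a, r) => r.Date > a.Date ? r : a))
            .OrderBy(r => r.RecordId)
            .ToList();
    }

    public static void Main()
    {
        // A few rows taken from the question's sample data, oldest first
        // so that a plain First() would pick the wrong row.
        var rows = new[]
        {
            new Recipient { RecordId = 162337, Date = new DateTime(1994, 9, 1) },
            new Recipient { RecordId = 162337, Date = new DateTime(1998, 12, 1) },
            new Recipient { RecordId = 138011, Date = new DateTime(1981, 11, 9) },
            new Recipient { RecordId = 138011, Date = new DateTime(1983, 6, 16) },
        };

        foreach (var r in Latest(rows))
            Console.WriteLine("{0} {1:yyyy-MM-dd}", r.RecordId, r.Date);
    }
}
```

With these rows, the 1983 record is kept for 138011 and the 1998 record for 162337. If you are on .NET 6 or later, g.MaxBy(r => r.Date) should express the same selection more directly than the Aggregate lambda.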
Is it possible that this line:
.OrderByDescending (m => m.Max (x => x.date ) )
is sorting the groups by what their max date is, rather than the items in each group?
This trimmed-down code segment appears to produce the results you're looking for (though you'd have to work it in with your file processing, obviously):
List<Customer> Customers = new List<Customer>() {
new Customer(){ RecordId = 12, Birthday = new DateTime(1970, 1, 1)},
new Customer(){ RecordId = 12, Birthday = new DateTime(1982, 3, 22)},
new Customer(){ RecordId = 12, Birthday = new DateTime(1990, 1, 1)},
new Customer(){ RecordId = 14, Birthday = new DateTime(1960, 1, 1)},
new Customer(){ RecordId = 14, Birthday = new DateTime(1990, 5, 15)},
};
var groups = Customers.GroupBy(c => c.RecordId);
IEnumerable<Customer> itemsFromGroupWithMaxDate = groups.Select(g => g.OrderByDescending(c => c.Birthday).First());
foreach(Customer C in itemsFromGroupWithMaxDate)
Console.WriteLine(String.Format("{0} {1}", C.RecordId, C.Birthday));
Or better yet:
IEnumerable<Customer> itemsFromGroupWithMaxDate = Customers.GroupBy(c => c.RecordId).Select(g => g.OrderByDescending(c => c.Birthday).First());
Taking a blind stab at your code, I believe this might work:
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = Convert.ToInt32(tokens[13]),
date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy (m => m.recordId)
.Select(m => m.OrderByDescending(x => x.date).First())
.OrderBy (m => m.recordId )
.Dump();