Filtering an array with duplicate elements
I have an array of FileInfo objects containing duplicate elements that I'd like to filter, i.e. remove the duplicates. The elements are sorted by last write time using a custom comparer. The format of the file names is as follows:
file{number}_{YYYYMMDD}_{HHMMSS}.txt
What I'd like to know is whether there's an elegant way of filtering out files with the same file number so that only the most recent one remains in the list. For example, I have two elements in my array with the following file names:
file1_20110214_090020.txt
file1_20101214_090020.txt
I would like to keep the most recent version of file1. The code I have for getting the files is as follows:
FileInfo[] listOfFiles = diSearch.GetFiles(fileSearch);
IComparer compare = new FileComparer(FileComparer.CompareBy.LastWriteTime);
Array.Sort(listOfFiles, compare);
Thanks for your help.
UPDATE:
Forgot to add the caveat: the program in question uses .NET 2.0, so no LINQ, unfortunately. Sorry for the confusion; I have corrected the file numbers above so that they are the same.
With LINQ, you could do:
var listOfFiles = diSearch
.GetFiles(fileSearch)
.GroupBy(file => file.Name.Substring(0, file.Name.IndexOf('_')))
.Select(g => g.OrderBy(file => file.LastWriteTime).Last())
.ToArray();
If you want these files to also be ordered by last write time, put in a .OrderByDescending(file => file.LastWriteTime) before the ToArray call.
You could of course use a more efficient technique to find the latest file from each group, such as a MaxBy operator (not built into LINQ to Objects in .NET 3.5, but available from libraries such as MoreLINQ).
EDIT:
In .NET 2.0, you could construct a Dictionary<string, List<FileInfo>> (with the key being the 'file-group') from the array, and then extract the latest file from each list in the dictionary's Values collection to produce the result.
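As a minimal sketch of the dictionary idea in C# 2.0 syntax (no LINQ, no lambdas): this variant stores only the latest FileInfo seen so far for each key instead of a full List<FileInfo>, which gives the same result in one pass. The class name LatestFileFilter, the method names, and the rule that the group key is everything before the first underscore are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LatestFileFilter
{
    // Assumption: the file-group key is the part of the name before the first '_',
    // e.g. "file1" for "file1_20110214_090020.txt".
    static string GetGroupKey(FileInfo file)
    {
        int index = file.Name.IndexOf('_');
        return index < 0 ? file.Name : file.Name.Substring(0, index);
    }

    // Returns one FileInfo per group: the one with the greatest LastWriteTime.
    public static FileInfo[] KeepLatestPerGroup(FileInfo[] files)
    {
        Dictionary<string, FileInfo> latest = new Dictionary<string, FileInfo>();

        foreach (FileInfo file in files)
        {
            string key = GetGroupKey(file);
            FileInfo current;

            // Store the file if the group is new or this file is more recent.
            if (!latest.TryGetValue(key, out current) ||
                file.LastWriteTime > current.LastWriteTime)
            {
                latest[key] = file;
            }
        }

        FileInfo[] result = new FileInfo[latest.Count];
        latest.Values.CopyTo(result, 0);
        return result;
    }
}
```

The order of the returned array is not guaranteed, so if the sorted order matters, call Array.Sort(result, compare) again with the existing comparer after filtering.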
If you are on C# 3 or later, another option would be to use LINQBridge, which lets you use LINQ to Objects while targeting .NET 2.0.
If I understand you correctly, you want to determine the most recent file from the timestamp in its file name (YYYYMMDD and so on) rather than from the last write time, with the files grouped by file number. In that case, this would work:
var mostRecentFiles = listOfFiles
    .GroupBy(f => f.Name.Substring(0, f.Name.IndexOf("_")))
    .Select(g => g.OrderByDescending(f =>
    {
        // "file1_20110214_090020.txt" -> { "file1", "20110214", "090020", "txt" }
        string[] s = f.Name.Split(new[] { '_', '.' });
        return Convert.ToDecimal(s[1] + s[2]);
    }).First())
    .ToList();
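Since the question targets .NET 2.0, the timestamp in the name can also be read without LINQ. The following is a rough sketch, assuming names always follow the file{number}_{YYYYMMDD}_{HHMMSS}.txt pattern; the FileNameTimestamp class and its Parse method are hypothetical names, not part of any library.

```csharp
using System;
using System.Globalization;
using System.IO;

static class FileNameTimestamp
{
    // Assumption: fileName looks like "file1_20110214_090020.txt".
    // Throws if the name does not match that pattern.
    public static DateTime Parse(string fileName)
    {
        // "file1_20110214_090020" -> { "file1", "20110214", "090020" }
        string[] parts = Path.GetFileNameWithoutExtension(fileName).Split('_');

        return DateTime.ParseExact(
            parts[1] + parts[2],
            "yyyyMMddHHmmss",
            CultureInfo.InvariantCulture);
    }
}
```

Comparing FileNameTimestamp.Parse(file.Name) instead of file.LastWriteTime in a per-group loop would then select the most recent file by name rather than by write time.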