How can I make file I/O duplicate checking more efficient?
Basically, I have an application that scans through all MP3s in a folder and returns a list of files without duplicates. I have two methods to perform this task. The first removes duplicate file names, and the second removes duplicate files with matching MP3 ID3 tags.
My folder has about 5000 files, which the application successfully reduces to about 4900 after removing duplicates, but it takes forever! Can anyone suggest a more efficient method? I've used parallelism to make things as fast as possible, but it's still dog slow.
First method to remove duplicate file names:
private static IEnumerable<string> GetFilesFromDir(string dir)
{
    return Directory.GetFiles(dir, "*.mp3", SearchOption.AllDirectories).Distinct();
}
The second method goes through each file returned by the above method and checks its ID3 tag (Artist - Song Title) information to ensure that duplicate songs are not present.
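private static IEnumerable<string> RemoveDuplicates(IEnumerable<string> files)
{
    // Maps each tag string to one file that carries it.
    var dictionary = new ConcurrentDictionary<string, string>();
    Parallel.ForEach(files, f =>
    {
        string tag = SongInformation.ArtistTitleAlbumString(f);
        // TryAdd keeps whichever file registers a tag first; later files with the same tag are ignored.
        dictionary.TryAdd(tag, f);
    });
    return dictionary.Values;
}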
The two methods are called as follows:
var newFiles = RemoveDuplicates(GetFilesFromDir(Settings.SharedFolder));
The call to Distinct() seems pointless here. Directory.GetFiles() returns full file names (with path), so they are always distinct.
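If the first step is really meant to drop files that share a name across different subfolders, grouping by file name is one way to get that effect. The following is only a minimal sketch, assuming it is acceptable to keep the first path found for each name and to compare names case-insensitively. `Mp3Scan` and `DistinctByFileName` are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Mp3Scan
{
    // The question's first method without Distinct(): GetFiles already
    // returns one full path per file, so there is nothing to de-duplicate.
    public static IEnumerable<string> GetFilesFromDir(string dir)
    {
        return Directory.GetFiles(dir, "*.mp3", SearchOption.AllDirectories);
    }

    // Hypothetical helper: keeps the first path seen for each file name.
    // Case-insensitive comparison is an assumption, not taken from the question.
    public static IEnumerable<string> DistinctByFileName(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(p => Path.GetFileName(p), StringComparer.OrdinalIgnoreCase)
            .Select(g => g.First());
    }
}
```

It could be chained as RemoveDuplicates(Mp3Scan.DistinctByFileName(Mp3Scan.GetFilesFromDir(Settings.SharedFolder))). This only changes which files reach the tag check; it does not by itself make reading the tags any faster.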