Filtering duplicates out of an IEnumerable

I have this code:

class MyObj {
    public int Id;
    public string Name;
    public string Location;
}

IEnumerable<MyObj> list;

I want to convert list to a dictionary like this:

list.ToDictionary(x => x.Name);

but it tells me I have duplicate keys. How can I keep only the first item for each key?


I suppose the easiest way would be to group by key and take the first element of each group:

list.GroupBy(x => x.Name).Select(g => g.First()).ToDictionary(x => x.Name);
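
To make the one-liner concrete, here is a minimal self-contained sketch. The FirstPerKeyDemo class, its Build and Sample methods, and the sample data are made up for illustration. It relies on the documented behaviour of Enumerable.GroupBy that elements within a group come out in source order, so g.First() is the first item seen for that key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObj
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

// Hypothetical helper class, not part of the original question.
public static class FirstPerKeyDemo
{
    // Groups by Name, keeps the first element of each group,
    // then builds the dictionary from the now-unique names.
    public static Dictionary<string, MyObj> Build(IEnumerable<MyObj> list)
    {
        return list.GroupBy(x => x.Name)
                   .Select(g => g.First())
                   .ToDictionary(x => x.Name);
    }

    // Made-up sample data with one duplicate name ("Ann").
    public static List<MyObj> Sample()
    {
        return new List<MyObj>
        {
            new MyObj { Id = 1, Name = "Ann", Location = "Oslo" },
            new MyObj { Id = 2, Name = "Bob", Location = "Rome" },
            new MyObj { Id = 3, Name = "Ann", Location = "Paris" },
        };
    }
}
```

With that sample, the dictionary should end up with two entries, and the "Ann" entry should be the one with Id 1.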

Or you could use Distinct if your objects implement IEquatable<T> so that two objects with the same key compare as equal:

// I'll just randomly call your object Person for this example.
class Person : IEquatable<Person> 
{
    public string Name { get; set; }

    public bool Equals(Person other)
    {
        if (other == null)
            return false;

        return Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        // Delegate to Equals(Person); base.Equals would fall back to reference equality.
        return Equals(obj as Person);
    }

    public override int GetHashCode()
    {
        return Name.GetHashCode();
    }
}

...

list.Distinct().ToDictionary(x => x.Name);

Or if you don't want to do that (maybe because you normally want to compare for equality in a different way, so Equals is already in use) you could make a custom implementation of IEqualityComparer just for this case:

class PersonComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (x == null)
            return y == null;

        if (y == null)
            return false;

        return x.Name == y.Name;
    }

    public int GetHashCode(Person obj)
    {
        return obj.Name.GetHashCode();
    }
}

...

list.Distinct(new PersonComparer()).ToDictionary(x => x.Name);


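If your type already defines equality by Name (Equals and GetHashCode overridden accordingly), a plain Distinct is enough:

list.Distinct().ToDictionary(x => x.Name);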


You could also write your own Distinct extension overload that accepts a Func<> for choosing the distinct key:

using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationExtensions
{
    public static IEnumerable<TSource> Distinct<TSource,TKey>(
        this IEnumerable<TSource> source, Func<TSource,TKey> keySelector)
    {
        // The generic arguments of a constructor cannot be inferred, so spell them out.
        var comparer = new KeyComparer<TSource,TKey>(keySelector);

        // Resolves to Enumerable.Distinct(source, IEqualityComparer<TSource>).
        return source.Distinct(comparer);
    }

    private class KeyComparer<TSource,TKey> : IEqualityComparer<TSource>
    {
        private readonly Func<TSource,TKey> keySelector;

        public KeyComparer(Func<TSource,TKey> keySelector)
        {
            this.keySelector = keySelector;
        }

        public bool Equals(TSource a, TSource b)
        {
            if (a == null && b == null) return true;
            if (a == null || b == null) return false;

            // == is not defined for an unconstrained TKey, so use the default comparer.
            return EqualityComparer<TKey>.Default.Equals(keySelector(a), keySelector(b));
        }

        public int GetHashCode(TSource obj)
        {
            return EqualityComparer<TKey>.Default.GetHashCode(keySelector(obj));
        }
    }
}

Apologies for any bad code formatting, I wanted to reduce the size of the code on the page. Anyway, you can then use ToDictionary:

 var dictionary = list.Distinct(x => x.Name).ToDictionary(x => x.Name);


Could make your own perhaps? For example:

public static class Extensions
{
    public static IDictionary<TKey, TValue> ToDictionary2<TKey, TValue>(
        this IEnumerable<TValue> subjects, Func<TValue, TKey> keySelector)
    {
        var dictionary = new Dictionary<TKey, TValue>();
        foreach(var subject in subjects)
        {
            var key = keySelector(subject);
            if(!dictionary.ContainsKey(key))
                dictionary.Add(key, subject);
        }
        return dictionary;
    }
}

var dictionary = list.ToDictionary2(x => x.Name);

Haven't tested it, but should work. (and it should probably have a better name than ToDictionary2 :p)

Alternatively, you can implement a DistinctBy method, for example like this:

public static IEnumerable<TSubject> DistinctBy<TSubject, TValue>(this IEnumerable<TSubject> subjects, Func<TSubject, TValue> valueSelector)
{
    var set = new HashSet<TValue>();
    foreach(var subject in subjects)
        if(set.Add(valueSelector(subject)))
            yield return subject;
}

var dictionary = list.DistinctBy(x => x.Name).ToDictionary(x => x.Name);
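
If you are on .NET 6 or later, System.Linq already ships an Enumerable.DistinctBy with the same shape, so the hand-written extension may not be needed. The sketch below assumes .NET 6+; the BuiltInDistinctByDemo class and its FirstPerName method are made-up names. As far as I know the built-in implementation yields the first element it sees for each key, but I don't think the documentation spells that out as a guarantee.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyObj
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

// Hypothetical helper class for illustration only.
public static class BuiltInDistinctByDemo
{
    // Uses the framework's Enumerable.DistinctBy (available from .NET 6),
    // not the hand-written extension shown above.
    public static Dictionary<string, MyObj> FirstPerName(IEnumerable<MyObj> list)
    {
        return list.DistinctBy(x => x.Name).ToDictionary(x => x.Name);
    }
}
```

Be aware that keeping a custom DistinctBy with the same signature in scope alongside the built-in one can make the call ambiguous, so you would probably want only one of them.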


The problem here is that the ToDictionary extension method does not support multiple values with the same key. One solution is to write a version which does and use that instead.

public static Dictionary<TKey,TValue> ToDictionaryAllowDuplicateKeys<TKey,TValue>(
  this IEnumerable<TValue> values,
  Func<TValue,TKey> keyFunc) {
  var map = new Dictionary<TKey,TValue>();
  foreach ( var cur in values ) {
    var key = keyFunc(cur);
    map[key] = cur;
  }
  return map;
}

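Now converting to a dictionary is straightforward. Note that because map[key] = cur overwrites an existing entry, this version keeps the last item for each key, not the first: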

var map = list.ToDictionaryAllowDuplicateKeys(x => x.Name);
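
If you want the first item per key, as the question asks, one possible variation is to only add a key when it is not present yet. The sketch below uses Dictionary.TryAdd, which as far as I know exists in .NET Core 2.0+ and .NET Standard 2.1 but not in the old .NET Framework (there you would fall back to a ContainsKey check). The names FirstWinsExtensions and ToDictionaryFirstWins are made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension class; the names are illustrative only.
public static class FirstWinsExtensions
{
    public static Dictionary<TKey, TValue> ToDictionaryFirstWins<TKey, TValue>(
        this IEnumerable<TValue> values,
        Func<TValue, TKey> keyFunc)
    {
        var map = new Dictionary<TKey, TValue>();
        foreach (var cur in values)
        {
            // TryAdd leaves an existing entry untouched and returns false,
            // so the first value seen for each key is the one that stays.
            map.TryAdd(keyFunc(cur), cur);
        }
        return map;
    }
}
```

Usage would look the same as before: var map = list.ToDictionaryFirstWins(x => x.Name);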


The following will work if you have different instances of MyObj with the same value for the Name property. It will take the first instance found for each duplicate (sorry for the obj - obj2 notation, it is just sample code):

list.SelectMany(obj => new MyObj[] {list.Where(obj2 => obj2.Name == obj.Name).First()}).Distinct();

EDIT: Joren's solution is better as it does not create unnecessary arrays in the process.
