Distinct by property of class with LINQ [duplicate]

This question already has answers here: LINQ's Distinct() on a particular property (23 answers) Closed 3 years ago.

I have a collection:

List<Car> cars = new List<Car>();

Cars are uniquely identified by their property CarCode.

I have three cars in the collection, and two with identical CarCodes.

How can I use LINQ to convert this collection to Cars with unique CarCodes?


You can use grouping, and get the first car from each group:

List<Car> distinct =
  cars
  .GroupBy(car => car.CarCode)
  .Select(g => g.First())
  .ToList();
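
To see the grouping approach end to end, here is a minimal sketch. The shape of the Car class (a string CarCode plus a Name property) and the sample values are assumptions made for illustration; the question does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type: the question only tells us that CarCode exists.
public class Car
{
    public string CarCode { get; set; }
    public string Name { get; set; }
}

public static class GroupByDemo
{
    // Keeps the first car seen for each CarCode.
    public static List<Car> DistinctByCode(IEnumerable<Car> cars)
    {
        return cars
            .GroupBy(car => car.CarCode)
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        // Three cars, two of which share the code "A1".
        var cars = new List<Car>
        {
            new Car { CarCode = "A1", Name = "first" },
            new Car { CarCode = "B2", Name = "second" },
            new Car { CarCode = "A1", Name = "third" },
        };

        foreach (var car in DistinctByCode(cars))
            Console.WriteLine(car.CarCode + " " + car.Name);
        // prints "A1 first" then "B2 second"
    }
}
```

In LINQ to Objects, GroupBy yields groups in the order their keys first appear, so the result holds the first car for each code, in source order.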


Use MoreLINQ, which has a DistinctBy method :)

IEnumerable<Car> distinctCars = cars.DistinctBy(car => car.CarCode);

(This is only for LINQ to Objects, mind you.)
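
If you target .NET 6 or later, System.Linq ships its own Enumerable.DistinctBy, so the MoreLINQ dependency may not be needed for this case. A minimal sketch, assuming a hypothetical Car type with a string CarCode:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type; only CarCode is known from the question.
public class Car
{
    public string CarCode { get; set; }
}

public static class BuiltInDistinctByDemo
{
    // Requires .NET 6 or later, where Enumerable.DistinctBy is available.
    public static List<Car> DistinctByCode(IEnumerable<Car> cars)
    {
        return cars.DistinctBy(car => car.CarCode).ToList();
    }

    public static void Main()
    {
        var cars = new List<Car>
        {
            new Car { CarCode = "A1" },
            new Car { CarCode = "A1" },
            new Car { CarCode = "B2" },
        };

        Console.WriteLine(DistinctByCode(cars).Count); // prints 2
    }
}
```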


Same approach as Guffa's GroupBy answer, but as an extension method:

public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
    return items.GroupBy(property).Select(x => x.First());
}

Used as:

var uniqueCars = cars.DistinctBy(x => x.CarCode);


You can implement an IEqualityComparer<Car> and pass it to the Distinct extension method.

class CarEqualityComparer : IEqualityComparer<Car>
{
    #region IEqualityComparer<Car> Members

    public bool Equals(Car x, Car y)
    {
        return x.CarCode.Equals(y.CarCode);
    }

    public int GetHashCode(Car obj)
    {
        return obj.CarCode.GetHashCode();
    }

    #endregion
}

And then

var uniqueCars = cars.Distinct(new CarEqualityComparer());


Another extension method for Linq-to-Objects, without using GroupBy:

    /// <summary>
    /// Returns the set of items, made distinct by the selected value.
    /// </summary>
    /// <typeparam name="TSource">The type of the source.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="source">The source collection.</param>
    /// <param name="selector">A function that selects a value to determine unique results.</param>
    /// <returns>IEnumerable&lt;TSource&gt;.</returns>
    public static IEnumerable<TSource> Distinct<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        HashSet<TResult> set = new HashSet<TResult>();

        foreach(var item in source)
        {
            var selectedValue = selector(item);

            if (set.Add(selectedValue))
                yield return item;
        }
    }
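
One practical difference from the GroupBy versions is that this iterator yields each item as soon as its key is first seen, so it can be combined with Take on a long or even unbounded sequence. The sketch below repeats the method so that it compiles on its own; the class names and the Naturals helper are hypothetical and only there for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectorDistinctExtensions
{
    // Same HashSet-based technique as above, repeated so the sketch is self-contained.
    public static IEnumerable<TSource> Distinct<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        var seen = new HashSet<TResult>();
        foreach (var item in source)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add(selector(item)))
                yield return item;
        }
    }
}

public static class StreamingDemo
{
    // Hypothetical helper: an endless sequence 0, 1, 2, ...
    public static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
            yield return i;
    }

    // Takes the first three numbers whose remainder mod 5 has not been seen yet.
    public static List<int> FirstThreeByRemainder()
    {
        return Naturals().Distinct(n => n % 5).Take(3).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstThreeByRemainder())); // prints 0, 1, 2
    }
}
```

A GroupBy-based version would have to consume the whole source before returning its first group, so the same call on an endless sequence would never complete.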


I think the best option in terms of performance (or in any terms) is to use Distinct with the IEqualityComparer interface.

However, implementing a new comparer for each class is cumbersome and produces boilerplate code.

So here is an extension method that builds a new IEqualityComparer on the fly for any class from a key-selector delegate.

Usage:

var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();

Extension Method Code

public static class LinqExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
    {
        GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
        return items.Distinct(comparer);
    }   
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
    private Func<T, TKey> expr { get; set; }
    public GeneralPropertyComparer (Func<T, TKey> expr)
    {
        this.expr = expr;
    }
    public bool Equals(T left, T right)
    {
        var leftProp = expr.Invoke(left);
        var rightProp = expr.Invoke(right);
        if (leftProp == null && rightProp == null)
            return true;
        else if (leftProp == null ^ rightProp == null)
            return false;
        else
            return leftProp.Equals(rightProp);
    }
    public int GetHashCode(T obj)
    {
        var prop = expr.Invoke(obj);
        return (prop==null)? 0:prop.GetHashCode();
    }
}


You can't effectively use Distinct on a collection of objects (without additional work). I will explain why.

The documentation says:

It uses the default equality comparer, EqualityComparer<T>.Default, to compare values.

For objects, that means Distinct compares them using the type's own Equals and GetHashCode() methods. Since your Car class doesn't override GetHashCode() and Equals(), the comparison falls back to reference equality. Two separate Car instances with the same CarCode are never reference-equal, so neither is treated as a duplicate.
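
A small sketch of that behaviour follows. Both classes are hypothetical: Car stands in for the class from the question, and CarWithEquality is an illustrative variant that overrides Equals and GetHashCode on CarCode.

```csharp
using System;
using System.Linq;

// Hypothetical Car without any equality overrides.
public class Car
{
    public string CarCode { get; set; }
}

// Hypothetical variant whose equality is based on CarCode.
public class CarWithEquality
{
    public string CarCode { get; set; }

    public override bool Equals(object obj)
    {
        return obj is CarWithEquality other && CarCode == other.CarCode;
    }

    public override int GetHashCode()
    {
        return CarCode == null ? 0 : CarCode.GetHashCode();
    }
}

public static class DefaultComparerDemo
{
    // Two instances with the same code: reference equality keeps both.
    public static int CountPlain()
    {
        var cars = new[] { new Car { CarCode = "A1" }, new Car { CarCode = "A1" } };
        return cars.Distinct().Count();
    }

    // With the overrides in place, the default comparer sees one distinct car.
    public static int CountWithEquality()
    {
        var cars = new[]
        {
            new CarWithEquality { CarCode = "A1" },
            new CarWithEquality { CarCode = "A1" },
        };
        return cars.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountPlain());        // prints 2
        Console.WriteLine(CountWithEquality()); // prints 1
    }
}
```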


Another way to accomplish the same thing...

List<Car> distinctBy = cars
    .Select(car => car.CarCode)
    .Distinct()
    .Select(code => cars.First(car => car.CarCode == code))
    .ToList();

It's possible to create an extension method to do this in a more generic way. Note that cars.First(...) rescans the list once for every distinct code. It would be interesting if someone could evaluate the performance of this 'DistinctBy' against the GroupBy approach.


You can check out my PowerfulExtensions library. It's at a very early stage, but you can already use methods like Distinct, Union, Intersect, and Except on any number of properties.

This is how you use it:

using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);