Remove duplicates using LINQ

I know this has been asked many times, but I cannot find anything that works. I am reading a CSV file and have to remove duplicate lines based on one of the columns, "CustomerID". The CSV file can contain multiple lines with the same CustomerID.

I need to remove the duplicates.

    // DOES NOT WORK
    var finalCustomerList = csvCustomerList.Distinct().ToList();

I have also tried this extension method:

    // DOES NOT WORK
    public static IEnumerable<T> RemoveDuplicates<T>(this IEnumerable<T> items)
    {
        return new HashSet<T>(items);
    }

What works for me is:

  • I read the CSV file into csvCustomerList.
  • I loop through csvCustomerList and check whether the customer already exists. If it doesn't, I add it.

    foreach (var csvCustomer in csvCustomerList)
    {
        // Copy the CSV row into a new customer object.
        var customer = new customer();
        customer.CustomerID = csvCustomer.CustomerID;
        customer.Name = csvCustomer.Name;
        //etc.....

        // Add the customer only if its ID has not been seen yet.
        var exists = finalCustomerList.Exists(x => x.CustomerID == csvCustomer.CustomerID);
        if (!exists)
        {
            finalCustomerList.Add(customer);
        }
    }
    

Is there a better way of doing this?


For Distinct to work with a non-default equality check, your customer class needs to implement IEquatable<T>. In the Equals method, compare only the customer IDs and nothing else, and override GetHashCode to hash the same ID, because Distinct relies on both.
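A minimal sketch of that approach follows. The `Customer` class here is a hypothetical stand-in for the question's class, and the `int` type of `CustomerID` is an assumption, since the original post does not show the class definition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's customer class.
// Assumption: CustomerID is an int; the original post does not say.
public class Customer : IEquatable<Customer>
{
    public int CustomerID { get; set; }
    public string Name { get; set; }

    // Two customers are equal when their IDs match; Name is ignored.
    public bool Equals(Customer other)
    {
        return other != null && CustomerID == other.CustomerID;
    }

    // Keep the object-based overload consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // Must agree with Equals: equal customers return equal hash codes.
    public override int GetHashCode()
    {
        return CustomerID.GetHashCode();
    }
}
```

With a class shaped like this, the original `csvCustomerList.Distinct().ToList()` call should collapse rows that share a CustomerID, since the default comparer picks up the `IEquatable<T>` implementation.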
Alternatively, you can use the overload of Distinct that takes an IEqualityComparer<T> and create a class that implements that interface for customer. That way, you don't need to change the customer class.
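The comparer route might look like the sketch below. `CustomerIdComparer` is a hypothetical name, and `Customer` is again an assumed stand-in for the question's class (with an assumed `int` ID) that stays free of any equality logic.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's customer class, left unmodified.
// Assumption: CustomerID is an int.
public class Customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer that treats customers with the same ID as equal.
public sealed class CustomerIdComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.CustomerID == y.CustomerID;
    }

    // Hash only the field that Equals compares.
    public int GetHashCode(Customer obj)
    {
        return obj.CustomerID.GetHashCode();
    }
}
```

Usage would then be `csvCustomerList.Distinct(new CustomerIdComparer()).ToList()`. Keeping the comparison in its own class lets other code continue to compare customers in the default way.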
Or you can use MoreLINQ, as suggested in another answer.


For a simple solution, check out MoreLINQ by Jon Skeet and others.

It has a DistinctBy operator where you can perform a distinct operation by any field. So you could do something like:

    var finalCustomerList = csvCustomerList.DistinctBy(c => c.CustomerID).ToList();
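If adding a library is not an option, the same idea can be hand-rolled with a HashSet of keys. The sketch below is not MoreLINQ's implementation; `DedupExtensions` and `DistinctByKey` are hypothetical names, chosen so they do not clash with an existing `DistinctBy`.

```csharp
using System;
using System.Collections.Generic;

public static class DedupExtensions
{
    // Hypothetical helper: yields the first item seen for each key, in source order.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> items, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in items)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}
```

It would be called as `csvCustomerList.DistinctByKey(c => c.CustomerID).ToList()`. Unlike the loop in the question, each lookup is a hash check rather than a scan of the result list. On .NET 6 and later, `System.Linq` also ships its own `DistinctBy`, so neither MoreLINQ nor a helper like this is strictly needed there.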