LINQ query to detect duplicate properties in a list of objects

I have a list of objects. The objects are instances of a custom class that basically contains two string fields, String1 and String2.

What I need to know is whether any of these strings are duplicated in that list. So I want to know if ObjectA.String1 == ObjectB.String1, or ObjectA.String2 == ObjectB.String2, or ObjectA.String1 == ObjectB.String2, or ObjectA.String2 == ObjectB.String1.

Also, I want to mark each object that contains a duplicate string as having a duplicate string (with a bool HasDuplicate on the object).

So when the duplication detection has run I want to simply foreach over the list like so:

foreach (var item in duplicationList)
    if (item.HasDuplicate)
        Console.WriteLine("Duplicate detected!");

This seemed like a nice problem to solve with LINQ, but I cannot for the life of me figure out a good query. So I've solved it using 'good-old' foreach, but I'm still interested in a LINQ version.


Here's a complete code sample which should work for your case.

class A
{
    public string Foo   { get; set; }
    public string Bar   { get; set; }
    public bool HasDupe { get; set; }
}

var list = new List<A> 
          { 
              new A{ Foo="abc", Bar="xyz"}, 
              new A{ Foo="def", Bar="ghi"}, 
              new A{ Foo="123", Bar="abc"}  
          };

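// For each item, look at every other item and check all four string pairings.
var dupes = list.Where(a => list
          .Except(new List<A>{a}) // skip the item itself
          .Any(x => x.Foo == a.Foo || x.Bar == a.Bar || x.Foo == a.Bar || x.Bar == a.Foo))
          .ToList();

// Flag every item that shares a string with another item.
dupes.ForEach(a => a.HasDupe = true);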


This should work:

public class Foo
{
    public string Bar;
    public string Baz;
    public bool HasDuplicates;
}

public static void SetHasDuplicate(IEnumerable<Foo> foos)
{
    var dupes = foos
        .SelectMany(f => new[] { new { Foo = f, Str = f.Bar }, new { Foo = f, Str = f.Baz } })
        .Distinct() // Eliminates double entries where Foo.Bar == Foo.Baz
        .GroupBy(x => x.Str)
        .Where(g => g.Count() > 1)
        .SelectMany(g => g.Select(x => x.Foo))
        .Distinct()
        .ToList();

    dupes.ForEach(d => d.HasDuplicates = true);    
}

What you are basically doing is

  1. SelectMany : create a list of all the strings, with their accompanying Foo
  2. Distinct : Remove double entries for the same instance of Foo (Foo.Bar == Foo.Baz)
  3. GroupBy : Group by string
  4. Where : Filter the groups with more than one item in them. These contain the duplicates.
  5. SelectMany : Get the foos back from the groups.
  6. Distinct : Remove double occurrences of foo from the list.
  7. ForEach : Set the HasDuplicates property.

Some advantages of this solution over Winston Smith's solution are:

  1. Easier to extend to more string properties. Suppose there were 5 properties. In his solution, you would have to write 25 comparisons (5 × 5) to check for duplicates (in the Any clause). In this solution, it's just a matter of adding the property in the first SelectMany call.
  2. Performance should be much better for large lists. Winston's solution iterates over the list for each item in the list, while this solution only iterates over it once. (Winston's solution is O(n²), while this one is roughly O(n), since GroupBy and Distinct are hash-based.)
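
To illustrate the first point, here is a minimal sketch of how the same pipeline could take any number of string properties. The names DuplicateFinder and FindItemsSharingStrings are made up for this example, and the Foo class simply mirrors the one above; null strings are treated like any other value, so two null properties on different objects would count as a duplicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the Foo class from the answer above.
public class Foo
{
    public string Bar;
    public string Baz;
    public bool HasDuplicates;
}

// Hypothetical helper, not part of the original answer.
public static class DuplicateFinder
{
    // Returns every item that shares at least one selected string with another item.
    public static List<T> FindItemsSharingStrings<T>(
        IEnumerable<T> items,
        params Func<T, string>[] selectors)
    {
        return items
            // Pair each item with each of its selected strings.
            .SelectMany(item => selectors.Select(s => new { Item = item, Str = s(item) }))
            // Drop repeated (item, string) pairs, e.g. when Bar == Baz on one item.
            .Distinct()
            .GroupBy(x => x.Str)
            // A string held by more than one item is a duplicate.
            .Where(g => g.Count() > 1)
            .SelectMany(g => g.Select(x => x.Item))
            // An item can appear in several groups; report it once.
            .Distinct()
            .ToList();
    }
}
```

Calling it with DuplicateFinder.FindItemsSharingStrings(foos, f => f.Bar, f => f.Baz) should return the same objects as the query above; a third property would just be one more lambda in the call.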


First, if your object doesn't have the HasDuplicate property yet, declare an extension method that implements HasDuplicateProperties:

public static bool HasDuplicateProperties<T>(this T instance)
    where T : SomeClass 
    // where is optional, but might be useful when you want to enforce
    // a base class/interface
{
    // use reflection or something else to determine whether this instance
    // has duplicate properties
    return false;
}

You can use that extension method in queries:

var itemsWithDuplicates = from item in duplicationList
                          where item.HasDuplicateProperties()
                          select item;

The same works with a normal property:

var itemsWithDuplicates = from item in duplicationList
                          where item.HasDuplicate
                          select item;

or

var itemsWithDuplicates = duplicationList.Where(x => x.HasDuplicateProperties());


Hat tip to https://stackoverflow.com/a/807816/492

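var duplicates = duplicationList
                .GroupBy(l => l) // groups by the objects' own Equals/GetHashCode
                .Where(g => g.Count() > 1)
                .Select(g => {foreach (var x in g)
                                 {x.HasDuplicate = true;}
                             return g;
                })
                .ToList(); // forces the lazy query to run so the flags actually get set

duplicates is a throwaway, but it gets you there in fewer enumerations. Note that grouping on the object itself only finds duplicates if the class overrides Equals and GetHashCode; otherwise only repeated references to the same instance are grouped together.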


var dups = duplicationList.GroupBy(x => x).Where(y => y.Count() > 1).Select(y => y.Key);

foreach (var d in dups)
    Console.WriteLine(d);
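
If the goal is to list the duplicated strings themselves rather than the objects, one possible variation is to group on the flattened strings instead. This is only a sketch: the Item class with String1 and String2 follows the question's description, and the DuplicateStrings helper name is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's class.
public class Item
{
    public string String1 { get; set; }
    public string String2 { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class DuplicateStrings
{
    // Returns each string that occurs on more than one item.
    public static List<string> Find(IEnumerable<Item> items)
    {
        return items
            // Distinct per item, so String1 == String2 on one item is not counted twice.
            .SelectMany(x => new[] { x.String1, x.String2 }.Distinct())
            .GroupBy(s => s)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

The result can be printed with the same foreach loop as above.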