开发者

How to use IEqualityComparer<T>.Equals() in ToLookUp<T>() Extension

I stumbled upon an article regarding the Birthday Paradox and it's implications when overriding the GetHashCode method, I find myself in a bind.

In tests, we found that in calls to the ToLookup() Extension, only GetHashcode is used, despite providing the implementation for Equals.

I think I understand why this happens, the internal working of ToLookup, HashSet, Dictionary, etc, use the HashCodes to store and/or index their elements?

Is there a way to somehow provide the functionality so that the equality comparison is actual performed using the equals method? Or should I not be concerned with the collisions? I haven't done the maths myself, but according to the first article I linked, you would only need 77,163 elements in a list before reaching a 50% chance of collision.

If I understand this correctly, an Equals() override that compares property by property such as

Return (a.Property1 == b.Property1 && a.Property2 == b.Property2 && ...)

should have a zero chance of collision? So how can I get my ToLookup() to equality compare this way?


In case you need an example of what I mean:

C#

class Program
{

    static void Main(string[] args)
    {
        DoStuff();
        Console.ReadKey();
    }

    public class AnEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
        public string OtherProperty1 { get; set; }
        public List<string开发者_C百科> OtherProperty2 { get; set; }
    }

    public class KeyEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
    }

    public static void DoStuff()
    {
        var a = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "foo"};
        var b = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "bar"};
        var c = new AnEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999, OtherProperty1 = "yada"};

        var entityList = new List<AnEntity> { a, b, c };

        var lookup = entityList.ToLookup(n => new KeyEntity {KeyProperty1 = n.KeyProperty1, KeyProperty2 = n.KeyProperty2, KeyProperty3 = n.KeyProperty3});

        // I want these to all return true
        Debug.Assert(lookup.Count == 2);
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].First().OtherProperty1 == "foo");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].Last().OtherProperty1 == "bar");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999}].Single().OtherProperty1 == "yada");
    }

}

VB

Module Program

    Public Sub Main(args As String())
        DoStuff()
        Console.ReadKey()
    End Sub

    Public Class AnEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
        Public Property OtherProperty1 As String
        Public Property OtherProperty2 As List(Of String) 
    End Class

    Public Class KeyEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
    End Class

    Public Sub DoStuff()
        Dim a = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "foo"}
        Dim b = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "bar"}
        Dim c = New AnEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999, .OtherProperty1 = "yada"}

        Dim entityList = New List(Of AnEntity) From {a, b, c}

        Dim lookup = entityList.ToLookup(Function(n) New KeyEntity With {.KeyProperty1 = n.KeyProperty1, .KeyProperty2 = n.KeyProperty2, .KeyProperty3 = n.KeyProperty3})

        ' I want these to all return true
        Debug.Assert(lookup.Count = 2)
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).First().OtherProperty1 = "foo")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).Last().OtherProperty1 = "bar")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999}).Single().OtherProperty1 = "yada")
    End Sub

End Module

I can get that to work with an override of GetHashcode(), no problems. But I don't want to use GetHashcode because if I have, for example, 109,125 elements in my list, apparently I'm already at 75% chance of collision? If it used aforementioned Equals() override, I think I'd be at 0%?


The article that you've linked to is completely misleading (and many of its comments highlight this).

GetHashCode is used where possible because it's fast; if there are hash collisions then Equals is used to disambiguate between the colliding items. So long as you implement Equals and GetHashCode correctly -- whether in the types themselves or a custom IEqualityComparer<T> implementation -- then there won't be any problems.

The problem with your example code is that you're not overriding Equals and GetHashCode at all. This means that the the default implementations are used, and the default implementations use reference comparisons for reference types, not value comparisons.

This means that you're not getting hash collisions because the objects you're comparing against are different to the original objects, even though they have the same values. This, in turn, means that Equals just isn't required by your example code. Override Equals and GetHashCode correctly, or set up an IEqualityComparer<T> to do so, and everything will start working as you expect.


The birthday paradox does not apply in this situation. The birthday paradox relates to non-deterministic random sets, whereas hashcode computation is determinitic. the chances of 2 objects with different state sharing the same hashcode is much closer to 1 in a billion or so, certainly not as low as 77 thousand - therefore I dont think you have anything to worry about.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜