LINQ: select elements that only appear once in a list

I have a list of objects, which can be of any type T.

How can I select the objects that appear in that list only once using LINQ? For example, if my list is {2,3,4,5,8,2,3,5,4,2,3,4,6}, then the output should be {6,8}.


You could try this:

int[] arr = { 2, 3, 4, 5, 8, 2, 3, 5, 4, 2, 3, 4, 6 };
var q =
    from g in arr.GroupBy(x => x)
    where g.Count() == 1
    select g.First();
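
Since the question asks about any type T, here is a minimal sketch of the same GroupBy idea wrapped in a generic extension method. The names EnumerableExtensions and SingleOccurrences are made up for illustration (they are not part of LINQ), and the optional comparer parameter is an assumption about what a caller might want:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Hypothetical helper: returns the elements that occur exactly once in source.
    // A null comparer falls back to EqualityComparer<T>.Default.
    public static IEnumerable<T> SingleOccurrences<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        return source
            .GroupBy(x => x, comparer)   // one group per distinct value
            .Where(g => g.Count() == 1)  // keep values seen exactly once
            .Select(g => g.Key);         // the key is the element itself
    }
}
```

Calling arr.SingleOccurrences() on the array above yields 8 and then 6, because GroupBy returns groups in the order their keys first appear. For custom types, T needs sensible Equals/GetHashCode implementations, or a comparer has to be passed in.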


Use the Count() function.

    int[] a = {2,3,4,5,8,2,3,5,4,2,3,4,6};

    var selection = from i in a
        where (a.Count(n => n == i) == 1)
        select i;


Note: this is not a new answer, only an elaboration on the other answers.

While the OP explicitly asks for an answer using LINQ, I think it is worth mentioning that there are sometimes disadvantages to using LINQ. It allows for succinct and (mostly) very readable code, but it does not always generate the most efficient underlying code (e.g. in the answer that filters with a.Count(n => n == i), Count enumerates the complete array once for every element).

So sometimes classical procedural code is a better choice.

To illustrate this, I wrote two alternative implementations: one using a dictionary, another using two hashsets. Both methods enumerate the array only once.

The benchmark results:

Method                  Array         Mean         Rel. mean  Error       StdDev      Gen0     Gen1     Allocated  Rel. allocated
GetUniquesByLinq        Int32[10000]  330,502.4ns  100%       3,771.44ns  3,527.81ns  62.5000  18.5547  294616 B   100%
GetUniquesByDictionary  Int32[10000]  161,602.2ns  49%        873.37ns    774.22ns    15.3809  2.4414   73336 B    25%
GetUniquesByHashSet     Int32[10000]  120,871.6ns  37%        412.96ns    366.07ns    15.1367  2.0752   71616 B    24%
GetUniquesByLinq        Int32[1000]   63,855.5ns   100%       813.66ns    679.45ns    18.6768  3.6621   88104 B    100%
GetUniquesByDictionary  Int32[1000]   27,243.4ns   42%        184.51ns    172.59ns    8.1787   0.0916   38600 B    44%
GetUniquesByHashSet     Int32[1000]   22,269.1ns   35%        232.72ns    217.68ns    5.8289   0.2747   27440 B    31%
GetUniquesByLinq        Int32[13]     636.8ns      100%       5.57ns      4.94ns      0.2584   -        1216 B     100%
GetUniquesByDictionary  Int32[13]     368.9ns      58%        2.79ns      2.61ns      0.1326   -        624 B      51%
GetUniquesByHashSet     Int32[13]     319.4ns      50%        6.34ns      6.78ns      0.1493   -        704 B      58%

These results show that the 'classical' methods perform better on both counts: they take roughly 35% to 58% of the LINQ version's execution time and allocate roughly 24% to 58% of its memory (less strain on the GC).

The code used to generate these benchmarks:

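using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]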
public class UniqueSelector
{
    public IEnumerable<int[]> Data()
    {
        var rnd = new Random(1);
        yield return new int[] { 2, 3, 4, 5, 8, 2, 3, 5, 4, 2, 3, 4, 6 };
        yield return Enumerable.Range(0, 1000).Select(i => rnd.Next(1000)).ToArray();
        yield return Enumerable.Range(0, 10000).Select(i => rnd.Next(1000)).ToArray();
    }
         
    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public int[] GetUniquesByLinq(int[] array)
    {
        var q =
        from g in array.GroupBy(x => x)
        where g.Count() == 1
        select g.First();
        return q.ToArray();
    }

    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public int[] GetUniquesByDictionary(int[] array)
    {
        var counts = new Dictionary<int, int>();
        foreach (int item in array)
        {
            if (!counts.TryAdd(item, 1)) counts[item]++;
        }
        return counts.Where(kv => kv.Value == 1).Select(kv => kv.Key).ToArray();
    }

    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public int[] GetUniquesByHashSet(int[] array)
    {
        var uniques = new HashSet<int>();
        var duplicates = new HashSet<int>();
        foreach (int item in array)
        {
            if (duplicates.Contains(item)) continue;
            if (uniques.Contains(item))
            {
                duplicates.Add(item);
                uniques.Remove(item);
                continue;
            }
            uniques.Add(item);
        }
        return uniques.ToArray();
    }
}