LINQ: select elements that only appear once in a list
I have a list of objects, which can be of any type T.
How can I select the objects that appear in that list only once using LINQ? For example, if my list is {2,3,4,5,8,2,3,5,4,2,3,4,6}, then the output should be {6,8}.
You could try this:
int[] arr = { 2, 3, 4, 5, 8, 2, 3, 5, 4, 2, 3, 4, 6 };
var q =
    from g in arr.GroupBy(x => x)
    where g.Count() == 1
    select g.First();
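Since the question mentions an arbitrary type T, the same grouping idea can be wrapped in a generic extension method. This is only a sketch: the name AppearingOnce and the optional comparer parameter are made up for illustration and are not part of LINQ. It relies on T having sensible equality (or on a comparer being supplied).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OnlyOnceExtensions
{
    // Hypothetical helper: returns the elements that occur exactly once.
    // Passing null as the comparer makes GroupBy use the default equality for T.
    public static IEnumerable<T> AppearingOnce<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer = null)
    {
        return source
            .GroupBy(x => x, comparer)      // one group per distinct value
            .Where(g => g.Count() == 1)     // keep groups with a single member
            .Select(g => g.Key);            // the value itself
    }
}
```

For the array in the question, arr.AppearingOnce() yields 8 and 6, in order of first appearance, which is the same set as the expected {6,8}.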
Use the Count() function.
int[] a = { 2, 3, 4, 5, 8, 2, 3, 5, 4, 2, 3, 4, 6 };
var selection = from i in a
                where a.Count(n => n == i) == 1
                select i;
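Note that the query above calls a.Count(...) once per element, so the array is scanned again for every item. A possible variation, sketched here with a made-up helper name (SelectSingles), counts each value once up front and then filters against those counts:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountOnceExample
{
    // Hypothetical helper, written for int to match the snippet above.
    public static int[] SelectSingles(int[] a)
    {
        // First pass: build a value -> occurrence count table.
        Dictionary<int, int> counts = a
            .GroupBy(n => n)
            .ToDictionary(g => g.Key, g => g.Count());

        // Second pass: keep elements whose count is 1, in input order.
        return a.Where(i => counts[i] == 1).ToArray();
    }
}
```

For the sample array this returns 8 and 6, in the order they appear in the input.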
Note: this is not a new answer, only an elaboration on the other answers.
While the OP explicitly asks for an answer using LINQ, I think it is worth mentioning that LINQ sometimes has disadvantages. It allows for succinct and (mostly) very readable code, but it does not always result in the most efficient code (e.g. in the answer that calls a.Count(n => n == i) inside the where clause, Count enumerates the complete array every time it is called).
So sometimes classical procedural code is a better choice.
To illustrate this, I wrote two alternative implementations: one using a dictionary, another using two hashsets. Both methods enumerate the array only once.
The benchmark results:
Method | array | Mean | Time ratio | Error | StdDev | Gen0 | Gen1 | Allocated | Alloc ratio |
---|---|---|---|---|---|---|---|---|---|
GetUniquesByLinq | Int32[10000] | 330,502.4ns | 100% | 3,771.44ns | 3,527.81ns | 62.5000 | 18.5547 | 294616 B | 100% |
GetUniquesByDictionary | Int32[10000] | 161,602.2ns | 49% | 873.37ns | 774.22ns | 15.3809 | 2.4414 | 73336 B | 25% |
GetUniquesByHashSet | Int32[10000] | 120,871.6ns | 37% | 412.96ns | 366.07ns | 15.1367 | 2.0752 | 71616 B | 24% |
GetUniquesByLinq | Int32[1000] | 63,855.5ns | 100% | 813.66ns | 679.45ns | 18.6768 | 3.6621 | 88104 B | 100% |
GetUniquesByDictionary | Int32[1000] | 27,243.4ns | 42% | 184.51ns | 172.59ns | 8.1787 | 0.0916 | 38600 B | 44% |
GetUniquesByHashSet | Int32[1000] | 22,269.1ns | 35% | 232.72ns | 217.68ns | 5.8289 | 0.2747 | 27440 B | 31% |
GetUniquesByLinq | Int32[13] | 636.8ns | 100% | 5.57ns | 4.94ns | 0.2584 | - | 1216 B | 100% |
GetUniquesByDictionary | Int32[13] | 368.9ns | 58% | 2.79ns | 2.61ns | 0.1326 | - | 624 B | 51% |
GetUniquesByHashSet | Int32[13] | 319.4ns | 50% | 6.34ns | 6.78ns | 0.1493 | - | 704 B | 58% |
These results show that the 'classical' methods are faster and allocate less memory (less strain on the GC): across the three array sizes they take roughly 35% to 58% of the LINQ version's time and 24% to 58% of its allocations.
The code used to generate these benchmarks:
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class UniqueSelector
{
    // Test inputs: the array from the question plus two random arrays (fixed seed).
    public IEnumerable<int[]> Data()
    {
        var rnd = new Random(1);
        yield return new int[] { 2, 3, 4, 5, 8, 2, 3, 5, 4, 2, 3, 4, 6 };
        yield return Enumerable.Range(0, 1000).Select(i => rnd.Next(1000)).ToArray();
        yield return Enumerable.Range(0, 10000).Select(i => rnd.Next(1000)).ToArray();
    }

    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public int[] GetUniquesByLinq(int[] array)
    {
        var q =
            from g in array.GroupBy(x => x)
            where g.Count() == 1
            select g.First();
        return q.ToArray();
    }

    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public int[] GetUniquesByDictionary(int[] array)
    {
        // Count the occurrences of each value in a single pass.
        var counts = new Dictionary<int, int>();
        foreach (int item in array)
        {
            if (!counts.TryAdd(item, 1)) counts[item]++;
        }
        return counts.Where(kv => kv.Value == 1).Select(kv => kv.Key).ToArray();
    }

    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public int[] GetUniquesByHashSet(int[] array)
    {
        var uniques = new HashSet<int>();     // values seen exactly once so far
        var duplicates = new HashSet<int>();  // values seen more than once
        foreach (int item in array)
        {
            if (duplicates.Contains(item)) continue;
            if (uniques.Contains(item))
            {
                // Second occurrence: move the value from uniques to duplicates.
                duplicates.Add(item);
                uniques.Remove(item);
                continue;
            }
            uniques.Add(item);
        }
        return uniques.ToArray();
    }
}
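The benchmark methods above are written for int[]. Since the question asks about any type T, the two-hashset approach could be adapted roughly as follows. This is a sketch, not benchmarked: the names UniqueHelpers and GetUniques and the optional comparer parameter are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueHelpers
{
    // Hypothetical generic adaptation of GetUniquesByHashSet.
    // A null comparer makes HashSet<T> fall back to the default equality for T.
    public static List<T> GetUniques<T>(
        IEnumerable<T> source,
        IEqualityComparer<T> comparer = null)
    {
        var uniques = new HashSet<T>(comparer);     // seen exactly once so far
        var duplicates = new HashSet<T>(comparer);  // seen more than once
        foreach (T item in source)
        {
            if (duplicates.Contains(item)) continue;

            // Remove returns true if the item had already been seen once,
            // which makes this occurrence its second one.
            if (uniques.Remove(item))
            {
                duplicates.Add(item);
                continue;
            }
            uniques.Add(item);
        }
        return new List<T>(uniques);
    }
}
```

HashSet<T> does not promise any particular enumeration order, so sort the result if the order matters to the caller.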