How can I quickly tell if a list contains only duplicates?
There are multiple related questions, but I'm looking for a solution specific to my case. There is an array of (usually) 14 integers. How can I quickly tell if each int appears exactly twice (i.e. there are 7 pairs)? The value range is from 1 to 35. The main aspect here is performance.
For reference, this is my current solution. It was written to resemble the spec as closely as possible and without per开发者_运维百科formance in mind, so I'm certain is can be improved vastly:
var pairs = Array
.GroupBy (x => x)
.Where (x => x.Count () == 2)
.Select (x => x.ToList ())
.ToList ();
IsSevenPairs = pairs.Count == 7;
Using Linq is optional. I don't care how, as long as it's fast :)
Edit: There is the special case that an int appears 2n times with n > 1. In this case the check should fail, i.e. there should be 7 distinct pairs.
Edit: Result I tested Ani's and Jon's solutions with tiny modifications and found during multiple benchmark runs in the target app that Ani's has about twice Jon's throughput on my machine (some Core 2 Duo on Win7-64). Generating the array of ints already takes about as long as the respective checks, so I'm happy with the result. Thanks, all!
Well, given your exact requirements, we can be a bit smarter. Something like this:
public bool CheckForPairs(int[] array)
{
// Early out for odd arrays.
// Using "& 1" is microscopically faster than "% 2" :)
if ((array.Length & 1) == 1)
{
return false;
}
int[] counts = new int[32];
int singleCounts = 0;
foreach (int item in array)
{
int incrementedCount = ++counts[item];
// TODO: Benchmark to see if a switch is actually the best approach here
switch (incrementedCount)
{
case 1:
singleCounts++;
break;
case 2:
singleCounts--;
break;
case 3:
return false;
default:
throw new InvalidOperationException("Shouldn't happen");
}
}
return singleCounts == 0;
}
Basically this keeps track of how many unpaired values you still have, and has an "early out" if it ever finds three of a kind.
(I don't know offhand whether this will be faster or slower than Ani's approach of incrementing and then checking for unmatched pairs afterwards.)
Clearly, LINQ won't provide the optimal solution here, although I would improve your current LINQ solution to:
// checks if sequence consists of items repeated exactly once
bool isSingleDupSeq = mySeq.GroupBy(num => num)
.All(group => group.Count() == 2);
// checks if every item comes with atleast 1 duplicate
bool isDupSeq = mySeq.GroupBy(num => num)
.All(group => group.Count() != 1);
For the specific case you mention (0 - 31), here's a faster, array-based solution. It doesn't scale very well when the range of possible numbers is large (use a hashing solution in this case).
// elements inited to zero because default(int) == 0
var timesSeenByNum = new int[32];
foreach (int num in myArray)
{
if (++timesSeenByNum[num] == 3)
{
//quick-reject: number is seen thrice
return false;
}
}
foreach (int timesSeen in timesSeenByNum)
{
if (timesSeen == 1)
{
// only rejection case not caught so far is
// if a number is seen exactly once
return false;
}
}
// all good, a number is seen exactly twice or never
return true;
EDIT: Fixed bugs as pointed out by Jon Skeet. I should also point out that his algo is smarter and probably faster.
I would create an array of 32 integer elements, initialized to zero. Let's call it "billy".
For each element of the input array, I'd increment billy[element] of 1.
At the end, check if billy contains only 0 or 2.
Almost certainly overkill when you've only got 14-ish pairs and only 32-ish possible values, but in the general case you could do something like this:
bool onlyPairs = yourArray.ContainsOnlyPairs();
// ...
public static class EnumerableExtensions
{
public static bool ContainsOnlyPairs<T>(this IEnumerable<T> source)
{
var dict = new Dictionary<T, int>();
foreach (T item in source)
{
int count;
dict.TryGetValue(item, out count);
if (count > 1)
return false;
dict[item] = count + 1;
}
return dict.All(kvp => kvp.Value == 2);
}
}
If the range of items is 0-31, you can store 32 one-bit flags in a uint32. I would suggest taking each item and compute mask=(1 SHL item), and see what happens if you try 'or'ing, 'xor'ing, or adding the mask values. Look at the results for valid and invalid cases. To avoid overflow, you may want to use a uint64 for the addition (since a uint32 could overflow if there are two 31's, or four 30's, or eight 29's).
I guess (never measured the speed) this codesnipet can give you a new point of view:
int[] array = { 0, 1, 2, 3, 1, 1, 3, 5, 1, 2, 7, 31 }; // this is your sample array
uint[] powOf2 = {
1, 2, 4, 8,
16, 32, 64, 128,
256, 512, 1024, 2048,
4096, 8192, 16384, 32768,
65536, 131072, 262144, 524288,
1048576, 2097152, 4194304, 8388608,
16777216, 33554432, 67108864, 134217728,
268435456, 536870912, 1073741824, 2147483648
};
uint now;
uint once = 0;
uint twice = 0;
uint more = 0;
for (int i = 0; i < array.Length; i++)
{
now = powOf2[array[i]];
more |= twice & now;
twice ^= (once & now) & ~more;
twice ^= more;
once |= now;
}
You can have the doubled values in the variable "twice"; Of course it only works for values less than 32;
精彩评论