Optimizing this C# algorithm (K Difference)
This is the problem I'm solving (it's a sample problem, not a real problem):
Given N numbers , [N<=10^5] we need to count the total pairs of numbers that have a difference of K. [K>0 and K<1e9]
Input Format: 1st line contains N & K (integers). 2nd line contains N numbers of the set. All the N numbers are assured to be distinct. Output Format: One integer saying the no of pairs of numbers that have a diff K.
Sample Input #00:
5 2
1 5 3 4 2
Sample Output #00:
3
Sample Input #01:
10 1
363374326 364147530 61825163 1073065718 1281246024 1399469912 428047635 491595254 879792181 1069262793
Sample Output #01:
0
I already have a solution (and I haven't been able to optimize it as well as I had hoped). Currently my solution gets a score of 12/15 when it is run, and I'm wondering why I can't get 15/15 (my solution to another problem wasn't nearly as efficient, but got all of the points). Apparently, the code is run using "Mono 2.10.1, C# 4".
So can anyone think of a better way to optimize this further? The VS profiler says to avoid calling String.Split and Int32.Parse. The calls to Int32.Parse can't be avoided, although I guess I could optimize tokenizing the array.
My current solution:
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
namespace KDifference
{
class Solution
{
static void Main(string[] args)
{
char[] space = { ' ' };
string[] NK = Console.ReadLine().Split(space);
int N = Int32.Parse(NK[0]), K = Int32.Parse(NK[1]);
int[] nums = Console.ReadLine().Split(space, N).Select(x => Int32.Parse(x)).OrderBy(x => x).ToArray();
int KHits = 0;
for (int i = nums.Length - 1, j, k; i >= 1; i--)
{
for (j = 0; j < i; j++)
{
k = nums[i] - nums[j];
if (k == K)
{
KHits++;
}
else if (k < K)
{
break;
}
开发者_运维知识库 }
}
Console.Write(KHits);
}
}
}
Your algorithm is still O(n^2), even with the sorting and the early-out. And even if you eliminated the O(n^2) bit, the sort is still O(n lg n). You can use an O(n) algorithm to solve this problem. Here's one way to do it:
Suppose the set you have is S1 = { 1, 7, 4, 6, 3 }
and the difference is 2.
Construct the set S2 = { 1 + 2, 7 + 2, 4 + 2, 6 + 2, 3 + 2 } = { 3, 9, 6, 8, 5 }
.
The answer you seek is the cardinality of the intersection of S1 and S2. The intersection is {6, 3}
, which has two elements, so the answer is 2.
You can implement this solution in a single line of code, provided that you have sequence of integers sequence
, and integer difference
:
int result = sequence.Intersect(from item in sequence select item + difference).Count();
The Intersect method will build an efficient hash table for you that is O(n) to determine the intersection.
Try this (note, untested):
- Sort the array
- Start two indexes at 0
- If difference between the numbers at those two positions is equal to K, increase count, and increase one of the two indexes (if numbers aren't duplicated, increase both)
- If difference is larger than K, increase index #1
- If difference is less than K, increase index #2, if that would place it outside the array, you're done
- Otherwise, go back to 3 and keep going
Basically, try to keep the two indexes apart by K value difference.
You should write up a series of unit-tests for your algorithm, and try to come up with edge cases.
This would allow you to do it in a single pass. Using hash sets is beneficial if there are many values to parse/check. You might also want to use a bloom filter in combination with hash sets to reduce lookups.
- Initialize. Let A and B be two empty hash sets. Let c be zero.
- Parse loop. Parse the next value v. If there are no more values the algorithm is done and the result is in c.
- Back check. If v exists in A then increment c and jump back to 2.
- Low match. If v - K > 0 then:
- insert v - K into A
- if v - K exists in B then increment c (and optionally remove v - K from B).
- High match. If v + K < 1e9 then:
- insert v + K into A
- if v + K exists in B then increment c (and optionally remove v + K from B).
- Remember. Insert v into B.
- Jump back to 2.
// php solution for this k difference
function getEqualSumSubstring($l,$s) {
$s = str_replace(' ','',$s);
$l = str_replace(' ','',$l);
for($i=0;$i<strlen($s);$i++)
{
$array1[] = $s[$i];
}
for($i=0;$i<strlen($s);$i++)
{
$array2[] = $s[$i] + $l[1];
}
return count(array_intersect($array1,$array2));
}
echo getEqualSumSubstring("5 2","1 3 5 4 2");
Actually that's trivially to solve with a hashmap:
First put each number into a hashmap: dict((x, x) for x in numbers)
in "pythony" pseudo code ;)
Now you just iterate through every number in the hashmap and check if number + K is in the hashmap. If yes, increase count by one.
The obvious improvement to the naive solution is to ONLY check for the higher (or lower) bound, otherwise you get the double results and have to divide by 2 afterwards - useless.
This is O(N) for creating the hashmap when reading the values in and O(N) when iterating through, i.e. O(N) and about 8loc in python (and it is correct, I just solved it ;-) )
Following Eric's answer, paste the implementation of Interscet method below, it is O(n):
private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource current in second)
{
set.Add(current);
}
foreach (TSource current2 in first)
{
if (set.Remove(current2))
{
yield return current2;
}
}
yield break;
}
精彩评论