Longest Consecutive Sequence in an Unsorted Array [duplicate]
You are given an Array of numbers and they开发者_运维技巧 are unsorted/random order. You are supposed to find the longest sequence of consecutive numbers in the array. Note the sequence need not be in sorted order within the array. Here is an example :
Input :
A[] = {10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101}
Output is :
{16,17,18,19,20,21,22}
The solution needs to be of O(n) complexity.
I am told that the solution involves using a hash table and I did come across few implementations that used 2 hash tables. One cannot sort and solve this because sorting would take O(nlgn) which is not what is desired.
You could have two tables:
- Start table: (start-point, length)
- End table: (ending-point, length)
When adding a new item, you would check:
- Does value + 1 exist in start table? If so, delete it and create a new entry of (value, length + 1) where length is the "current" length. You'd also update the end table with the same end point but greater length.
- Does value - 1 exist in the end table? If so, delete it and create a new entry of (value, length + 1), and this time update the start table (the starting position will be the same, but the length will be increased)
If both conditions hold, then you're effectively stitching two existing sequences together - replace the four existing entries with two new entries, representing the single longer sequence.
If neither condition holds, you just create a new entry of length 1 in both tables.
After all the values have been added, you can just iterate over the start table to find the key with the largest value.
I think this would work, and would be O(n) if we assume O(1) hash lookup/add/delete.
EDIT: C# implementation. It took a little while to get right, but I think it works :)
using System;
using System.Collections.Generic;
class Test
{
static void Main(string[] args)
{
int[] input = {10,21,45,22,7,2,67,19,13,45,12,
11,18,16,17,100,201,20,101};
Dictionary<int, int> starts = new Dictionary<int, int>();
Dictionary<int, int> ends = new Dictionary<int, int>();
foreach (var value in input)
{
int startLength;
int endLength;
bool extendsStart = starts.TryGetValue(value + 1,
out startLength);
bool extendsEnd = ends.TryGetValue(value - 1,
out endLength);
// Stitch together two sequences
if (extendsStart && extendsEnd)
{
ends.Remove(value + 1);
starts.Remove(value - 1);
int start = value - endLength;
int newLength = startLength + endLength + 1;
starts[start] = newLength;
ends[start + newLength - 1] = newLength;
}
// Value just comes before an existing sequence
else if (extendsStart)
{
int newLength = startLength + 1;
starts[value] = newLength;
ends[value + newLength - 1] = newLength;
starts.Remove(value + 1);
}
else if (extendsEnd)
{
int newLength = endLength + 1;
starts[value - newLength + 1] = newLength;
ends[value] = newLength;
ends.Remove(value - 1);
}
else
{
starts[value] = 1;
ends[value] = 1;
}
}
// Just for diagnostics - could actually pick the longest
// in O(n)
foreach (var sequence in starts)
{
Console.WriteLine("Start: {0}; Length: {1}",
sequence.Key, sequence.Value);
}
}
}
EDIT: Here's the single-hashset answer implemented in C# too - I agree, it's simpler than the above, but I'm leaving my original answer for posterity:
using System;
using System.Collections.Generic;
using System.Linq;
class Test
{
static void Main(string[] args)
{
int[] input = {10,21,45,22,7,2,67,19,13,45,12,
11,18,16,17,100,201,20,101};
HashSet<int> values = new HashSet<int>(input);
int bestLength = 0;
int bestStart = 0;
// Can't use foreach as we're modifying it in-place
while (values.Count > 0)
{
int value = values.First();
values.Remove(value);
int start = value;
while (values.Remove(start - 1))
{
start--;
}
int end = value;
while (values.Remove(end + 1))
{
end++;
}
int length = end - start + 1;
if (length > bestLength)
{
bestLength = length;
bestStart = start;
}
}
Console.WriteLine("Best sequence starts at {0}; length {1}",
bestStart, bestLength);
}
}
Dump everything to a hash set.
Now go through the hashset. For each element, look up the set for all values neighboring the current value. Keep track of the largest sequence you can find, while removing the elements found from the set. Save the count for comparison.
Repeat this until the hashset is empty.
Assuming lookup, insertion and deletion are O(1) time, this algorithm would be O(N) time.
Pseudo code:
int start, end, max
int temp_start, temp_end, count
hashset numbers
for element in array:
numbers.add(element)
while !numbers.empty():
number = numbers[0]
count = 1
temp_start, temp_end = number
while numbers.contains(number - 1):
temp_start = number - 1; count++
numbers.remove(number - 1)
while numbers.contains(number + 1):
temp_end = number + 1; count++
numbers.remove(number + 1)
if max < count:
max = count
start = temp_start; end = temp_end
max_range = range(start, end)
The nested whiles don't look pretty, but each number should be used only once so should be O(N).
Here is a solution in Python that uses just a single hash set and doesn't do any fancy interval merging.
def destruct_directed_run(num_set, start, direction):
while start in num_set:
num_set.remove(start)
start += direction
return start
def destruct_single_run(num_set):
arbitrary_member = iter(num_set).next()
bottom = destruct_directed_run(num_set, arbitrary_member, -1)
top = destruct_directed_run(num_set, arbitrary_member + 1, 1)
return range(bottom + 1, top)
def max_run(data_set):
nums = set(data_set)
best_run = []
while nums:
cur_run = destruct_single_run(nums)
if len(cur_run) > len(best_run):
best_run = cur_run
return best_run
def test_max_run(data_set, expected):
actual = max_run(data_set)
print data_set, actual, expected, 'Pass' if expected == actual else 'Fail'
print test_max_run([10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101], range(16, 23))
print test_max_run([1,2,3], range(1, 4))
print max_run([1,3,5]), 'any singleton output fine'
Another solution is with hash search which runs in O(n)
int maxCount = 0;
for (i = 0; i<N; i++)
{
// Search whether a[i] - 1 is present in the list.If it is present,
// you don't need to initiate count since it will be counted when
// (a[i] - 1) is traversed.
if (hash_search(a[i]-1))
continue;
// Now keep checking if a[i]++ is present in the list, increment the count
num = a[i];
while (hash_search(++num))
count++;
// Now check if this count is greater than the max_count got previously
// and update if it is
if (count > maxCount)
{
maxIndex = i;
count = maxCount;
}
}
Here is the implementation:
static int[] F(int[] A)
{
Dictionary<int, int> low = new Dictionary<int, int>();
Dictionary<int, int> high = new Dictionary<int, int>();
foreach (int a in A)
{
int lowLength, highLength;
bool lowIn = low.TryGetValue(a + 1, out lowLength);
bool highIn = high.TryGetValue(a - 1, out highLength);
if (lowIn)
{
if (highIn)
{
low.Remove(a + 1);
high.Remove(a - 1);
low[a - highLength] = high[a + lowLength] = lowLength + highLength + 1;
}
else
{
low.Remove(a + 1);
low[a] = high[a + lowLength] = lowLength + 1;
}
}
else
{
if (highIn)
{
high.Remove(a - 1);
high[a] = low[a - highLength] = highLength + 1;
}
else
{
high[a] = low[a] = 1;
}
}
}
int maxLow = 0, maxLength = 0;
foreach (var pair in low)
{
if (pair.Value > maxLength)
{
maxLength = pair.Value;
maxLow = pair.Key;
}
}
int[] ret = new int[maxLength];
for (int i = 0; i < maxLength; i++)
{
ret[i] = maxLow + i;
}
return ret;
}
class Solution {
public:
struct Node{
int lower;
int higher;
Node(int l, int h):lower(l),higher(h){
}
};
int longestConsecutive(vector<int> &num) {
// Start typing your C/C++ solution below
// DO NOT write int main() function
map<int,Node> interval_map;
map<int,Node>::iterator curr_iter,inc_iter,des_iter;
//first collect
int curr = 0;
int max = -1;
for(size_t i = 0; i < num.size(); i++){
curr = num[i];
curr_iter = interval_map.find(curr);
if (curr_iter == interval_map.end()){
interval_map.insert(make_pair(curr,Node(curr,curr)));
}
}
//the next collect
for(curr_iter = interval_map.begin(); curr_iter != interval_map.end(); curr_iter++)
{
int lower = curr_iter->second.lower;
int higher = curr_iter->second.higher;
int newlower = lower, newhigher = higher;
des_iter = interval_map.find(lower - 1);
if (des_iter != interval_map.end())
{
curr_iter->second.lower = des_iter->second.lower;
newlower = des_iter->second.lower;
}
inc_iter = interval_map.find(higher + 1);
if (inc_iter != interval_map.end()){
curr_iter->second.higher = inc_iter->second.higher;
newhigher = inc_iter->second.higher;
}
if (des_iter != interval_map.end()){
des_iter->second.higher = newhigher;
}
if (inc_iter != interval_map.end()){
inc_iter->second.lower = newlower;
}
if (curr_iter->second.higher - curr_iter->second.lower + 1> max){
max = curr_iter->second.higher - curr_iter->second.lower + 1;
}
}
return max;
}
};
This is the solution of Grigor Gevorgyan from a duplicate of this question, but I think simplified:
data = [1,3,5,7,4,6,10,3]
# other_sides[x] == other end of interval starting at x
# unknown values for any point not the end of an interval
other_sides = {}
# set eliminates duplicates, and is assumed to be an O(n) operation
for element in set(data):
# my intervals left hand side will be the left hand side
# of an interval ending just before this element
try:
left = other_sides[element - 1]
except KeyError:
left = element
# my intervals right hand side will be the right hand side
# of the interval starting just after me
try:
right = other_sides[element + 1]
except KeyError:
right = element
# satisfy the invariants
other_sides[left] = right
other_sides[right] = left
# convert the dictionary to start, stop segments
# each segment is recorded twice, so only keep half of them
segments = [(start, stop) for start, stop in other_sides.items() if start <= stop]
# find the longest one
print max(segments, key = lambda segment: segment[1] - segment[0])
Here's python code based by answer by Grigor Gevorgyan for similar question, I think it's very elegant solution of that problem
l = [10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101]
d = {x:None for x in l}
print d
for (k, v) in d.iteritems():
if v is not None: continue
a, b = d.get(k - 1), d.get(k + 1)
if a is not None and b is not None: d[k], d[a], d[b] = k, b, a
elif a is not None: d[a], d[k] = k, a
elif b is not None: d[b], d[k] = k, b
else: d[k] = k
print d
m = max(d, key=lambda x: d[x] - x)
print m, d[m]
output:
{2: 2, 67: None, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 100, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 10, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 11, 11: 10, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: 45, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 16, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 17, 17: 16, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 18, 17: 16, 18: 16, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 19, 17: 16, 18: 16, 19: 16, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 20, 17: 16, 18: 16, 19: 16, 20: 16, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 21, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 22, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: 16}
16 22
精彩评论