
Find the largest subset of an array that forms a sequence

I came across this problem on an interview forum.

Given an int array which might contain duplicates, find the largest subset of it that forms a sequence of consecutive integers. E.g. for {1,6,10,4,7,9,5} the answer is 4,5,6,7. Sorting is an obvious solution. Can this be done in O(n) time?

My take on the problem is that it cannot be done in O(n) time. The reason is that if we could, we could also sort in O(n) time without knowing the upper bound, since an arbitrary array can contain all the elements of a sequence, just in random order.

Does this sound like a plausible explanation? What are your thoughts?


I believe it can be solved in O(n) if you assume you have enough memory to allocate an uninitialized array of a size equal to the largest value, and that allocation can be done in constant time. The trick is to use a lazy array, which gives you the ability to create a set of items in linear time with a membership test in constant time.

Phase 1: Go through each item and add it to the lazy array.

Phase 2: Go through each undeleted item, and delete all items whose values are contiguous with it.

In phase 2, you determine the range and remember it if it is the largest so far. Items can be deleted in constant time using a doubly-linked list.
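
The lazy-array membership test can be looked at in isolation. The following is only a sketch of the idea in Python, not the code from this answer: the class name `LazySet` and its fields `sparse` and `dense` are made up for illustration. Python cannot hand out uninitialized memory, so the garbage is simulated by filling `sparse` with random numbers. That fill takes time proportional to the capacity, which the real trick avoids.

```python
import random


class LazySet:
    """Set of integers in [0, capacity) with O(1) insert and membership test.

    `sparse` plays the role of the uninitialized array: an entry is trusted
    only when `dense` confirms it.
    """

    def __init__(self, capacity):
        # Assumption for the demo: random garbage imitates uninitialized
        # memory. A real lazy array would skip this O(capacity) fill.
        self.sparse = [random.randrange(-capacity, 2 * capacity)
                       for _ in range(capacity)]
        self.dense = []  # values actually inserted, in insertion order

    def __contains__(self, v):
        if not 0 <= v < len(self.sparse):
            return False
        slot = self.sparse[v]
        # v is present only if its slot points into `dense`
        # and `dense` points back at v.
        return 0 <= slot < len(self.dense) and self.dense[slot] == v

    def add(self, v):
        if v not in self:
            self.sparse[v] = len(self.dense)
            self.dense.append(v)


values = [1, 6, 10, 4, 7, 9, 5, 5]
present = LazySet(max(values) + 1)
for v in values:
    present.add(v)
print([v for v in range(max(values) + 1) if v in present])
# [1, 4, 5, 6, 7, 9, 10]
```

A garbage entry can never produce a false positive: `dense` only ever holds values that were really added, so the check `dense[slot] == v` fails for anything else.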

Here is some incredibly kludgy code that demonstrates the idea:

#include <stdio.h>
#include <stdlib.h>

int main(int argc,char **argv)
{
  enum { n = 8 }; // compile-time constant, so the arrays below are valid in both C and C++
  int values[n] = {1,6,10,4,7,9,5,5};
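  int index[n];         // index[l] = value stored in slot l of the lazy set
  int lists[n];         // lists[l] = most recent item whose value is index[l]
  int prev[n];          // prev[k] = previous item with the same value as item k, or -1
  int next_existing[n]; // next item that has not been deleted
  int prev_existing[n]; // previous item that has not been deleted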
  int index_size = 0;
  int n_lists = 0;

  // Find largest value
  int max_value = 0;
  for (int i=0; i!=n; ++i) {
    int v=values[i];
    if (v>max_value) max_value=v;
  }

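  // Allocate a lazy array. It is deliberately left uninitialized. One extra
  // slot is needed because the scan below probes max_value+1. Values are
  // assumed to be positive so that the downward probe never goes below 0.
  int *lazy = (int *)malloc((max_value+2)*sizeof(int));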

  // Set items in the lazy array and build the lists of indices for
  // items with a particular value.
  for (int i=0; i!=n; ++i) {
    next_existing[i] = i+1;
    prev_existing[i] = i-1;
    int v = values[i];
    int l = lazy[v];
    if (l>=0 && l<index_size && index[l]==v) {
      // already there, add it to the list
      prev[n_lists] = lists[l];
      lists[l] = n_lists++;
    }
    else {
      // not there -- create a new list
      l = index_size;
      lazy[v] = l;
      index[l] = v;
      ++index_size;
      prev[n_lists] = -1;
      lists[l] = n_lists++;
    }
  }
  // Go through each contiguous range of values and delete them, determining
  // what the range is.
  int max_count = 0;
  int max_begin = -1;
  int max_end = -1;
  int i = 0;
  while (i<n) {
    // Start by searching backwards for a value that isn't in the lazy array
    int dir = -1;
    int v_mid = values[i];
    int v = v_mid;
    int begin = -1;
    for (;;) {
      int l = lazy[v];
      if (l<0 || l>=index_size || index[l]!=v) {
        // Value not in the lazy array
        if (dir==1) {
          // Hit the end
          if (v-begin>max_count) {
            max_count = v-begin;
            max_begin = begin;
            max_end = v;
          }
          break;
        }
        // Hit the beginning
        begin = v+1;
        dir = 1;
        v = v_mid+1;
      }
      else {
        // Remove all the items with value v
        int k = lists[l];
        while (k>=0) {
          if (k!=i) {
            // Unlink item k (not slot l) from the list of existing items
            next_existing[prev_existing[k]] = next_existing[k];
            if (next_existing[k]<n)
              prev_existing[next_existing[k]] = prev_existing[k];
          }
          k = prev[k];
        }

        v += dir;
      }
    }
    // Go to the next existing item
    i = next_existing[i];
  }

  // Print the largest range
  for (int i=max_begin; i!=max_end; ++i) {
    if (i!=max_begin) fprintf(stderr,",");
    fprintf(stderr,"%d",i);
  }
  fprintf(stderr,"\n");

  free(lazy);
}


I would say there are ways to do it. The algorithm is the one you already describe, but with an O(n) sorting algorithm. Since such algorithms exist for certain inputs (bucket sort, radix sort), this works for those inputs. This is also consistent with your argument for why it should not work in general.

Vaughn Cato's suggested implementation works this way: it behaves like a bucket sort, with the lazy array serving as buckets on demand.
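
As a rough illustration of "sort in linear time, then scan", here is a sketch in Python. It assumes non-negative integers that fit in a fixed number of bytes. The function names and the `key_bytes` parameter are hypothetical, chosen for this example only.

```python
def radix_sort(xs, key_bytes=4):
    """LSD radix sort for non-negative ints below 256**key_bytes.

    Runs in O(key_bytes * (n + 256)) time, i.e. linear in n for fixed key_bytes.
    """
    for shift in range(0, 8 * key_bytes, 8):
        buckets = [[] for _ in range(256)]
        for x in xs:
            buckets[(x >> shift) & 0xFF].append(x)
        # Concatenating the buckets in order keeps each pass stable.
        xs = [x for bucket in buckets for x in bucket]
    return xs


def longest_run_sorted(xs):
    """Longest run of consecutive integers in a sorted list (duplicates allowed)."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    prev = None
    for x in xs:
        if prev is not None and x == prev:
            continue  # duplicate, does not extend or break the run
        if prev is not None and x == prev + 1:
            run_len += 1
        else:
            run_start, run_len = x, 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
        prev = x
    return list(range(best_start, best_start + best_len)) if best_len else []


print(longest_run_sorted(radix_sort([1, 6, 10, 4, 7, 9, 5, 5])))
# [4, 5, 6, 7]
```

The linear bound depends entirely on the fixed key width. For unbounded integers the number of passes grows with the size of the values, which is where the original argument about sorting applies.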


As shown by M. Ben-Or in Lower bounds for algebraic computation trees, Proc. 15th ACM Sympos. Theory Comput., pp. 80-86, 1983, cited by J. Erickson in Finding Longest Arithmetic Progressions, this problem requires Ω(n log n) time (even if the input is already sorted into order) when using an algebraic decision tree model of computation.

Earlier, I posted the following example in a comment to illustrate that sorting the numbers does not provide an easy answer to the question: Suppose the array is given already sorted into ascending order. For example, let it be (20 30 35 40 47 60 70 80 85 95 100). The longest sequence found in any subsequence of the input is 20,40,60,80,100 rather than 30,35,40 or 60,70,80.
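
The example above can be checked with a straightforward dynamic program. This is a quadratic-time sketch written for this illustration, not one of the algorithms from the cited paper. It assumes the input is strictly increasing, and the function name is made up.

```python
def longest_arithmetic_progression(xs):
    """Longest arithmetic progression among strictly increasing numbers xs.

    O(n^2) time and space.
    """
    if len(xs) < 2:
        return list(xs)
    # length[j][d] = length of the longest progression ending at xs[j] with step d
    length = [dict() for _ in xs]
    best_len, best_end, best_step = 1, xs[0], 0
    for j in range(len(xs)):
        for i in range(j):
            d = xs[j] - xs[i]
            # Extend the best progression with step d that ends at xs[i];
            # a lone xs[i] counts as a progression of length 1.
            length[j][d] = length[i].get(d, 1) + 1
            if length[j][d] > best_len:
                best_len, best_end, best_step = length[j][d], xs[j], d
    first = best_end - (best_len - 1) * best_step
    return [first + k * best_step for k in range(best_len)]


print(longest_arithmetic_progression(
    [20, 30, 35, 40, 47, 60, 70, 80, 85, 95, 100]))
# [20, 40, 60, 80, 100]
```

Even though the input is already sorted, the winning progression skips over most of its neighbours, so a single left-to-right pass over adjacent elements cannot find it.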

Regarding whether an O(n) algebraic decision tree solution to this problem would provide an O(n) algebraic decision tree sorting method: As others have pointed out, a solution to this subsequence problem for a given multiset does not provide a solution to a sorting problem for that multiset. As an example, consider the set {2,4,6,x,y,z}. The subsequence solver will give you the result (2,4,6) whenever x,y,z are large numbers not in arithmetic sequence, and it will tell you nothing about the order of x,y,z.


What about this? Populate a hash table so that each value stores the start of the range seen so far for that number, except for the head element, which stores the end of the range. O(n) expected time, O(n) space. A tentative Python implementation (you could do it with one traversal keeping some state variables, but this way seems clearer):

def longest_subset(xs):
    table = {}
    for x in xs:
        if x in table:
            continue  # skip duplicates; re-processing x would corrupt the stored endpoints
        start = table.get(x-1, x)
        end = table.get(x+1, x)
        if x+1 in table:
            table[end] = start
        if x-1 in table:
            table[start] = end
        table[x] = (start if x-1 in table else end)

    start, end = max(table.items(), key=lambda pair: pair[1]-pair[0])
    return list(range(start, end+1))

print(longest_subset([1, 6, 10, 4, 7, 9, 5])) 
# [4, 5, 6, 7]
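
For comparison, a different and somewhat simpler hash-based variant is sketched below. It is not the method of this answer. It puts all values in a set and walks forward only from values that start a run. The function name is hypothetical, and the O(n) bound is an expected bound that assumes constant-time set lookups.

```python
def longest_consecutive(xs):
    """Longest run of consecutive integers in xs, using a hash set."""
    present = set(xs)  # duplicates collapse here
    best_start, best_len = None, 0
    for x in present:
        if x - 1 in present:
            continue  # x is not the first element of its run
        end = x
        while end + 1 in present:
            end += 1
        if end - x + 1 > best_len:
            best_start, best_len = x, end - x + 1
    return list(range(best_start, best_start + best_len)) if best_len else []


print(longest_consecutive([1, 6, 10, 4, 7, 9, 5, 5, -1, 0]))
# [4, 5, 6, 7]
```

Each value is visited by the inner loop at most once, because the walk starts only at the first element of a run. That keeps the total work linear in the number of distinct values. Negative numbers and duplicates need no special handling.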


Here is an unoptimized implementation; maybe you will find it useful. It scans the whole range of values up to max(A), so it is O(n) only when that range is O(n):

A = [1, 6, 10, 4, 7, 9, 5]
seen = set(A)  # hash set: O(1) expected membership test

max_seq = []
cur_seq = []
# Scan one past max(A) so that a run ending at max(A) is compared as well.
for i in range(min(A), max(A) + 2):
    if i in seen:
        cur_seq.append(i)
    else:
        if len(cur_seq) > len(max_seq):
            max_seq = cur_seq
        cur_seq = []
print(max_seq)