Find shortest subarray containing all elements

2023-02-20 05:00 问答作者：

Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing 开发者_开发问答all numbers with minimal complexity.

The array can have duplicates, and let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of number in any order.

For example:

Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7

Then the shortest subarray is obviously Array[2:5] (python notation).

Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?

Proof of a linear-time solution

I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)

The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.

EDIT: Fixed a glitch in the base case.

Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no ranges [0, k < j'] that contain all interesting numbers so we don't need to worry about them further. Now find the ~~smallest~~largest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.

To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.

Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.

In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:

# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
    isInteresting[x[i]] = 1

i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
    If count[a[j]] == 0 and isInteresting[a[j]]:
        nDistinctInteresting++
    count[a[j]]++

    If nDistinctInteresting == m:
        # We are in phase 2: contract the left side as far as possible
        While count[a[i]] > 1 or not isInteresting[a[i]]:
            count[a[i]]--
            i++

        If j - i < minRange:
            (minI, minJ) = (i, j)

count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).

This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.

Say the array has n elements, and set has m elements

Sort the array, noting the reverse index (position in the original array)
// O (n log n) time

for each element in given set
   find it in the array
// O (m log n) time  - log n for binary serch, m times

keep track of the minimum and maximum index for each found element

min - max defines your range

Total time complexity: O ((m+n) log n)

This solution definitely does not run in O(n) time as suggested by some of the pseudocode above, however it is real (Python) code that solves the problem and by my estimates runs in O(n^2):

def small_sub(A, B):
    len_A = len(A)
    len_B = len(B)

    sub_A = []
    sub_size = -1
    dict_b = {}

    for elem in B:
        if elem in dict_b:
            dict_b[elem] += 1
        else:
            dict_b.update({elem: 1})

    for i in range(0, len_A - len_B + 1):
        if A[i] in dict_b:
            temp_size, temp_sub = find_sub(A[i:], dict_b.copy())

            if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
                sub_A = temp_sub
                sub_size = temp_size

    return sub_size, sub_A

def find_sub(A, dict_b):
    index = 0
    for i in A:
        if len(dict_b) == 0:
            break

        if i in dict_b:
            dict_b[i] -= 1

            if dict_b[i] <= 0:
                del(dict_b[i])

        index += 1

    if len(dict_b) > 0:
        return -1, {}
    else:
        return index, A[0:index]

Here's how I solved this problem in linear time using collections.Counter objects

from collections import Counter

def smallest_subsequence(stream, search):
    if not search:
        return []  # the shortest subsequence containing nothing is nothing

    stream_counts = Counter(stream)
    search_counts = Counter(search)

    minimal_subsequence = None

    start = 0
    end = 0
    subsequence_counts = Counter()

    while True:
        # while subsequence_counts doesn't have enough elements to cancel out every
        # element in search_counts, take the next element from search
        while search_counts - subsequence_counts:
            if end == len(stream):  # if we've reached the end of the list, we're done
                return minimal_subsequence
            subsequence_counts[stream[end]] += 1
            end += 1

        # while subsequence_counts has enough elements to cover search_counts, keep
        # removing from the start of the sequence
        while not search_counts - subsequence_counts:
            if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
                minimal_subsequence = stream[start:end]
            subsequence_counts[stream[start]] -= 1
            start += 1

print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]

Java solution

    List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
    Set<String> keywords = Arrays.asList("a","b");

    Subarray result = new Subarray(-1,-1);

    Map<String, Integer> keyWordFreq = new HashMap<>();

    int numKeywords = keywords.size();

    // slide the window to contain the all the keywords**
    // starting with [0,0]
    for (int left = 0, right = 0 ; right < paragraph.size() ; right++){

      // expand right to contain all the keywords
      String currRight = paragraph.get(right);

      if (keywords.contains(currRight)){
        keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
      }

      // loop enters when all the keywords are present in the current window
      // contract left until the all the keywords are still present
      while (keyWordFreq.size() == numKeywords){
        String currLeft = paragraph.get(left);

        if (keywords.contains(currLeft)){

          // remove from the map if its the last available so that loop exists
          if (keyWordFreq.get(currLeft).equals(1)){

            // now check if current sub array is the smallest
            if((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)){
              result = new Subarray(left, right);
            }
            keyWordFreq.remove(currLeft);
          }else {

            // else reduce the frequcency
            keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
          }
        }
        left++;
      }

    }

    return result;
  }

继续阅读：algorithm arrays

Find shortest subarray containing all elements

Proof of a linear-time solution

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Proof of a linear-time solution

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？