Find shortest subarray containing all elements
Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing 开发者_开发问答all numbers with minimal complexity.
The array can have duplicates, and let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of number in any order.
For example:
Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7
Then the shortest subarray is obviously Array[2:5]
(python notation).
Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?
Proof of a linear-time solution
I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)
The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.
EDIT: Fixed a glitch in the base case.
Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no ranges [0, k < j'] that contain all interesting numbers so we don't need to worry about them further. Now find the smallestlargest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.
To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.
Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.
In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:
# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
isInteresting[x[i]] = 1
i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
If count[a[j]] == 0 and isInteresting[a[j]]:
nDistinctInteresting++
count[a[j]]++
If nDistinctInteresting == m:
# We are in phase 2: contract the left side as far as possible
While count[a[i]] > 1 or not isInteresting[a[i]]:
count[a[i]]--
i++
If j - i < minRange:
(minI, minJ) = (i, j)
count[]
and isInteresting[]
are hashes/dictionaries (or plain arrays if the numbers involved are small).
This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.
Say the array has n elements, and set has m elements
Sort the array, noting the reverse index (position in the original array)
// O (n log n) time
for each element in given set
find it in the array
// O (m log n) time - log n for binary serch, m times
keep track of the minimum and maximum index for each found element
min - max defines your range
Total time complexity: O ((m+n) log n)
This solution definitely does not run in O(n) time as suggested by some of the pseudocode above, however it is real (Python) code that solves the problem and by my estimates runs in O(n^2):
def small_sub(A, B):
len_A = len(A)
len_B = len(B)
sub_A = []
sub_size = -1
dict_b = {}
for elem in B:
if elem in dict_b:
dict_b[elem] += 1
else:
dict_b.update({elem: 1})
for i in range(0, len_A - len_B + 1):
if A[i] in dict_b:
temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
sub_A = temp_sub
sub_size = temp_size
return sub_size, sub_A
def find_sub(A, dict_b):
index = 0
for i in A:
if len(dict_b) == 0:
break
if i in dict_b:
dict_b[i] -= 1
if dict_b[i] <= 0:
del(dict_b[i])
index += 1
if len(dict_b) > 0:
return -1, {}
else:
return index, A[0:index]
Here's how I solved this problem in linear time using collections.Counter
objects
from collections import Counter
def smallest_subsequence(stream, search):
if not search:
return [] # the shortest subsequence containing nothing is nothing
stream_counts = Counter(stream)
search_counts = Counter(search)
minimal_subsequence = None
start = 0
end = 0
subsequence_counts = Counter()
while True:
# while subsequence_counts doesn't have enough elements to cancel out every
# element in search_counts, take the next element from search
while search_counts - subsequence_counts:
if end == len(stream): # if we've reached the end of the list, we're done
return minimal_subsequence
subsequence_counts[stream[end]] += 1
end += 1
# while subsequence_counts has enough elements to cover search_counts, keep
# removing from the start of the sequence
while not search_counts - subsequence_counts:
if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
minimal_subsequence = stream[start:end]
subsequence_counts[stream[start]] -= 1
start += 1
print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]
Java solution
List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
Set<String> keywords = Arrays.asList("a","b");
Subarray result = new Subarray(-1,-1);
Map<String, Integer> keyWordFreq = new HashMap<>();
int numKeywords = keywords.size();
// slide the window to contain the all the keywords**
// starting with [0,0]
for (int left = 0, right = 0 ; right < paragraph.size() ; right++){
// expand right to contain all the keywords
String currRight = paragraph.get(right);
if (keywords.contains(currRight)){
keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
}
// loop enters when all the keywords are present in the current window
// contract left until the all the keywords are still present
while (keyWordFreq.size() == numKeywords){
String currLeft = paragraph.get(left);
if (keywords.contains(currLeft)){
// remove from the map if its the last available so that loop exists
if (keyWordFreq.get(currLeft).equals(1)){
// now check if current sub array is the smallest
if((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)){
result = new Subarray(left, right);
}
keyWordFreq.remove(currLeft);
}else {
// else reduce the frequcency
keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
}
}
left++;
}
}
return result;
}
精彩评论