Finding the minimum and maximum element from one of many arrays

2023-03-08 03:58 Q&A author:

I received a question during an Amazon interview and would like assistance with solving it.

Given N arrays of size K each, where the K elements of each array are sorted and all N*K elements are unique. Choose a single element from each of the N arrays. From the chosen subset of N elements, subtract the minimum element from the maximum element. This difference should be the least possible.

Sample:

N=3, K=3

N=1 : 6, 16, 67
N=2 : 11,17,68
N=3 : 10, 15, 100

Here, if 16, 17, 15 are chosen, we get the minimum difference, which is 17-15=2.

I can think of an O(N*K*N) solution (edited after zivo correctly pointed this out; not a good solution now :( ).
1. Take N pointers, each initially pointing to the first element of one of the N arrays.

6, 16, 67
^ 
11,17,68
^
10, 15, 100
^

2. Find the highest and lowest elements among the current pointers in O(N) (11 and 6) and compute the difference between them (5).
3. Increment by 1 the pointer that points to the lowest element, within its own array.

 6, 16, 67
    ^ 
 11,17,68
 ^
 10, 15, 100 (difference:6)
 ^

4. Keep repeating steps 2 and 3, storing the minimum difference, until a pointer runs past the end of its array.

 6, 16, 67
    ^ 
 11,17,68
 ^
 10,15,100 (difference:5)
    ^ 


 6, 16, 67
    ^ 
 11,17,68
    ^
 10,15,100 (difference:2)
    ^

The state above is the required solution.

 6, 16, 67
    ^ 
 11,17,68
    ^
 10,15,100 (difference:84)
       ^ 

 6, 16, 67
        ^ 
 11,17,68
    ^
 10,15,100 (difference:83)
       ^

And so on......
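
The pointer walk above can be sketched in Python as follows. This is only a minimal sketch under the question's assumptions (sorted arrays, unique values); the function name `min_range_pointers` is made up for illustration, and the loop stops as soon as the array holding the current minimum is exhausted.

```python
def min_range_pointers(arrays):
    """Smallest (max - min) over one pick per sorted array, using N pointers."""
    n = len(arrays)
    pos = [0] * n                      # pos[i] = current index into arrays[i]
    best = None
    while True:
        current = [arrays[i][pos[i]] for i in range(n)]
        low, high = min(current), max(current)    # O(N) scan per step
        if best is None or high - low < best:
            best = high - low
        i_low = current.index(low)     # array that holds the lowest element
        pos[i_low] += 1                # step 3: advance that pointer
        if pos[i_low] == len(arrays[i_low]):
            return best                # that array is exhausted, so stop


sample = [[6, 16, 67], [11, 17, 68], [10, 15, 100]]
print(min_range_pointers(sample))  # 2
```

Each of the at most N*K steps scans N pointers, which is where the O(N*K*N) bound comes from.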

EDIT:

Its complexity can be reduced by using a heap (as suggested by Uri). I thought of it but ran into a problem: each time an element is extracted from the heap, its array number has to be found in order to increment the corresponding pointer for that array. An efficient way to find the array number can definitely reduce the complexity to O(K*N*log(N)), since the heap holds only N elements at a time. One naive way is to use a data structure like this

struct HeapEntry
{
    int element;      // the value itself
    int arraynumber;  // index of the array the value came from
};

and reconstruct the initial data like

 6|0,16|0,67|0

 11|1,17|1,68|1

 10|2,15|2,100|2

Initially, record the current max over the first column and insert the pointed-to elements into the heap. Now, each time an element is extracted, its array number is known, so the pointer in that array is incremented, the newly pointed-to element is compared with the current max, and the max is adjusted accordingly.
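
A rough Python sketch of the heap idea, using the standard `heapq` module. The name `min_range_heap` is hypothetical, and a tuple (element, array number, index) stands in for the struct above so that the source array is known as soon as an element is popped.

```python
import heapq


def min_range_heap(arrays):
    # One heap entry per array: (element, array number, index in that array).
    heap = [(arr[0], a, 0) for a, arr in enumerate(arrays)]
    heapq.heapify(heap)
    current_max = max(arr[0] for arr in arrays)
    best = current_max - heap[0][0]
    while True:
        value, a, idx = heapq.heappop(heap)       # current minimum
        best = min(best, current_max - value)
        if idx + 1 == len(arrays[a]):
            return best                           # array a is exhausted
        nxt = arrays[a][idx + 1]                  # advance the pointer in array a
        current_max = max(current_max, nxt)       # the max can only grow
        heapq.heappush(heap, (nxt, a, idx + 1))


print(min_range_heap([[6, 16, 67], [11, 17, 68], [10, 15, 100]]))  # 2
```

The heap never holds more than N entries, so each of the at most N*K pops and pushes costs O(log N).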

So here is an algorithm to solve this problem in two steps:

The first step is to merge all your arrays into one sorted array, which would look like this:

combined_val[] - holds all the numbers in sorted order
combined_ind[] - holds, for each number, the index of the array it originally belonged to

This step can be done easily in O(K*N*log(N)), but I think you might be able to do better than that (maybe not; you can look up variants of merge sort, because they do a similar step).
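
One possible way to do this merge in Python is `heapq.merge`, which performs a k-way merge of already-sorted inputs. The function name `merge_with_origin` is an assumption for illustration; array indexes here are 0-based.

```python
import heapq


def merge_with_origin(arrays):
    # Tag every value with the index of the array it came from.
    tagged = [[(value, a) for value in arr] for a, arr in enumerate(arrays)]
    combined_val, combined_ind = [], []
    # heapq.merge keeps one entry per input list in an internal heap.
    for value, a in heapq.merge(*tagged):
        combined_val.append(value)
        combined_ind.append(a)
    return combined_val, combined_ind


vals, inds = merge_with_origin([[6, 16, 67], [11, 17, 68], [10, 15, 100]])
print(vals)  # [6, 10, 11, 15, 16, 17, 67, 68, 100]
print(inds)  # [0, 2, 1, 2, 0, 1, 0, 1, 2]
```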

Now the second step:

It is easier to just show code instead of explaining, so here is the pseudocode:


int count[N] = { 0 };
int head = 0;
int diffcnt = 0;
// mindiff is initialized to overall maximum value - overall minimum value
int mindiff = combined_val[N * K - 1] - combined_val[0];
for (int i = 0; i < N * K; i++) 
{
  count[combined_ind[i]]++;

  if (count[combined_ind[i]] == 1) {
    // diffcnt counts how many arrays have at least one element between
    // indexes of "head" and "i". Once diffcnt reaches N it will stay N and
    // not increase anymore
    diffcnt++;
  } else {
    while (count[combined_ind[head]] > 1) {
      // We try to move the head index as far forward as possible while keeping diffcnt constant,
      // i.e. if count[combined_ind[head]] is 1, then moving head forward would
      // decrease diffcnt, which is something we don't want to do.
      count[combined_ind[head]]--;
      head++;
    }
  }

  if (diffcnt == N) {
    // i.e. we got at least one element from all arrays
    if (combined_val[i] - combined_val[head] < mindiff) {
      mindiff = combined_val[i] - combined_val[head];
      // if you want to save actual numbers too, you can save this (i.e. i and head
      // and then extract data from that)
    }
  }
}

The result is in mindiff.
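
For anyone who wants to run the second step, here is a direct Python transcription of the pseudocode above. It assumes `combined_val` and `combined_ind` have already been built by the merge step, with 0-based array indexes; the function name `min_diff_window` is made up.

```python
def min_diff_window(combined_val, combined_ind, n):
    count = [0] * n          # count[a] = elements of array a inside [head, i]
    head = 0
    diffcnt = 0              # number of arrays represented inside [head, i]
    mindiff = combined_val[-1] - combined_val[0]
    for i, a in enumerate(combined_ind):
        count[a] += 1
        if count[a] == 1:
            diffcnt += 1
        else:
            # Move head forward while its array is still covered elsewhere.
            while count[combined_ind[head]] > 1:
                count[combined_ind[head]] -= 1
                head += 1
        if diffcnt == n:
            mindiff = min(mindiff, combined_val[i] - combined_val[head])
    return mindiff


vals = [6, 10, 11, 15, 16, 17, 67, 68, 100]
inds = [0, 2, 1, 2, 0, 1, 0, 1, 2]
print(min_diff_window(vals, inds, 3))  # 2
```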

The running time of the second step is O(N * K). This is because the "head" index moves at most N*K times in total, so the inner loop does not make this quadratic; it is still linear.

So the total running time of the algorithm is O(N * K * log(N)). However, this is due to the merging step; if you can come up with a better merging step, you can probably bring it down to O(N * K).

This problem is for managers

You have 3 developers (N1), 3 testers (N2) and 3 DBAs (N3). Choose the least divergent team that can run a project successfully.

int[n] result;// where result[i] keeps the element from bucket N_i

int[n] latest;//where latest[i] keeps the latest element visited from bucket N_i

Iterate elements in (N_1 + N_2 + N_3) in sorted order
{
    Keep track of latest element visited from each bucket N_i by updating 'latest' array;

    if boundary(latest) < boundary(result)
    {
       result = latest;
    }
}

int boundary(int[] array)
{
   return Max(array) - Min(array);
}
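
A small Python sketch of the pseudocode above. It adds one assumption that the pseudocode leaves implicit: `latest` is only compared once every bucket has been visited at least once. The name `least_divergent` is hypothetical.

```python
def boundary(values):
    return max(values) - min(values)


def least_divergent(buckets):
    # All elements in sorted order, each tagged with its bucket number.
    tagged = sorted((value, b) for b, bucket in enumerate(buckets) for value in bucket)
    latest = [None] * len(buckets)   # latest[b] = last element seen from bucket b
    seen = 0                         # buckets visited at least once
    result = None
    for value, b in tagged:
        if latest[b] is None:
            seen += 1
        latest[b] = value
        if seen == len(buckets):
            if result is None or boundary(latest) < boundary(result):
                result = list(latest)    # copy, since latest keeps changing
    return result


print(least_divergent([[6, 16, 67], [11, 17, 68], [10, 15, 100]]))  # [16, 17, 15]
```

As written, `boundary` rescans all N buckets at every step; tracking the minimum incrementally would avoid that, at the price of more bookkeeping.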

I have an O(K*N*log(K)) approach, with typical execution much faster. I currently cannot think of anything better. I'll first explain the version that is easier to describe (with a somewhat longer execution):

For each element f in the first array (loop through K elements)
For each array, starting from the second array (loop through N-1 arrays)
Do a binary search on the array, and find element closest to f. This is your element (Log(K))

This algorithm can be optimized if, for each array, you add a new Floor index. When performing the binary search, search between 'Floor' and 'K-1'. Initially the Floor index is 0, and for the first element you search through the entire arrays. Once you find the element closest to 'f', update the Floor index with the index of that element. The worst case is the same (Floor may never update if the maximum element of the first array is smaller than the minimum of every other array), but the average case will improve.

Correctness proof for the accepted answer (Terminal's solution)

Assume that the algorithm finds a series A=<A[1],A[2],...,A[N]> which isn't the optimal solution (R).

Consider the index j in R such that R[j] is the first item of R that the algorithm examines and then replaces with the next item in its row.

Let A' denote the candidate solution at that phase (prior to the replacement). Since R[j]=A'[j] is the minimum value of A', it's also the minimum of R. Now, consider the maximum value of R, R[m]. If A'[m]<R[m], then R can be improved by replacing R[m] with A'[m], which contradicts the fact that R is optimal. Therefore, A'[m]=R[m]. In other words, R and A' share the same maximum and minimum, therefore they are equivalent. This completes the proof: if R is an optimal solution, then the algorithm is guaranteed to find a solution as good as R.
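
Alongside the proof, a quick empirical check on small inputs can be reassuring. The sketch below compares a brute-force search over all K^N choices with the "advance the minimum pointer" procedure from the question. All function names are made up, and the random instances are only an illustration, not part of the proof.

```python
import random
from itertools import product


def brute_force(arrays):
    # Try every way of picking one element per array.
    return min(max(pick) - min(pick) for pick in product(*arrays))


def advance_minimum(arrays):
    pos = [0] * len(arrays)
    best = None
    while True:
        current = [arr[p] for arr, p in zip(arrays, pos)]
        spread = max(current) - min(current)
        best = spread if best is None else min(best, spread)
        j = current.index(min(current))      # row holding the minimum
        pos[j] += 1                          # replace it with the next item
        if pos[j] == len(arrays[j]):
            return best


def random_instance(rng, n, k):
    values = rng.sample(range(1000), n * k)  # unique values, as in the question
    return [sorted(values[i * k:(i + 1) * k]) for i in range(n)]


rng = random.Random(0)
for _ in range(200):
    arrays = random_instance(rng, 3, 3)
    assert brute_force(arrays) == advance_minimum(arrays)
```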

for every element in 1st array

    choose the element in 2nd array that is closest to the element in 1st array
    current_array = 2;
    do
    {
        choose the element in current_array+1 that is closest to the element in current_array
        current_array++;
    } while(current_array < n);

complexity: O(k^2*n)

Here is my logic for solving this problem, keeping in mind that we need to pick one element from each of the N arrays (to compute the least minimum).

// Taking the values above as an example, the idea is to sort all three
// arrays into one combined array while keeping a second array that records
// which set (1, 2 or 3; this extends to n sets) each value came from.
1   3   2   3   1   2   1   2   3    // this is the array that holds the set index
6   10  11  15  16  17  67  68  100  // this is the sorted combined array.
           |           |   
    5            2          33       // this is the computed least minimum
// The rule is that the values being compared must carry different set
// indexes, so that they come from different sets. For example, the first
// element is index:1|value:6, so we hold on to the value 6 (it is the value
// used to compute the least minimum). We then go to the edge of the
// comparison, which is the second different index: we skip index:3|value:10
// (it is removed from the array) and compare index:2|value:11 with
// index:1|value:6. That gives 5, which is stored in a variable named
// leastMinimum = 5. Then we remove the indexes and values already used
// and repeat the same steps.

Step 1:

1   3   2   3   1   2   1   2   3
6   10  11  15  16  17  67  68  100
           |   
5            
leastMinimum = 5

Step 2:

3   1   2   1   2   3
15  16  17  67  68  100
           |   
 2          
leastMinimum = min(2, leastMinimum) // which is equal to 2

Step 3:

1   2   3
67  68  100

    33
leastMinimum = min(33, leastMinimum) // which is equal to the old leastMinimum, which is 2

Now suppose we have elements from the same array that are very close to each other (k=2 this time, which means we have 3 sets with only two values each):

// After sorting the n arrays we will have the below indexes array and values array
1   1   2   3   2   3
6   7   8   12  15  16
*       *   *

* First step: we skip the second occurrence of index 1 (1|7) and take the least minimum of 1|6 and 3|12 (index:2|value:8 is removed because it is not at the edges; we pick the minimum and maximum of the unique-index subset of n elements)
1   3         
6   12
 =6
* Second step: we remove the values we already used, so the array becomes:

1   2   3
7   15  16
*   *   * 
16 - 7
= 9

Note: Another approach, which consumes more memory, would consist of creating N sub-arrays and comparing the maximum - minimum of each.

So from the sorted values array below and its corresponding indexes array, we extract three sub-arrays:

1   3   2   3   1   2   1   2   3
6   10  11  15  16  17  67  68  100

First Array:

1   3   2 
6   10  11

11-6 = 5

Second Array:

3   1   2
15  16  17

17-15 = 2

Third Array:

1   2   3
67  68  100

100 - 67 = 33
