First occurrence in a binary search

2023-03-20 03:41 问答作者：

I'm tinkering with some code and I realized something I never knew. A normal binary search will return a random index in a data set for a key that occurs more than once. How can I modify this code below to return the first occurrence? Is this something people do?

//ripped from the JDK
public static int binarySearchValu开发者_JAVA技巧e(InvertedContainer.InvertedIndex[] a, long key) {
    return bSearchVal(a, 0, a.length, key);
}

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                                 int toIndex, long key) {
    int low = fromIndex;
    int high = toIndex - 1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        long midVal = a[mid].val;

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return (low); // key not found. return insertion point
}

An addition to Jon Skeets post:

The potential faster implementation is actually not hard to implement and adds only 2 lines of code, here is how I'd do it:

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else if (low != mid) //Equal but range is not fully scanned
        high = mid; //Set upper bound to current number and rescan
    else //Equal and full range is scanned
        return mid;

Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.

You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.

If your data is all integral, then this hack can help. It uses a float array to store values.

float array[];    //contains all integral values
int searchValue;

int firstIndex = -(binarySearch(array, (float)searchValue - 0.5F) + 1);

Basically what it does is find the insertion index of a value in between your search value and the integer before it. Since all values are integral, it finds the first occurrence of the search value.

Also this runs is log(n) time.

Example:

import java.util.Arrays;

public class BinarySearch {
    // considering array elements are integers
    float ar[] = new float[] { 1, 2, 3, 3, 4, 4, 5, 9, 9, 12, 12 };

    public void returnFirstOccurrence(int key) {
        int firstIndex = -(Arrays.binarySearch(ar, key - 0.5F) + 1);
        if (ar[firstIndex] != key)
            System.out.println("Key doesn't exist");
        else
            System.out.println("First index of key is " + firstIndex);
    }

    public static void main(String Args[]) throws Exception {
        new BinarySearch().returnFirstOccurrence(9);
    }

}

OUTPUT: 7

p.s: I've used this trick on several coding contests and it nicely worked everytime.

You could implement "lower bound" algorithm instead of binary search. This algorithm is used e.g. in C++/STL and its transcript into Java is straightforward. The algorithmic complexity of lower bound is also O(log n) as the binary search. This is better than to use binary search first and than linearly search for the first matching element - this would have worst case behaviour O(n).

You can an adapt your existing search algorithm just by having a sharper definition of matching. You can tell that the highlighted 5 in the sequence 1,3,5,5,5,9 is the first one because the number before it (3) is smaller than 5. So if mid lands on array element equal to the the key you only treat it as a match if a[mid-1] is less than key, other equal array elements are treated like greater than elements. Now you algorithm becomes (after including Jon Skeet's suggestion to return negatives for insertion points):

public static int binarySearch(int[] a, int key) {
    int low=0,high=a.length-1;
    while (low<=high) {
        int mid=(low+high) >>> 1;
        int midVal=a[mid];
        if (midVal < key) 
            low=mid+1;
        else if (mid>0 && a[mid-1]>=key) //we already know midval>=key here
            high=mid-1;
        else if (midVal==key) //found the 1st key 
             return mid;
        else
            return ~mid;      //found insertion point
    }
    return ~(a.length);       //insertion point after everything
}

It uses more comparisons but went faster than Stev314's version in my benchmarks probably because of cache effects.

The following algorithm binary-searches for the first item with a key greater-than-or-equal-to your search key...

while (upperbound > lowerbound)
{
  testpos = lowerbound + ((upperbound-lowerbound) / 2);

  if (item[testpos] >= goal)
  {
    //  new best-so-far
    upperbound = testpos;
  }
  else
  {
    lowerbound = testpos + 1;
  }
}

This isn't written for Java, which I don't know that well, so it may need a minor tweak. Note that the bounds are half-open (lowerbound is inclusive and upperbound is exclusive) and that this is important for correctness.

This can be adapted to other similar searches - e.g. finding the last key <= the search value.

This is slightly modified from my earlier question-and-answer here.

here is the solution, i found for getting the lower index of key having multiple occurrences in a sorted array using binary search.

int lowerBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]< key) low=mid;
        else high=mid;
    }
    int p = high;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}

we have to change a little in this code to work for upper bound, using binary search, so here is the working copy of code.

 int upperBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]> key) high=mid;
        else low=mid;
    }
    int p = low;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}

One approach is to hold an invariant throughout the whole binary search. In your particular case, the invariant would be:

array[low] < key
key <= array[high]

Then you can minimize the gap between low and high using binary search. When low + 1 == high, high would be the answer. Example code in C++:

// check invariant on initial values.
if (array[low] >= key) return low;
if (array[high] < key) return high+1;
// low + 1 < high ensures high is at least low + 2, thus
// mid will always be different from low or high. It will
// stop when low + 1 == high.
while (low + 1 < high) {
  int mid = low + (high - low) / 2;
  if (array[mid] < key) {
    low = mid;   // invariant: array[low] < key
  } else {
    high = mid;  // invariant: array[high] >= key
  }
}
return high;

Key difference between this and your example code is to update low and high to only mid rather than mid+1 or mid-1, because we have checked the value of array[mid], we can guarantee the invariant still holds when updating the boundaries. You need to check the invariant on initial values before starting to search too.

In this thread, you can find a full example of the binary search (recursive version), and two other versions (based on the original one) which allow you to get the first index and last index of a given key.

For your convenience I added the relevant Junit tests.

Here's a variation of the solution in scala. Used pattern-matching and recursion instead of the while loop to get the first occurrence.

def binarySearch(arr:Array[Int],key:Int):Int = {
     def binSearchHelper(lo:Int,hi:Int,mid:Int):Int = {
        if(lo > hi) -1 else {
            if(arr(mid) == key) mid else if(arr(mid) > key){
                binSearchHelper(lo,mid-1,lo + (((mid-1) - lo)/2))
            }else{
                binSearchHelper(mid+1,hi,(mid+1) + ((hi - (mid+1))/2))
            }
        }
     }
    binSearchHelper(0,arr.size-1,(arr.size-1)/2)
}

def findFirstOccurrence(arr:Array[Int],key:Int):Int = {
    val startIdx = binarySearch(arr,key)
    startIdx match {
        case 0 => 0
        case -1 => -1
        case _ if startIdx > 0 => {
            if(arr(startIdx - 1) < key) startIdx else {
                    findFirstOccurrence(arr.slice(0,startIdx),key)
            }
        }
    }
}

This should do the trick

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                             int toIndex, long key) {
int low = fromIndex;
int high = toIndex - 1;
int result = low;
while (low <= high) {
    int mid = (low + high) >>> 1;
    long midVal = a[mid].val;

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else
    {
        result = mid;
        high = mid -1; 
    }
}
return result;

}

For the last occurrence of the element:

static int elementExists(int input[], int element){
    int lo=0;
    int high = input.length-1;
    while(lo<high){
        int mid = (lo + high )/2;
        if(element >input[mid] ){
            lo = mid+1;
        }
        else if(element < input[mid]){
            high= mid-1;
        }
        else if (high != input.length-1) //Change for the Occurrence check
            lo = mid;
        else {
            return mid;
        }
    }
    return -1;
}

For the first occurrence:

else if (lo != mid){
        high = mid;
}

I think a simpler approach is storing the latest mid index where xs[mid] == key into a result variable and then keep running the binary search.

Here's the swift code:

func first<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { hi = mid - 1; res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}

Also, this requires a really small change (just one line) if you were to find the last index of a key.

func last<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { lo = mid + 1;  res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}

Try this javascript recursive solution. It's optimal in a sense that it's O(log(N))

function solve(A, e) {
  function solve (A, start, end, e, bestUntilNow) {
    if (start > end) {
      if (A[start] === e)
        return start
      return bestUntilNow
    } else {
      const mid = start + Math.floor((end - start) / 2)
      if (A[mid] === e) {
        return solve(A, start, mid - 1, e, mid)
      } else if (e < A[mid]) {
        return solve(A, start, mid - 1, e, bestUntilNow)
      } else {
        return solve(A, mid + 1, end, e, bestUntilNow)
      }
    }
  }
  return solve(A, 0, A.length, e, -1)
}

Given a sorted array with possibly duplicated elements like [1,1,2,2,2,2,3], the following code with time complexity O(logn) finds the indexes of the first and last occurrences of an element in the given array. This approach is based on a recursive JS Binary Search implementation by comparing with the immediately lower or immediately higher index/element after matching initially the element (as it is also suggested below).

// Find the first occurence of the value using binary search
function binarySearchFirstOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle-1] < value || middle === 0)) {
    return middle;
  } else if (arr[middle] < value) {
    return binarySearchFirstOccurence(arr, middle + 1, right, value);
  } else {
    // Going lower
    return binarySearchFirstOccurence(arr, left, middle - 1, value);
  }
}

// Find the last occurence of the value using binary search
function binarySearchLastOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle+1] > value || middle === arr.length - 1)) {
    return middle;
  } else if (arr[middle] > value) {
    return binarySearchLastOccurence(arr, left, middle - 1, value);
  } else {
    // Going higher
    return binarySearchLastOccurence(arr, middle + 1, right, value);
  }
}

function sortedFrequency(arr, value) {
  let left = 0;
  let right = arr.length -1;
  let first = binarySearchFirstOccurence(arr, left, right, value);
  let last = binarySearchLastOccurence(arr, left, right, value);
  if (first === -1 && last === -1) {
    return -1;
  } else if (first === -1 && last !== -1) {
    return 1;
  } else {
    return last-first+1;
  }
}

let arr = [1,1,2,2,2,2,3];
console.log(sortedFrequency(arr, 3));

Best George

The solution is doing some changes in the binary search so that time complexity remains equal to O(log2n). Keep in mind that for binary search the array must be in sorted form. Following are the steps

Calculate the mid index
If the element to be searched is greater than element present at the mid index then low becomes mid + 1
If the element to be searched is smaller than element present at the mid index then high will become mid - 1.
Now lets make some changes. Suppose if two elements are same in an array so our aim is to return the first occurrence of that element. Case would be if mid index == 0 or arr[mid - 1] != arr[mid], then mid index is returned. This means that same element is not repeated and if arr[mid - 1] == arr[mid], then one will make high = mid - 1. The reason being one need to return the previous element as it is the first occurrence.

Code

def first_occurence_binary_search(arr, search_element):

n = len(arr)
low = 0
high = n - 1

while low <= high:

    mid = (low + high) // 2
    if arr[mid] > search_element:
        high = mid - 1

    elif arr[mid] < search_element:
        low = mid + 1

    else:
        if mid == 0 or arr[mid - 1] != arr[mid]:
            return mid
        else:
            high = mid - 1

return -1


array = [1, 3, 3, 3]
searching = 3
print(first_occurence_binary_search(arr=array,
                                    search_element=searching))

继续阅读：algorithm binary binary-search search

First occurrence in a binary search

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？