
Any idea how to transform this O(n^2) algo into an O(n) one?

I have the following algorithm which scans a large circular array (data). At certain points in the array, I need to take a look at the past values (0 = newest data point, n = oldest data point) and determine if there was a value 5% below the current value. I ended up writing an O(n^2) algorithm which works okay, but this doesn't scale.

        const int numberOfDataPointsInPast = 1000;
        int numberOfDataPoints = 0;
        for (int i = numberOfDataPointsInPast; i >= 0; i--)
        {
            double targetPoint = data[i] * 0.95;
            for (int j = i + numberOfDataPointsInPast; j > i; j--)
            {
                if (data[j] <= targetPoint)
                {
                    numberOfDataPoints++;
                    break;
                }
            }
        }

Any idea how I could transform this into an O(n) algorithm? Thanks!


While iterating over the array, store the lowest value seen so far. This only requires a min variable and one comparison at every step. Instead of comparing the new value with all previous values, compare it only with that running minimum.
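
A minimal sketch of this idea, assuming C# and that "the past" means every older point rather than a fixed-size window; the function name is illustrative:

    // Minimal sketch. Treats every older point as "the past" (index 0 = newest,
    // higher index = older), so a single running minimum is enough; a fixed-size
    // window needs the sliding-window ideas in the later answers.
    int CountPointsWithLowerPast(double[] data)
    {
        int count = 0;
        double min = double.MaxValue;                // lowest value seen so far
        for (int i = data.Length - 1; i >= 0; i--)   // walk from oldest to newest
        {
            if (min <= data[i] * 0.95)               // some past value was 5% below this one
                count++;
            if (data[i] < min)
                min = data[i];
        }
        return count;
    }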


I think I understand your requirements.... I'm going to restate the problem:

Given: a sliding buffer size K, and a data array of size N > K, indices from 0 to N-1.

Compute: Count the number of points j such that K <= j <= N-1 and the set {data[j-1], data[j-2], data[j-3], ..., data[j-K]} contains at least one point with value <= 0.95 * data[j].

This can be accomplished as follows:

  1. Sort the points {data[0], data[1], ... data[K-1]} using a data structure which has at most O(log N) cost for insertion/removal.

  2. Initialize a counter R to 0, initialize j to K.

  3. Check the sorted array to see if the lowest point is <= data[j] * 0.95; if so, increment R.

  4. Remove data[j-K] from the sorted array, and insert data[j] to the sorted array.

  5. Increment j

  6. If j < N, go back to step 3.

The key here is to choose the proper data structure. I am pretty sure a balanced binary tree would work. If the incremental insertion cost is O(log N), then your total runtime is O(N log N).
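
A sketch of these steps, assuming C# and using a SortedSet as the balanced tree; pairing each value with its index keeps duplicate values distinct, and the method name and K parameter are illustrative:

    // requires: using System.Collections.Generic;
    // Sliding-window count using a sorted container as the "binary tree".
    int CountWithSortedWindow(double[] data, int K)
    {
        var window = new SortedSet<(double value, int index)>();
        for (int j = 0; j < K; j++)                   // steps 1-2: seed the window
            window.Add((data[j], j));

        int count = 0;                                // the counter R
        for (int j = K; j < data.Length; j++)         // steps 3-6
        {
            if (window.Min.value <= data[j] * 0.95)   // step 3: smallest past value
                count++;
            window.Remove((data[j - K], j - K));      // step 4: drop the point leaving the window
            window.Add((data[j], j));                 //         add the current point
        }
        return count;
    }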


I do not think it is possible to do this in O(n), because if you could solve it in O(n) you could also sort in O(n), and that is not possible (the minimum for comparison-based sorting is O(n log n)).

EDIT - reduce sorting to this problem

Suppose one can tell, for each point, how many points in the past have a value smaller by x% (here x is 5, but x could also be 0, in which case the count is simply the number of smaller points in the past).

Now suppose you want to sort an array of n elements.
If you can get the number of smaller points in the past for all elements in O(n), then whenever point a has a greater value than point b, the count for point a will also be greater than the count for point b (because the array is circular, every other point is in the past of each point). So this problem yields a function from the values to the counts that preserves the order.
Now the new values are bounded between 0 and n, and these can be sorted in O(n) time (e.g. with counting sort).

Correct me if I am wrong (It might be that I did not understand the problem in the first place).


You could maintain an array buffArray of numberOfDataPointsInPast elements that contains the current "window" elements sorted in ascending order.

For each iteration:

  • Check whether buffArray[0] (the smallest element in the window) is at most 0.95 times the current element, and count the current element if it is.
  • Remove the element that goes out of the "window" (i.e. the i+numberOfDataPointsInPast'th) from buffArray.
  • Add the new element (i.e. the i'th) to buffArray, maintaining sort order.

This is not O(N) as I understand it, but it is definitely more effective than O(N^2), since finding where to add and remove elements in a sorted array is O(log W) by binary search. I suspect the final efficiency is O(N log W), where W is numberOfDataPointsInPast.
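
A sketch of this scheme, assuming C# and the question's layout where index 0 is the newest point, so the window for point i is data[i+1 .. i+W]; the function name is illustrative:

    // requires: using System.Collections.Generic;
    int CountWithSortedBuffer(double[] data, int W)
    {
        int count = 0;
        int first = data.Length - 1 - W;          // oldest point that still has a full window
        var buffArray = new List<double>();
        for (int k = first + 1; k <= first + W; k++)
            buffArray.Add(data[k]);               // window for point `first`
        buffArray.Sort();

        for (int i = first; i >= 0; i--)
        {
            if (buffArray[0] <= data[i] * 0.95)   // smallest value in the window
                count++;
            if (i == 0) break;                    // no more sliding needed

            // slide the window: data[i + W] leaves, data[i] enters
            buffArray.RemoveAt(buffArray.BinarySearch(data[i + W]));
            int pos = buffArray.BinarySearch(data[i]);
            buffArray.Insert(pos < 0 ? ~pos : pos, data[i]);
        }
        return count;
    }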


EDIT:

After thinking about it some more, an easy O(n) time algorithm is possible, without the need for RMQ or a tree (see the previous portion of my answer below).

Given an array A[1...n] and a window width W, you need to find the minimum of A[i...i+W] for a given i.

For this, you do the following.

Split A[1...n] into contiguous blocks of size W: B1, B2, ..., roughly n/W of them.

For each block B, maintain two more arrays called BStart and BEnd.

BStart[i] = minimum of B[1], B[2], ..., B[i] (the prefix minima within the block).

BEnd[i] = minimum of B[i], B[i+1], ..., B[W] (the suffix minima within the block).

This can be done in O(W) time for each block, and so O(n) time total.

Now given an i, the sub-array A[i...i+W] will span two consecutive blocks, say B1 and B2.

Find the minimum from i to the end of block B1 using B1's BEnd, and from the start of block B2 to i+W using B2's BStart; the window minimum is the smaller of the two.

This is O(1) time, so total O(n).

For a circular array C[1...n], all you need to do is run the above on A[1...2n], which is basically two copies of C concatenated together, i.e. A[1...n] = C[1...n] and A[n+1...2n] = C[1...n].
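
A sketch of the block idea, assuming C# and 0-based indexing; prefixMin and suffixMin play the roles of BStart and BEnd, the window A[i...i+W] always crosses from the tail of one block of size W into the head of the next, and the function name is illustrative:

    // requires: using System;
    // Returns m where m[i] = min(A[i], A[i+1], ..., A[i+W]) for i = 0 .. n-W-1.
    double[] SlidingWindowMinima(double[] A, int W)
    {
        int n = A.Length;
        var prefixMin = new double[n];   // BStart: min from the block's start up to i
        var suffixMin = new double[n];   // BEnd:   min from i up to the block's end

        for (int i = 0; i < n; i++)
            prefixMin[i] = (i % W == 0) ? A[i] : Math.Min(prefixMin[i - 1], A[i]);

        for (int i = n - 1; i >= 0; i--)
            suffixMin[i] = (i % W == W - 1 || i == n - 1)
                ? A[i]
                : Math.Min(suffixMin[i + 1], A[i]);

        var mins = new double[n - W];
        for (int i = 0; i + W < n; i++)
            // A[i..i+W] spans the tail of i's block and the head of (i+W)'s block
            mins[i] = Math.Min(suffixMin[i], prefixMin[i + W]);
        return mins;
    }

For the circular case you would call this on the doubled array as described above, then count every point whose window minimum is at most 95% of its value.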


Previous writeup.

Ok. Assuming that I have understood your question correctly this time...

It is possible in O(n) time and O(n) space.

In fact it is possible to change your window size to any number you like, have it different for different elements and still have it work!

Given an array A[1...n], it can be preprocessed in O(n) time and O(n) space to answer queries of the form: What is the position of a minimum element in the sub-array A[i...j]? in constant time!

This is called the Range Minimum Query Problem.

So theoretically, it is possible to do in O(n) time.

Just using a tree will give you O(n log W) time, where W is the size of the window, and will probably work much better than RMQ in practice, as I expect the hidden constants might make RMQ worse.

You can use a tree as follows.

Start backwards and insert W elements. Find the minimum and push it onto a stack. Now delete the first element and insert the (W+1)th element. Find the minimum and push it onto the stack.

Continue this way. The total processing time will be O(n log W).

At the end you have a stack of minima, which you can just keep popping off as you walk the array a second time, this time in the correct order, comparing each popped minimum against 0.95 times the current value.
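
A compact sketch of this two-pass version, assuming C#: the first pass slides a sorted window from the oldest point toward the newest and stacks each window minimum, and the second pass walks the points in order and pops the matching minimum; the helper name is illustrative:

    // requires: using System.Collections.Generic;
    int CountWithMinStack(double[] data, int W)
    {
        var window = new SortedSet<(double value, int index)>();
        var minima = new Stack<double>();
        int first = data.Length - 1 - W;                 // oldest point with a full window

        for (int k = first + 1; k <= first + W; k++)
            window.Add((data[k], k));                    // window for point `first`

        for (int i = first; i >= 0; i--)                 // pass 1: backwards, stack the minima
        {
            minima.Push(window.Min.value);               // min of data[i+1 .. i+W]
            if (i == 0) break;
            window.Remove((data[i + W], i + W));         // slide: oldest leaves,
            window.Add((data[i], i));                    //        data[i] enters
        }

        int count = 0;                                   // pass 2: in the correct order
        for (int i = 0; i <= first; i++)
            if (minima.Pop() <= data[i] * 0.95)
                count++;
        return count;
    }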

Also, your question is not really clear: you say it is a circular buffer, but you don't seem to be doing a modulo operation with the length. And as coded up, your algorithm is O(n), not O(n^2), since your window size is a constant.


You could take the first numberOfDataPointsInPast points in the past and sort them, which is O(n log n). Then do a binary search, O(log n), to find the lowest data point that passes the 5% test. That will tell you how many points out of the numberOfDataPointsInPast pass the test, in O(n log n) time I believe.


The iterations need to start from the bottom and increment (keeping the minimum of the past). Right now, as posted, the algorithm is always looking back instead of moving forward and remembering the past minimum.

As new points are added, the range of data points can only extend at its upper or lower bound. Since the lower bound only decreases, keeping track of the lower bound is all that is necessary. Any new point that is at least lower bound / 0.95 will be counted (since the lower bound is always in the past):

const int numberOfDataPointsInPast = 1000;
int numberOfDataPoints = 0;
double lb = double.NaN;                        // running minimum (lower bound) of past values
for (int i = 0; i < numberOfDataPointsInPast; i++)
{
    if ( double.IsNaN(lb) || data[i] < lb ) {  // NaN never compares equal, so test with IsNaN
        lb = data[i];
    }
    if ( data[i] >= lb / 0.95 ) {              // equivalent to lb <= 0.95 * data[i]
        numberOfDataPoints++;
    }
}


You have two options:

  1. Sort it -- O(n log n)

  2. Median of Medians algorithm


Try this:

Always maintain two pointers to elements within your buffer. One points to the minimum value encountered and the other to the next minimum (that is, the next highest value). Remember these are pointers into the buffer.

At each step in your progression through the buffer, determine whether the current value is less than or equal to the value pointed to by min1 or min2; if so, update min1 or min2 to point to the current location. Otherwise, if by pointer arithmetic min1 or min2 has fallen 1500 places back in the buffer, determine which one it is and re-adjust accordingly: either min1 is set to min2 and min2 to the current location, or min2 is simply set to the current location.

Whether the value pointed to by min1 or min2 is at least 5% below the current value can then be determined by a simple comparison...
