
Algorithm to find a number and its square in an array

I have an array of integers, and I need an O(n) algorithm to find if the array contains a number and its square; one pair is sufficient.

I tried to do it myself, but I have only managed to find a solution in O(n²).

I thought about using counting sort, but the memory usage is too big.


create a new array twice the length of the input array. O(2N)
copy all of the numbers in O(N)
copy the squares of the numbers in O(N)
radix sort (we can since they are all ints) O(N)
iterate over it once to see if any two adjacent numbers are equal O(N)
profit! O(1)
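
A sketch of those steps in C++. Two hedges: std::sort stands in for the radix sort (so this sketch is O(n log n), not O(n)), and each entry is tagged with its origin so duplicate input values don't masquerade as value/square pairs:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Merge values and their squares into one buffer, sort, then scan for an
// adjacent pair with equal value but different origin tags
// (0 = original value, 1 = square of a value).
bool containsValueAndSquare(const std::vector<int64_t>& input) {
    std::vector<std::pair<int64_t, int>> buf;
    buf.reserve(input.size() * 2);
    for (int64_t x : input) {
        buf.push_back({x, 0});      // the number itself
        buf.push_back({x * x, 1});  // its square
    }
    std::sort(buf.begin(), buf.end());
    // Equal values sort together with tag 0 before tag 1, so a genuine
    // value/square pair always shows up as two adjacent entries with the
    // same value and different tags.
    for (std::size_t i = 1; i < buf.size(); ++i)
        if (buf[i].first == buf[i - 1].first &&
            buf[i].second != buf[i - 1].second)
            return true;
    return false;
}
```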


There are basically two ways to do this.

  1. Sort the array and then perform a binary search for the square of each number. Overall complexity would be O(n log n), but it would need sorting, which would destroy the original ordering (which might be important for your case).

  2. Insert all items of the array into a hashtable (or any fast set data structure). Then iterate over the array again, checking whether each element's square exists in the hashtable. Using a hashtable gives an overall complexity of O(n), but you will need O(n) extra space. You could also use a tree-based set (e.g. std::set in C++ or TreeSet in Java), which would give you a complexity of O(n log n).
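
Option 2 might be sketched like this with std::unordered_set (expected O(n) time, O(n) extra space; note that 0 and 1 are their own squares, so a single 0 or 1 will match itself):

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Insert every element into a hash set, then probe for each element's
// square.
bool containsSquarePairHash(const std::vector<int64_t>& input) {
    std::unordered_set<int64_t> values(input.begin(), input.end());
    for (int64_t x : input)
        if (values.count(x * x))
            return true;
    return false;
}
```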


If we're allowed to assume that the input can be sorted in O(N) by radix sort, I'd improve a bit on Chris's solution:

  • radix sort the input.
  • For the first element of the sorted result, search forward linearly until we find either its square (in which case stop with true), the end of the array (in which case stop with false), or a value larger than the square (in which case move on and search for the square of the second and subsequent elements of the sorted array).

Each of the two "pointers" is moving strictly forward, so the overall complexity is O(N), assuming that the radix sort is O(N) and that squaring and comparison are O(1). Presumably whoever set the question intended these assumptions to be made.
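
Under the same assumptions, the two-pointer scan might look like the following. One extra assumption is needed: the integers are non-negative, so that squares of a sorted sequence are also non-decreasing, which is what lets the second pointer move strictly forward:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Forward-only scan over a sorted array of non-negative integers.
// i walks the values, j walks toward each value's square; neither index
// ever moves backward, so the scan is O(N).
bool containsSquarePairSorted(const std::vector<uint64_t>& sorted) {
    std::size_t j = 0;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        uint64_t sq = sorted[i] * sorted[i];
        while (j < sorted.size() && sorted[j] < sq) ++j;  // advance to first >= sq
        if (j == sorted.size()) return false;  // later squares are larger still
        if (sorted[j] == sq && j != i) return true;  // found x and x*x
    }
    return false;
}
```

The `j != i` guard stops 0 and 1 (which are their own squares) from matching themselves; a duplicate 0 or 1 still counts as a pair.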

In response to a comment by the questioner on another answer: if the integers in the input are not bounded, then I don't think it can be done. Just calculating the square of an integer requires greater than linear time (at least: no linear algorithm for multiplication is known), so consider an input of size n bits, consisting of two integers of size n / 3 bits and 2 * n / 3 bits. Testing whether one is the square of the other cannot be done in O(n). I think. I could be wrong.


While I can't add to the suggestions above, you can reduce the average run time by first finding the min and max values in your data set (both O(n)) and confining your search to that range. For instance, if the maximum value is 620, I know that no integer 25 or over can have its square in the list.
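
A small sketch of that pruning step (the function name is illustrative; the input is assumed non-empty):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// After one O(n) pass to find the maximum, only values x with x*x <= max
// can possibly have their square present. With max = 620, anything >= 25
// is filtered out, since 25*25 = 625 > 620.
std::vector<int64_t> squareCandidates(const std::vector<int64_t>& input) {
    int64_t mx = *std::max_element(input.begin(), input.end());
    std::vector<int64_t> candidates;
    for (int64_t x : input)
        if (x * x <= mx)
            candidates.push_back(x);
    return candidates;
}
```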


You may be able to do it with a couple of hashsets helping you out.

While iterating:

  • If the value is in the squares hashset, you've got a pair (the value is the square of a previously seen value).
  • If the value's square is in the values hashset, you've got a pair (the square of this value was already passed).
  • Otherwise, store the value in one hashset and its square in the other.
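
A one-pass sketch of that idea (a lone 0 or 1 does not match itself here, because each element is checked before it is inserted):

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// 'values' remembers the numbers seen so far; 'squares' remembers their
// squares. A hit in either set means a value/square pair exists.
bool containsSquarePairOnePass(const std::vector<int64_t>& input) {
    std::unordered_set<int64_t> values, squares;
    for (int64_t x : input) {
        if (squares.count(x)) return true;     // x is the square of an earlier value
        if (values.count(x * x)) return true;  // x's square appeared earlier
        values.insert(x);
        squares.insert(x * x);
    }
    return false;
}
```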


Personally I think that Anon's answer (the little algorithm with 'squares') is more useful than it appears to be: remove the 'remove all less than e from squares' line from it, and the algorithm can handle an unsorted input array.

If we assume the typical Homework machine with Sufficient Space, the 'squares' datastructure could be modelled as an array of boolean flags, yielding true O(1) lookup time.


Without sorting, works with duplicates:

Iterate the array to find the smallest and largest integers. O(n)
Create an array of bits the size of the difference. O(1) time, O(k) space
(Now each possible integer between the smallest and largest values has a corresponding bit in the array)
Iterate the old array, setting the bit corresponding to each integer found to 1. O(n)
Iterate the old array again, checking whether each integer's square is within range and has its corresponding bit set. O(n)

(Though I didn't sort, this algorithm can be very easily modified to create a sorting algorithm which sorts in O(n+k) time and O(k) space)
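
The steps above can be sketched as follows, with std::vector<bool> standing in for the k-bit array (as described, a lone 0 or 1 counts as its own square):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per possible value in [min, max], k = max - min + 1 bits total.
// Squares that fall outside [min, max] are skipped before indexing.
bool containsSquarePairBitmap(const std::vector<int64_t>& input) {
    if (input.empty()) return false;
    auto mm = std::minmax_element(input.begin(), input.end());
    int64_t mn = *mm.first, mx = *mm.second;
    std::vector<bool> seen(static_cast<std::size_t>(mx - mn + 1), false);
    for (int64_t x : input)
        seen[x - mn] = true;  // mark each value as present
    for (int64_t x : input) {
        int64_t sq = x * x;
        if (sq >= mn && sq <= mx && seen[sq - mn])
            return true;
    }
    return false;
}
```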


If we're using C/C++ 32-bit unsigned ints, the maximum value that can be stored is 4294967295 = 2^32 - 1. The largest number whose square still fits is 65535 = (1<<16) - 1. Now if we create an array of bits and record in it whether we've seen each number and/or its square (2 bits per "slot"), we can get the total storage down to (65535 + 1) / 4 = 16384 bytes.

IMO this is not excessive memory consumption, so we should be able to do this without radix sorting. An O(N) algorithm could look like this:

#include <cmath>
#include <cstdint>
#include <vector>

// Two bits per value: bit1 = "value seen", bit2 = "square of value seen".
uint32_t index( uint32_t i ) { return i/4; }
unsigned char bit1( uint32_t i ) { return 1<<( (i%4)*2 ); }
unsigned char bit2( uint32_t i ) { return 1<<( (i%4)*2 +1 ); }


bool hasValueAndSquare( const std::vector<uint32_t> & v )
{
   const uint32_t max_square=65535;  // floor(sqrt(2^32 - 1))

   unsigned char found[(max_square+1)/4]={0};
   for(unsigned int i=0; i<v.size(); ++i)
   {
      if (v[i]<=max_square)
      {
          found[ index(v[i]) ] |= bit1(v[i]);  // mark the value as seen
          if ((found[ index(v[i]) ] & bit2(v[i])) == bit2(v[i])) return true;  // its square was seen earlier
      }
      uint32_t w = (uint32_t)std::lround(std::sqrt((double)v[i]));
      if( w*w == v[i] )  // v[i] is a perfect square with root w
      {
          found[ index(w) ] |= bit2(w);  // mark "square of w" as seen
          if ((found[ index(w) ] & bit1(w)) == bit1(w)) return true;  // w itself was seen earlier
      }
   }
   return false;
}

This is not tested and not heavily optimized, and a proper integer square root would be better; however, the compiler should inline all the bit-accessing functions, so they'll be fine.

Note that if we're using 64-bit ints the memory consumption becomes much larger: instead of a 16 KB array we need a 1 GB array, which is possibly less practical.


Optimization notes

Both the hashset and radix sort algorithms can be optimized by noting three facts:

  1. Odd and even values can be handled separately
  2. Calculating an integer square root is a very fast operation (typically consists of 3-5 divides and a few adds)
  3. Cache locality is important for both of these algorithms

The optimized algorithms below will typically perform 5x faster and use less than half the RAM of the unoptimized case. In some cases where the data size is similar to the L2/L3 cache size they may perform 100x faster or more.

Optimized algorithm based on radix sort

Data structure is five lists of integers: IN, Aodd, Bodd, Aeven, Beven. The A and B lists use half the integer width of IN (e.g. if IN is 64-bit, A and B are 32-bit).

  1. Scan list IN to find the largest odd and even numbers MAXodd and MAXeven
  2. Let LIMITodd = floor(sqrt(MAXodd))
  3. Let LIMITeven = floor(sqrt(MAXeven))
  4. For each number in list IN:
     a. Compute the square root if positive. If it is exact, add the square root to list Aodd/Aeven.
     b. If the number is >= 0 and <= LIMITodd/LIMITeven, add it to list Bodd/Beven
  5. Radix sort list Aodd and Bodd using just log2(LIMITodd) bits
  6. Linear scan Aodd and Bodd for a match
  7. Radix sort list Aeven and Beven using just log2(LIMITeven) bits
  8. Linear scan Aeven and Beven for a match

If either linear scan finds a match, return that match immediately.

The reason this is much faster than the straightforward radix sort algorithm is that:

  • The arrays being sorted typically have less than 1/4 the number of values and need only half the number of bits per integer, so a given sort uses less than 1/8 the RAM in total, which is good for the cache.
  • The radix sort is done on far fewer bits, leading to fewer passes, so even if it does exceed your L1 or L2 cache you read RAM fewer times, and you read much less RAM
  • The linear scan is typically much faster because the A list contains only exact square roots and the B list only contains small values
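
A condensed sketch of steps 1-8, with two hedges: std::sort stands in for the radix sort, and A/B are kept at full width rather than half width for brevity. A value common to A (exact roots found in IN) and B (values small enough that their square could be in IN) means IN contains both x and x*x:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Floor integer square root, clamped so the fix-up loops cannot overflow.
static uint64_t isqrt64(uint64_t n) {
    uint64_t r = (uint64_t)std::sqrt((double)n);
    if (r > 4294967295u) r = 4294967295u;
    while (r > 0 && r * r > n) --r;
    while (r < 4294967295u && (r + 1) * (r + 1) <= n) ++r;
    return r;
}

bool containsSquarePairSplit(const std::vector<uint64_t>& in) {
    for (uint64_t parity = 0; parity < 2; ++parity) {  // evens, then odds
        uint64_t mx = 0;
        for (uint64_t x : in)
            if ((x & 1) == parity) mx = std::max(mx, x);
        uint64_t limit = isqrt64(mx);   // only x <= limit can have x^2 in the list
        std::vector<uint64_t> A, B;     // A: exact roots, B: small values
        for (uint64_t x : in) {
            if ((x & 1) != parity) continue;
            uint64_t r = isqrt64(x);
            if (r * r == x) A.push_back(r);  // x is a perfect square
            if (x <= limit) B.push_back(x);
        }
        std::sort(A.begin(), A.end());       // stands in for the radix sort
        std::sort(B.begin(), B.end());
        // linear merge scan for a common element
        for (std::size_t i = 0, j = 0; i < A.size() && j < B.size(); ) {
            if (A[i] == B[j]) return true;   // x is in IN and so is x*x
            if (A[i] < B[j]) ++i; else ++j;
        }
    }
    return false;
}
```

The parity split works because x and x*x always have the same parity, so a pair can never straddle the odd and even groups.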

Optimized algorithm based on hashset

Data structure is a list of integers IN, plus two hashsets A and B. The A and B sets use half the integer width of IN.

  1. Scan list IN to find the largest odd and even numbers MAXodd and MAXeven
  2. Let LIMITodd = floor(sqrt(MAXodd))
  3. Let LIMITeven = floor(sqrt(MAXeven))
  4. For each odd number in list IN:
     a. Compute the square root if positive. If it is exact, check whether the square root exists in B and return if true; otherwise add it to A.
     b. If the number is >= 0 and <= LIMITodd, check whether it exists in A and return if true; otherwise add it to B.
  5. Clear A and B and repeat step 4 for even numbers

The reason this is faster than the straightforward hashset algorithm is that:

  • The hashset is typically 1/8 the amount of RAM leading to much better cache performance
  • Only exact squares and small numbers have hashset entries, so much less time is spent hashing and adding/removing values

There is an additional small optimization available here: A and B can be a single hashset, which stores a bit flag with each entry to say whether the integer is in A or B (it can't be in both, because then the algorithm would have terminated).
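
A sketch of that single-hashset variant, with the odd/even split omitted for brevity and std::unordered_map standing in for the flagged hashset:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Floor integer square root, clamped so the fix-up loops cannot overflow.
static uint64_t floorSqrt(uint64_t n) {
    uint64_t r = (uint64_t)std::sqrt((double)n);
    if (r > 4294967295u) r = 4294967295u;
    while (r > 0 && r * r > n) --r;
    while (r < 4294967295u && (r + 1) * (r + 1) <= n) ++r;
    return r;
}

bool containsSquarePairFlagged(const std::vector<uint64_t>& in) {
    uint64_t mx = 0;
    for (uint64_t x : in) mx = std::max(mx, x);
    const uint64_t limit = floorSqrt(mx);  // only x <= limit can have x^2 present
    enum { ROOT = 1, VALUE = 2 };          // which "set" an entry belongs to
    std::unordered_map<uint64_t, unsigned> flags;
    for (uint64_t x : in) {
        uint64_t r = floorSqrt(x);
        if (r * r == x) {                       // x is a perfect square with root r
            if (flags[r] & VALUE) return true;  // r itself was already seen
            flags[r] |= ROOT;
        }
        if (x <= limit) {
            if (flags[x] & ROOT) return true;   // x^2 was already seen
            flags[x] |= VALUE;
        }
    }
    return false;
}
```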


If I understand the problem correctly, you have to check whether a specified number and its square are both in the array, not find every number whose square is also in the array. Simply maintain two booleans (one recording whether the number has been found, the other the square), iterate over the elements of the array testing each one, and return the AND of the two booleans.

In pseudo code :

bool ArrayContainsNumberAndSquare(int number, int[] array):
  boolean numberFound = false, squareFound = false;
  int square = number * number;
  foreach int i in array
  (
    numberFound = numberFound || i == number;
    squareFound = squareFound || i == square;
  )
  return numberFound && squareFound;


1) With the hashmap you get O(n).

2) If you use std::set on 2 sets, the evens and the odds, you can get

2 * O((n/2) log(n/2)) = O(n log(n/2)) = O(n log n)

assuming there are roughly as many evens as odds.


If the array is not sorted, you won't be able to do O(n).

If it is sorted, you can make use of that property to do it in one pass, like so:

foreach e in array
    if squares contains e
        return true
    remove all less than e from squares
    add e * e to squares
return false

Where squares is, say, a HashSet.

If it's not sorted, you can sort it in O(n log n) and then use this method to check for squares, which will still be faster than the naive solution on a large enough data set.
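
A minimal C++ rendering of that loop, assuming a sorted input. An ordered std::set is used here instead of a HashSet so that "remove all less than e" is a single range erase rather than a full scan:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// One pass over a sorted array. 'squares' holds the squares of elements
// seen so far; entries below the current element can never match a later
// (larger) element, so they are pruned as we go.
bool containsSquarePairSortedScan(const std::vector<int64_t>& sorted) {
    std::set<int64_t> squares;
    for (int64_t e : sorted) {
        if (squares.count(e)) return true;  // e is the square of an earlier element
        squares.erase(squares.begin(), squares.lower_bound(e));  // drop all < e
        squares.insert(e * e);
    }
    return false;
}
```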
