Eliminate duplicates from a binary matrix. Can it be done in time better than O(n^2)?
Input: [0 1 0 0 1, 1 0 1 1 0, 0 1 0 0 1, 1 1 1 0 0]
Expected output: [0 1 0 0 1, 1 0 1 1 0, 1 1 1 0 0]
The solution I could come up with was:
- For each row, convert it into a decimal value (or use some checksum method); this takes O(n).
- This essentially converts the matrix into a one-dimensional array.
- Now scan through all the elements using a hash table.
- Keep track of duplicates and report only the unique elements from this array (see the sketch below).
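A minimal sketch of this approach in Python (the function name and layout are my own): each row is packed into an integer key, and a hash set keeps only the first occurrence of each key, preserving the original row order.

    def dedup_rows(matrix):
        """Keep only the first occurrence of each row, in original order."""
        seen = set()
        unique = []
        for row in matrix:
            key = 0
            for bit in row:          # pack the row into an integer (the "decimal" conversion)
                key = (key << 1) | bit
            if key not in seen:      # expected O(1) hash-set lookup
                seen.add(key)
                unique.append(row)
        return unique

    matrix = [[0, 1, 0, 0, 1],
              [1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1],
              [1, 1, 1, 0, 0]]
    print(dedup_rows(matrix))
    # [[0, 1, 0, 0, 1], [1, 0, 1, 1, 0], [1, 1, 1, 0, 0]]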
Other solutions may include using a trie (or a similar structure), but that would still take O(n^2).
Is there a better solution?
You can do it in linear time by computing a hash of each row, BucketSorting the hashes (the fastest integer sort ever devised), and then removing duplicates from the sorted list (compare each hash with the previous one; if they match, delete the current row).
EDIT: Because everyone got downvoted, apparently by someone who doesn't understand that iterating over N items is linear regardless of how they're arranged, I will elaborate.
Big-O calculation does not take into account how a collection is arranged in memory, UNLESS the storage mechanism doesn't allow for effectively constant retrieval time. Arrays, no matter how many dimensions, are considered effectively constant to retrieve from. So, going through the entire 4x5 example matrix is a linear operation, because it performs essentially the same as if you'd been given a one-dimensional array of 20 objects.
With that out of the way:
Hashing all the elements, taken one row (five elements) at a time, is linear, because we need to read every element exactly once to add it to that row's hash (which can be as simple as multiplying each element by 10^x or 2^x and adding it to a running total).
The BucketSort algorithm executes in X*M time for a one-dimensional array of X elements whose keys have at most M digits. As X in this case is the square root of our total N for the overall operation, and the worst-case number of digits M would also be the square root of N, our BucketSort will execute in O(X*M) ~= O(N) worst-case.
Iterating through the sorted hashes is linear, on the order of the square root of our total N.
So, the total complexity of this algorithm, executed on a matrix of N values, is roughly O(2N + sqrt(N)), which is considered O(N).
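To make those steps concrete, here is a sketch under the assumptions above (binary rows, and reading "BucketSort with maximum order of magnitude M" as a bit-at-a-time radix/bucket sort; the function names are my own). Note that the rows come back in sorted-key order rather than their original order:

    def pack(row):
        key = 0
        for bit in row:                  # read each element exactly once: O(N) over the matrix
            key = (key << 1) | bit
        return key

    def bucket_sort_pairs(matrix):
        width = len(matrix[0])           # M: bits per hash
        pairs = [(pack(row), row) for row in matrix]
        for b in range(width):           # M passes over X rows: O(X*M) total
            zeros = [p for p in pairs if not (p[0] >> b) & 1]
            ones = [p for p in pairs if (p[0] >> b) & 1]
            pairs = zeros + ones         # stable partition keeps earlier bits in order
        return pairs

    def dedup_sorted(matrix):
        unique = []
        for key, row in bucket_sort_pairs(matrix):   # duplicates are now adjacent
            if not unique or key != unique[-1][0]:
                unique.append((key, row))
        return [row for _, row in unique]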
Why don't you store the binary values inside an integer (like you would a bitfield), then sort the integers using either quicksort or merge sort? Then iterate through the sorted list checking for duplicates; duplicate values will always be directly next to each other, since the list is sorted. This takes O(n log n + n), where n is the number of rows in your matrix. However, each operation would be incredibly fast, because it is made up of comparisons, swaps, and equality checks on integers, which are very fast on modern hardware.
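A short sketch of that idea (using Python's built-in O(n log n) comparison sort in place of quicksort or merge sort; the names are my own):

    def dedup_by_sorting(matrix):
        def pack(row):                    # store the row's bits in one integer, like a bitfield
            value = 0
            for bit in row:
                value = (value << 1) | bit
            return value
        if not matrix:
            return []
        rows = sorted(matrix, key=pack)   # O(n log n) integer comparisons and swaps
        unique = [rows[0]]
        for row in rows[1:]:              # O(n) scan; duplicates sit directly next to each other
            if row != unique[-1]:
                unique.append(row)
        return unique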