Eliminate duplicates from a binary matrix. Can it be done in time better than O(n^2)?
Input: [0 1 0 0 1, 1 0 1 1 0, 0 1 0 0 1, 1 1 1 0 0]
Expected output: [0 1 0 0 1, 1 0 1 1 0, 1 1 1 0 0]
The solution I could come up with was:
- For each row, convert it into a decimal value (or use some checksum method); this takes O(n).
- This essentially converts the matrix into a one-dimensional array.
- Now scan through all the elements using a hash table.
- Keep track of duplicates and report only the unique elements from this array (see the sketch below).
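A minimal sketch of this approach in Python (the function name and layout are my own): each row is packed into an integer key, and a hash set keeps only the first occurrence of each key, preserving the original row order.

    def dedup_rows(matrix):
        """Keep only the first occurrence of each row, in original order."""
        seen = set()
        unique = []
        for row in matrix:
            key = 0
            for bit in row:          # pack the row into an integer (the "decimal" conversion)
                key = (key << 1) | bit
            if key not in seen:      # expected O(1) hash-set lookup
                seen.add(key)
                unique.append(row)
        return unique

    matrix = [[0, 1, 0, 0, 1],
              [1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1],
              [1, 1, 1, 0, 0]]
    print(dedup_rows(matrix))
    # [[0, 1, 0, 0, 1], [1, 0, 1, 1, 0], [1, 1, 1, 0, 0]]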
Other solutions may include using a trie (or a similar structure), but that would still take O(n^2).
Is there a better solution?
You can do it in linear time by computing a hash of each row, BucketSorting the hashes (the fastest integer sort ever devised), and then removing duplicates from the sorted list (compare each hash with the previous one; if they match, delete the current row).
EDIT: Because everyone got downvoted, apparently by someone who doesn't understand that iterating over N items is linear regardless of how they're arranged, I will elaborate.
Big-O calculation does not take into account how a collection is arranged in memory, UNLESS the storage mechanism doesn't allow for effectively constant retrieval time. Arrays, no matter how many dimensions, are considered effectively constant to retrieve from. So, going through the entire 4x5 example matrix is a linear operation, because it performs essentially the same as if you'd been given a one-dimensional array of 20 objects.
With that out of the way:
Hashing all the elements, taken one row (five elements) at a time, is linear, because we need to read every element exactly once to add it to that row's hash (which can be as simple as multiplying each element by 10^x or 2^x and adding it to a running total).
The BucketSort algorithm executes in X*M time for a one-dimensional array of X elements whose keys have at most M digits. As X in this case is the square root of our total N for the overall operation, and the worst-case number of digits M would also be the square root of N, our BucketSort will execute in O(X*M) ~= O(N) worst-case.
Iterating through the sorted hashes is linear, on the order of the square root of our total N.
So, the total complexity of this algorithm, executed on a matrix of N values, is roughly O(2N + sqrt(N)), which is considered O(N).
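To make those steps concrete, here is a sketch under the assumptions above (binary rows, and reading "BucketSort with maximum order of magnitude M" as a bit-at-a-time radix/bucket sort; the function names are my own). Note that the rows come back in sorted-key order rather than their original order:

    def pack(row):
        key = 0
        for bit in row:                  # read each element exactly once: O(N) over the matrix
            key = (key << 1) | bit
        return key

    def bucket_sort_pairs(matrix):
        width = len(matrix[0])           # M: bits per hash
        pairs = [(pack(row), row) for row in matrix]
        for b in range(width):           # M passes over X rows: O(X*M) total
            zeros = [p for p in pairs if not (p[0] >> b) & 1]
            ones = [p for p in pairs if (p[0] >> b) & 1]
            pairs = zeros + ones         # stable partition keeps earlier bits in order
        return pairs

    def dedup_sorted(matrix):
        unique = []
        for key, row in bucket_sort_pairs(matrix):   # duplicates are now adjacent
            if not unique or key != unique[-1][0]:
                unique.append((key, row))
        return [row for _, row in unique]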
Why don't you store the binary values inside an integer (like you would a bitfield), then sort the integers using either quicksort or merge sort? Then iterate through the sorted list checking for duplicates; duplicate values will always be directly next to each other, since the list is sorted. This takes O(n log n + n), where n is the number of rows in your matrix. However, each operation would be incredibly fast, because it is made up of comparisons, swaps, and equality checks on integers, which are very fast on modern hardware.
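A short sketch of that idea (using Python's built-in O(n log n) comparison sort in place of quicksort or merge sort; the names are my own):

    def dedup_by_sorting(matrix):
        def pack(row):                    # store the row's bits in one integer, like a bitfield
            value = 0
            for bit in row:
                value = (value << 1) | bit
            return value
        if not matrix:
            return []
        rows = sorted(matrix, key=pack)   # O(n log n) integer comparisons and swaps
        unique = [rows[0]]
        for row in rows[1:]:              # O(n) scan; duplicates sit directly next to each other
            if row != unique[-1]:
                unique.append(row)
        return unique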