Is partitioning easier than sorting?

2023-01-07 02:32

This is a question that's been lingering in my mind for some time ...

Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.

One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.

But can it be done more efficiently than with sorting? Is the time complexity of this problem lower than that of sorting? If not, why not?

You seem to be asking two different questions in one go here.

1) If only equality checks are allowed, is partitioning easier than when an ordering is available? The answer is no. You need Omega(n^2) comparisons to determine the partition in the worst case (for instance, when all items are different).
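To see where the quadratic bound comes from, here is a sketch of the standard adversary argument, assuming the algorithm may only ask "are a and b equivalent?". Suppose all n items are distinct and some pair (a, b) is never compared. Then every "not equivalent" answer the algorithm received is also consistent with an input in which a and b are equivalent to each other (and to nothing else), so the algorithm cannot tell the two inputs apart and must be wrong on one of them. Every pair therefore has to be tested, and the number of pairs is:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \Omega(n^2)
```

For n = 100 distinct items this is 4950 comparisons.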

2) If an ordering is allowed, is partitioning easier than sorting? The answer again is no. This is because of the Element Distinctness Problem, which says that in order to even determine whether all objects are distinct, you require Omega(n log n) comparisons. Since sorting can be done in O(n log n) time (and also has an Omega(n log n) lower bound) and solves the partition problem, asymptotically they are equally hard.

If you pick an arbitrary hash function, equal objects need not have the same hash, in which case you haven't done any useful work by putting them in a hashtable.

Even if you do come up with such a hash (equal objects guaranteed to have the same hash), the time complexity is expected O(n) for good hashes, and worst case is Omega(n^2).

Whether to use hashing or sorting completely depends on other constraints not available in the question.

The other answers also seem to be forgetting that your question is (mainly) about comparing partitioning and sorting!

If you can define a hash function for the items as well as an equivalence relation, then you should be able to do the partition in linear time -- assuming computing the hash is constant time. The hash function must map equivalent items to the same hash value.
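A minimal sketch of this hash-based approach in Python, assuming a hypothetical `key` function that returns the same hashable value for equivalent items and different values otherwise (the names `partition_by_hash` and `key` are illustrative, not from the question):

```python
from collections import defaultdict

def partition_by_hash(items, key):
    """Group items whose key(item) values are equal.

    Assumes key() runs in constant time and returns a hashable value that is
    identical for equivalent items and different for non-equivalent ones.
    """
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)  # expected O(1) per insertion
    return list(buckets.values())

# Example: integers are equivalent when they have the same remainder mod 3.
groups = partition_by_hash([0, 1, 2, 3, 4, 5, 6], key=lambda x: x % 3)
print(groups)  # [[0, 3, 6], [1, 4], [2, 5]]
```

The linear bound here is an expected-time bound; it relies on the dictionary's hashing behaving well.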

Without a hash function, you would have to compare every new item to be inserted into the partitioned lists against the head of each existing list. The efficiency of that strategy depends on how many partitions there will eventually be.

Let's say you have 100 items, and they will eventually be partitioned into 3 lists. Then each item would have to be compared against at most 3 other items before inserting it into one of the lists.

However, if those 100 items would eventually be partitioned into 90 lists (i.e., very few equivalent items), it's a different story. Now your runtime is closer to quadratic than linear.
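The strategy described above can be sketched as follows. This is only an illustration: the function name, the comparison counter, and the two example equivalences are assumptions made for the demonstration, and the second run uses 100 all-distinct items as the extreme case.

```python
def partition_by_representatives(items, equivalent):
    """Partition using only an equivalence test (no hash, no ordering).

    Returns the groups and the number of equivalence tests performed.
    """
    groups = []
    comparisons = 0
    for item in items:
        for group in groups:
            comparisons += 1
            if equivalent(item, group[0]):  # compare against the head only
                group.append(item)
                break
        else:
            groups.append([item])  # no existing group matched
    return groups, comparisons

# 100 items falling into 3 groups: at most 3 tests per item.
few, c_few = partition_by_representatives(range(100), lambda a, b: a % 3 == b % 3)
# 100 items that are all different: every pair ends up being tested.
many, c_many = partition_by_representatives(range(100), lambda a, b: a == b)

print(len(few), c_few)    # 3 196
print(len(many), c_many)  # 100 4950
```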

If you don't care about the final ordering of the equivalence sets, then partitioning into equivalence sets could be quicker. However, it depends on the algorithm and the number of elements in each set.

If there are very few items in each set, then you might as well just sort the elements and then find the adjacent equal elements. A good sorting algorithm is O(n log n) for n elements.
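A short sketch of the sort-then-scan approach, assuming the equivalence can be expressed through a hypothetical `key` function whose values can be ordered (here, case-insensitive string comparison is used purely as an example):

```python
from itertools import groupby

def partition_by_sorting(items, key):
    """Sort by a key consistent with the equivalence, then split equal runs."""
    ordered = sorted(items, key=key)  # O(n log n) comparisons
    # groupby only merges adjacent equal keys, which is why sorting comes first.
    return [list(run) for _, run in groupby(ordered, key=key)]  # O(n) scan

words = ["b", "A", "a", "B", "c"]
print(partition_by_sorting(words, key=str.lower))
# [['A', 'a'], ['b', 'B'], ['c']]
```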

If there are a few sets with lots of elements in each, then you can take each element and compare it to the existing sets. If it belongs in one of them then add it, otherwise create a new set. This will be O(n*m), where n is the number of elements and m is the number of equivalence sets. That is less than O(n log n) for large n and small m (roughly, when m grows more slowly than log n), but worse as m tends to n.

A combined sorting/partitioning algorithm may be quicker.

If a comparator must be used, then the lower bound is Ω(n log n) comparisons for sorting or partitioning. The reason is that all n elements must be inspected, and a comparator must perform about log n comparisons for each element to uniquely identify or place that element in relation to the others (each comparison divides the space in 2, and so for a space of size n, log n comparisons are needed).
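For sorting, the counting argument can be made precise with the standard decision-tree sketch (this covers sorting only; the symbol h for the tree height is introduced here for illustration). A comparison sort must distinguish n! possible orderings, and a binary decision tree of height h has at most 2^h leaves, so:

```latex
2^{h} \ge n!
\quad\Longrightarrow\quad
h \ge \log_2 n! \ge \log_2 \left(\frac{n}{2}\right)^{n/2}
  = \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)
```

The middle step uses the fact that the largest n/2 factors of n! are each at least n/2.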

If each element can be associated with a unique key which is derived in constant time, then the lower bound is Ω(n) for sorting and partitioning (cf. RadixSort).

Comparison-based sorting has a lower bound of Ω(n log n).

Assume you iterate over your set of items and put them in buckets with items with the same comparative value, for example in a set of lists (say using a hash set). This operation is clearly O(n), even after retrieving the list of lists from the set.

--- EDIT: ---

This of course requires two assumptions:

1) There exists a constant-time hash algorithm for each element to be partitioned.
2) The number of buckets does not depend on the amount of input.

Thus, under these assumptions, partitioning can be done in O(n) time.

Partitioning is faster than sorting, in general, because you don't have to compare each element to each potentially-equivalent already-sorted element; you only have to compare it to the already-established keys of your partitioning. Take a close look at radix sort. The first step of radix sort is to partition the input based on some part of the key. Radix sort is O(kn). If your data set has keys bounded by a given length k, you can radix sort it in O(n). If your data are comparable and don't have a bounded key, but you choose a bounded key with which to partition the set, the complexity of sorting the set would be O(n log n) and the partitioning would be O(n).
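As a rough illustration of partitioning on a bounded key, here is a sketch that assumes each item has an integer key in range(k). The function name, the `key` parameter, and the word-length example are assumptions for the demonstration. It is essentially a single bucketing pass of the kind radix sort performs:

```python
def partition_by_bounded_key(items, key, k):
    """Partition items whose integer key lies in range(k) in O(n + k) time."""
    buckets = [[] for _ in range(k)]  # one slot per possible key value
    for item in items:
        buckets[key(item)].append(item)  # direct indexing, no comparisons
    return [b for b in buckets if b]  # drop the empty buckets

# Example: partition words by length, assuming no word is longer than 9.
words = ["pear", "fig", "plum", "kiwi", "apple", "yam"]
print(partition_by_bounded_key(words, key=len, k=10))
# [['fig', 'yam'], ['pear', 'plum', 'kiwi'], ['apple']]
```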

This is a classic problem in data structures, and yes, it is easier than sorting. If you want to also quickly be able to look up which set each element belongs to, what you want is the disjoint set data structure, together with the union-find operation. See here: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
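A compact sketch of a disjoint-set structure, assuming the items are numbered 0..n-1 and the known equivalences arrive as pairs of indices (the class name and the example pairs are illustrative). Note that the structure merges and looks up sets quickly; it does not by itself reduce the number of comparisons needed to discover which items are equivalent.

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))  # each item starts in its own set
        self.size = [1] * n

    def find(self, x):
        """Return the representative of x's set (with path halving)."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b (union by size)."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

ds = DisjointSet(6)
for a, b in [(0, 1), (1, 2), (3, 4)]:  # pairs known to be equivalent
    ds.union(a, b)

groups = {}
for i in range(6):
    groups.setdefault(ds.find(i), []).append(i)
print(sorted(groups.values()))  # [[0, 1, 2], [3, 4], [5]]
```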

The time required to perform a possibly-imperfect partition using a hash function will be O(n+bucketcount) [not O(n*bucketcount)]. Making the bucket count large enough to avoid all collisions will be expensive, but if the hash function works at all well there should be a small number of distinct values in each bucket. If one can easily generate multiple statistically-independent hash functions, one could take each bucket whose keys don't all match the first one and use another hash function to partition the contents of that bucket.

Assuming a constant number of buckets on each step, the time is going to be O(n log n), but if one sets the number of buckets to something like sqrt(n), the average number of passes should be O(1) and the work in each pass O(n).
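One way this multi-pass idea might look in code is sketched below. It assumes a hypothetical `seeded_hash(item, seed)` that gives equivalent items the same value for any fixed seed. The `max_depth` cut-off with a pairwise fallback is an addition made here so the sketch always terminates; it is not part of the description above.

```python
from math import isqrt

def partition_multilevel(items, equivalent, seeded_hash, seed=0, max_depth=32):
    """Split into about sqrt(n) buckets, then re-split mixed buckets
    with a different hash function (selected by seed) at each level."""
    items = list(items)
    if not items:
        return []
    if all(equivalent(items[0], x) for x in items[1:]):
        return [items]  # bucket already holds a single equivalence class
    if seed >= max_depth:  # safety net: fall back to pairwise comparison
        groups = []
        for x in items:
            for g in groups:
                if equivalent(x, g[0]):
                    g.append(x)
                    break
            else:
                groups.append([x])
        return groups
    nbuckets = max(2, isqrt(len(items)))
    buckets = [[] for _ in range(nbuckets)]
    for x in items:
        buckets[seeded_hash(x, seed) % nbuckets].append(x)
    result = []
    for b in buckets:
        result.extend(
            partition_multilevel(b, equivalent, seeded_hash, seed + 1, max_depth)
        )
    return result

# Example: case-insensitive equivalence on strings.
def seeded_hash(s, seed):
    return hash((seed, s.lower()))

def same_letter(x, y):
    return x.lower() == y.lower()

groups = partition_multilevel(["a", "B", "A", "b", "c", "C", "a"], same_letter, seeded_hash)
print(sorted(sorted(g) for g in groups))
# [['A', 'a', 'a'], ['B', 'b'], ['C', 'c']]
```

The order of the returned groups depends on the hash values, so the example sorts them before printing.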
