开发者

Java - Collections.sort() performance

I'm using Collections.sort() to sort a LinkedList whose elements implements Comparable interface, so they are sorted in a natural order. In the javadoc documentation its said this method uses mergesort algorithm which has n*l开发者_StackOverflow社区og(n) performance.

My question is if there is a more efficient algorithm to sort my LinkedList?

The size of that list could be very high and sort will be also very frequent.


O(N log N) is very good asymptotically. That said, there are linear time O(N) non-comparison based sort, e.g. counting sort and bucket sort. This is useful when, e.g. you're sorting millions and millions of integers, but they're between 1..10.

Also, if the list is "almost sorted", the otherwise quadratic insertion sort is reported to actually be better under some scenarios.

Whether or not this is applicable, or even worth to implement, depends on your profiling results. I'd say that unless it shows the sort to be a bottleneck, don't worry about it.

See also

  • Wikipedia/Counting sort
  • Wikipedia/Bucket sort

Related questions

  • Is there an O(n) integer sorting algorithm?


If you say the list will be sorted "very frequent", you should consider holding the list in a sorted stated all the time, like using a tree instead of a LinkedList. Maybe you can even use some SortedSet instead of a List, if you don't have any duplicated values and don't need any List operations (as you are sorting them anyway all the time). Check the TreeSet class of the SortedSet implementation.

This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

If you want to iterate over this "list" (which is actually a Set) you can use the Iterator of the class.

Returns an iterator over the elements in this set in ascending order.

If you have duplicate values inside the List you have to use some tricks (like putting the value in a new class which also got some delta for sorting equal object)


There is no general sort algorithm better than n*log(n). And this is quite fast. By general I mean your data doesn't have special properties.


I am experimenting with large data sets (GBs of data) and have implemented a merge sort (there is a good example @ googlecode). However, I am using Collection.sort() to pre-sort my temporary buffers and in my experience Collection.sort() gets ridiculously slow at a certain threshold of data. With a auxiliary buffer of 96MB I can sort one of those buffers in about 30sec (note: this heavily depends on the comparators you use - I use a custom column layout with a quite complex column parser), however increasing this to a 128MB chunk size the time jumps to over 3 minutes. This is in no relation to the linear (or near linear) behavior I can observe for smaller chunks. This has so much impact, that a merge sort with smaller buffers in almost (?) all cases faster than a in memory sort using a 128MB buffer. To make this short: Merge sort is the way to go for large data sets beyond the 100MB boundary. I cannot really answer why that is, and those numbers might even be machine dependent (mine is a OS-X on a 2.6GHz i7 & 16GB memory).


In terms of sorting the list, no, all comparison based sorts on general data are O(N log(N)).

If your resorting is due to insertions, then you can try to batch your insertions and then merge sort with the main list - if you have B new items, you sort them in O(B log(B)) then do a single level merge of the two lists which is O(N+B).

If your resorting is due to changes in the values of the items, you might be able to do a similar batching if you change the mutable values into immutable ones and treat the changes to be a batch of insertions and deletions. Otherwise, you won't be able to avoid sorting the whole list.

If your requirements allow it, then there are various non-linked-list structures such as TreeSet available which maintain a sorted order more efficiently, but will fail if the values are mutable.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜