
Is it faster to add to a collection then sort it, or add to a sorted collection?

If I have a Map like this:

HashMap<Integer, ComparableObject> map;

and I want to obtain a collection of values sorted using natural ordering, which method is fastest?

(A)

Create an instance of a sortable collection like ArrayList, add the values, then sort it:

List<ComparableObject> sortedCollection = new ArrayList<ComparableObject>(map.values());
Collections.sort(sortedCollection);

(B)

Create an instance of an ordered collection like TreeSet, then add the values:

Set<ComparableObject> sortedCollection = new TreeSet<ComparableObject>(map.values());

Note that the resulting collection is never modified, so the sorting only needs to take place once.


TreeSet guarantees O(log n) time for its add(), remove(), and contains() methods. Sorting an ArrayList takes O(n log n) operations, but add() and get() are O(1) (amortized, in the case of add()).

So if you're mainly retrieving and don't sort often, ArrayList is the better choice. If you sort often but don't retrieve that much, TreeSet would be a better choice.


Theoretically, sorting at the end should be faster. Maintaining sorted state through the process could involve additional CPU time.

From a CS point of view, both approaches are O(n log n), but a single sort at the end should have the lower constant factor.
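One way to check the constant-factor claim on your own data is to count comparisons rather than wall-clock time. The following is only a rough sketch: the Counted class, the sample() helper, and the seed are made up for illustration, and the counts will vary with the input order and the JDK version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

public class CompareCountSketch {
    // Hypothetical value type that counts how often compareTo is invoked.
    static final class Counted implements Comparable<Counted> {
        static final AtomicLong CALLS = new AtomicLong();
        final int key;

        Counted(int key) { this.key = key; }

        @Override
        public int compareTo(Counted other) {
            CALLS.incrementAndGet();
            return Integer.compare(key, other.key);
        }
    }

    // Builds n distinct values in a shuffled (seeded, repeatable) order.
    static List<Counted> sample(int n, long seed) {
        List<Counted> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(new Counted(i));
        }
        Collections.shuffle(values, new Random(seed));
        return values;
    }

    // Approach (A): copy into an ArrayList, then sort once.
    static long countListSort(List<Counted> input) {
        Counted.CALLS.set(0);
        List<Counted> sorted = new ArrayList<>(input);
        Collections.sort(sorted);
        return Counted.CALLS.get();
    }

    // Approach (B): insert every element into a TreeSet.
    static long countTreeSet(List<Counted> input) {
        Counted.CALLS.set(0);
        TreeSet<Counted> sorted = new TreeSet<>(input);
        return Counted.CALLS.get();
    }

    public static void main(String[] args) {
        List<Counted> input = sample(10_000, 42L);
        System.out.println("ArrayList + sort: " + countListSort(input));
        System.out.println("TreeSet:          " + countTreeSet(input));
    }
}
```

Comparison counts are only part of the cost (a TreeSet also allocates one node per element), so treat the numbers as a hint, not a verdict.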


Why not use the best of both worlds? If you are never using it again, sort using a TreeSet and initialize an ArrayList with the contents:

List<ComparableObject> sortedCollection = 
    new ArrayList<ComparableObject>( 
          new TreeSet<ComparableObject>(map.values()));

EDIT:

I have created a benchmark (you can access it at pastebin.com/5pyPMJav) to test the three approaches (ArrayList + Collections.sort, TreeSet, and my best-of-both-worlds approach), and mine always wins. The test file creates a map with 10000 elements, the values of which have an intentionally awful comparator, and then each of the three strategies gets a chance to a) sort the data and b) iterate over it. Here is some sample output (you can test it yourselves):

EDIT: I have added an aspect that logs calls to Thingy.compareTo(Thingy) and I have also added a new Strategy based on PriorityQueues that is much faster than either of the previous solutions (at least in sorting).

compareTo() calls:123490
Transformer ArrayListTransformer
    Creation: 255885873 ns (0.255885873 seconds) 
    Iteration: 2582591 ns (0.002582591 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer TreeSetTransformer
    Creation: 199893004 ns (0.199893004 seconds) 
    Iteration: 4848242 ns (0.004848242 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer BestOfBothWorldsTransformer
    Creation: 216952504 ns (0.216952504 seconds) 
    Iteration: 1604604 ns (0.001604604 seconds) 
    Item count: 10000

compareTo() calls:18819
Transformer PriorityQueueTransformer
    Creation: 35119198 ns (0.035119198 seconds) 
    Iteration: 2803639 ns (0.002803639 seconds) 
    Item count: 10000

Strangely, my approach performs best in iteration. I would have thought there would be no difference from the ArrayList approach in iteration; do I have a bug in my benchmark?

Disclaimer: I know this is probably an awful benchmark, but it helps get the point across to you and I certainly did not manipulate it to make my approach win.

(The code has a dependency on Apache Commons Lang for the equals / hashCode / compareTo builders, but it should be easy to refactor it out.)
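A note on the PriorityQueue strategy: a java.util.PriorityQueue only keeps its head in order, and its iterator is not guaranteed to traverse the elements in any particular order. The benchmark source is not reproduced here, so I can't say how its PriorityQueue strategy iterates. The sketch below is just one way to get fully sorted output from a heap, by draining it with poll(). The drainSorted() helper is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.PriorityQueue;

public class PriorityQueueDrainSketch {
    // Hypothetical helper: builds a heap from the values, then removes
    // the smallest remaining element until the heap is empty.
    static <T extends Comparable<? super T>> List<T> drainSorted(Collection<T> values) {
        PriorityQueue<T> heap = new PriorityQueue<>(values);
        List<T> out = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            out.add(heap.poll()); // each poll() is O(log n)
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 1, 4, 2, 3);
        System.out.println(drainSorted(input)); // prints [1, 2, 3, 4, 5]
    }
}
```

Building the heap is cheap, but the draining step is where the remaining O(n log n) work happens, so a fair comparison should include it.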


Be sure to read my comment about TreeSet at the bottom if you choose to implement B)

If your app only sorts occasionally but iterates through the collection a lot, I'd say you're best off using a straightforward unsorted list. Sort it once and then benefit from faster iteration. Iteration is especially fast on an ArrayList.

However, if you want sort order to be guaranteed all of the time, or you are likely to add / remove elements frequently, then use a sorted collection and take the hit on iteration.

So in your case I would say A) is the better option. The list is sorted once, doesn't change, and therefore benefits from being backed by an array. Iteration should be very fast, especially if you know it's an ArrayList and can directly use ArrayList.get() instead of an Iterator.

I'd also add that TreeSet is by definition a Set, which means its elements are unique. A TreeSet determines equality by using compare() on your Comparator or compareTo() on your Comparable. You could easily find yourself missing data if you try to add two objects whose comparison returns 0. For example, adding "C", "A", "B", "A" to a TreeSet yields "A", "B", "C".
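To make the duplicate issue concrete, here is a small sketch that runs the same input through both approaches. The class and method names are just for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetDuplicatesSketch {
    // Approach (A): a sorted list keeps every element, duplicates included.
    static List<String> sortedList(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Approach (B): a TreeSet drops elements whose compareTo returns 0.
    static Set<String> sortedSet(List<String> input) {
        return new TreeSet<>(input);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("C", "A", "B", "A");
        System.out.println(sortedList(input)); // prints [A, A, B, C]
        System.out.println(sortedSet(input));  // prints [A, B, C]
    }
}
```

If the map values can compare as equal without being the same object, approach (B) silently returns fewer elements than the map holds.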


Collections.sort uses a merge sort, which runs in O(n log n).

TreeSet is backed by a red-black tree, whose basic operations are O(log n). Hence inserting n elements is also O(n log n).

So both approaches have the same big-O complexity.


Inserting into a SortedSet is O(log(n)) (BUT! that is the current n, not the final n). Inserting into a List is O(1).

Sorting in a SortedSet is already included in inserting, so it costs nothing extra. Sorting a List is O(n*log(n)).

So SortedSet total complexity is O(n * k), with k < log(n) for every insert but the last (asymptotically the sum is still O(n * log(n)); the saving is in the constant factor). Instead, List total complexity is O(n * log(n) + n), so O(n * log(n)).

So, SortedSet mathematically has the best performance. But in the end, you have a Set instead of a List (because SortedList doesn't exist) and Set provides you fewer features than List. So in my opinion, the best solution for available features and performance is the one proposed by Sean Patrick Floyd:

  • use a SortedSet for inserting,
  • put the SortedSet as a parameter for creating a List to return.


Great question and great answers. Just thought I would add some points to take into account:

  1. If your Collection to be sorted is short-lived (for instance, it is passed as an argument to a method and you need the list sorted within the method), then use Collections.sort(collection). The same applies if it is a long-lived object that you need to sort only rarely.

Justification: The sorted collection is required for something specific, and you probably won't add or remove very often. So you don't really care about the elements in the collection once it is sorted. You basically:

sort -> use it -> forget

If you add a new element to the sorted collection, you will have to sort the collection again, as the order is not guaranteed when inserting a new element.
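A quick sketch of the point above: appending to an already sorted ArrayList does not keep it sorted, so you have to sort again. The isSorted() helper is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ResortAfterAddSketch {
    // Hypothetical helper: true if every element is >= its predecessor.
    static boolean isSorted(List<Integer> list) {
        for (int i = 1; i < list.size(); i++) {
            if (list.get(i - 1) > list.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));

        Collections.sort(list);
        System.out.println(list + " sorted? " + isSorted(list)); // prints [1, 2, 3] sorted? true

        list.add(0); // add() appends at the end, ignoring the order
        System.out.println(list + " sorted? " + isSorted(list)); // prints [1, 2, 3, 0] sorted? false

        Collections.sort(list); // sort again to restore the order
        System.out.println(list + " sorted? " + isSorted(list)); // prints [0, 1, 2, 3] sorted? true
    }
}
```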

  2. If your Collection to be sorted is long-lived and/or it is a field within a class and you need it to be sorted at all times, then you should use a sorted data structure such as TreeSet.

Justification: You care about the collection order at all times. You want it to be sorted at all times. So if you constantly add or remove elements you have the guarantee that the collection is sorted. So basically:

insert/remove -> use it (all the time you have the guarantee that the collection is sorted)

There is no specific moment where you need the collection to be sorted, instead, you want the collection to be sorted all the time.
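A minimal sketch of this second case, assuming a made-up class with a TreeSet field (the class name, the scores field, and its methods are all hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class AlwaysSortedSketch {
    // Hypothetical long-lived field that must stay sorted between calls.
    private final TreeSet<Integer> scores = new TreeSet<>();

    void add(int score) {
        scores.add(score); // O(log n); the set stays sorted
    }

    void remove(int score) {
        scores.remove(score); // O(log n); the set stays sorted
    }

    // Copies the current contents, in ascending order, into a list.
    List<Integer> snapshot() {
        return new ArrayList<>(scores);
    }

    public static void main(String[] args) {
        AlwaysSortedSketch s = new AlwaysSortedSketch();
        s.add(5);
        s.add(1);
        s.add(3);
        System.out.println(s.snapshot()); // prints [1, 3, 5]

        s.remove(3);
        s.add(2);
        System.out.println(s.snapshot()); // prints [1, 2, 5]
    }
}
```

There is no explicit sort call anywhere; the order is a property of the data structure. Keep in mind that a TreeSet discards elements that compare as equal.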

The downside of using TreeSet is the resources it requires to keep the collection sorted. It uses a red-black tree, and its add, remove, and contains operations cost O(log n) time.

Whereas if you use a simple collection such as an ArrayList, the get and add operations are O(1) (amortized constant time for add).
