Java Collections containsAll Weired Behavior
I have following code , where I am using superList and subList , I want to check that subList is actually a subList of superList.
My objects do not implement hashCode or equals methods. I have created the similar situation in the test. When I run the test then the result show very big performance difference between results from JDK collection and common collections.After Running the test I am getting following output.
Time Lapsed with Java Collection API 8953 MilliSeconds & Result is true Time Lapsed with Commons Collection API 78 MilliSeconds & Result is true
My question is why is java collection , so slow in processing the containsAll operation. Am I doing something wrong there? I have no control over collection Types I am getting that from legacy code. I know if I use HashSet for superList then I would get big performance gains using JDK containsAll operation, but unfortunately that is not possible for me.
package com.mycompany.tests;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import org.apache.commons.collections.CollectionUtils;
import org.junit.Before;
import org.junit.Test;
public class CollectionComparison_UnitTest {
private Collection<MyClass> superList = new ArrayList<MyClass>();
private Collection<MyClass> subList = new HashSet<MyClass>(50000);
@Before
public void setUp() throws Exception {
for (int i = 0; i < 50000; i++) {
MyClass myClass = new MyClass(i + "A String&开发者_如何转开发quot;);
superList.add(myClass);
subList.add(myClass);
}
@Test
public void testIt() {
long startTime = System.currentTimeMillis();
boolean isSubList = superList.containsAll(subList);
System.out.println("Time Lapsed with Java Collection API "
+ (System.currentTimeMillis() - startTime)
+ " MilliSeconds & Result is " + isSubList);
startTime = System.currentTimeMillis();
isSubList = CollectionUtils.isSubCollection(subList, superList);
System.out.println("Time Lapsed with Commons Collection API "
+ (System.currentTimeMillis() - startTime)
+ " MilliSeconds & Result is " + isSubList);
}
}
class MyClass {
String myString;
MyClass(String myString) {
this.myString = myString;
}
String getMyString() {
return myString;
}
}
Different algorithms:
ArrayList.containsAll()
offers O(N*N), while CollectionUtils.isSubCollection()
offers O(N+N+N).
ArrayList.containsAll
is inherited from AbstractCollection.containsAll
and is a simple loop checking all elements in row. Each step is a slow linear search. I don't know how CollectionUtils
works, but it's not hard to do it much faster then using the simple loop. Converting the second List to a HashSet
is a sure win. Sorting both lists and going through them in parallel could be even better.
EDIT:
The CollectionUtils source code makes it clear. They're converting both collections to "cardinality maps", which is a simple and general way for many operations. In some cases it may not be a good idea, e.g., when the first list is empty or very short, you in fact loose time. In you case it's a huge win in comparison to AbstractCollection.containsAll, but you could do even better.
Addendum years later
The OP wrote
I know if I use HashSet for superList then I would get big performance gains using JDK containsAll operation, but unfortunately that is not possible for me.
and that's wrong. Classes without hashCode
and equals
inherit them from Object
and can be used with a HashSet
and everything works perfectly. Except for that each object is unique, which may be unintended and surprising, but the OP's test superList.containsAll(subList)
does exactly the same thing.
So the quick solutions would be
new HashSet<>(superList).containsAll(subList)
You should at least try the tests in the opposite order. Your results may very well just show that the JIT compiler is doing its job well :-)
精彩评论