开发者

How can I reduce the execution time of this loop (java)

I have a piece of code which seems to take an unusual amount of time to execute. I'm in need of reducing the execution speed as much as possible.

Essentially, the code does the following. I create an array of objects size[10][10][10]. The object contains a list of numbers as follows:

class MyClass{
  ArrayList<Integer> numberList;
      public MyClass(){
          numberList= new ArrayList <Integer> ();
      }
   }


      MyClass storage[][][] = new MyClass[10][10][10];

I then have the following code which adds numbers to the lists

 for(int i =0; i < 200000;i++){
       for(int j = 0; j < 10; j++){
        for(int k = 0; k < 10; k++){
         for(int l = 0; l &l开发者_如何学编程t; 10; l++){            
                   storage[j][k][l].numberList.add(i);
         }
        }
       }
      }

I'm fairly certain the vast majority of the execution time comes from the following line

 storage[j][k][l].numberList.add(i);

More specifically, it's the .add(i) .

I'm fairly new to Java, and am only familiar in C++. If ArrayList is similar to a list in C++, surely adding an element to the end requires very little CPU time? Is it just because I'm doing so many add operations (probably a million)?

Another thing I want to ask is can I speed this up by using threads? (assuming dual core processor with 4 threads) I'm thinking I could create 4 threads, each processing a 50,000 chunk. However, I'm not sure about synchronization. Presumably I would have to have some mutual exclusion on storage[][][]. Do I need to write

synchronized(storage)

or would this be fine?

synchronized(storage[j][k][l])

Any help greatly appreciated

Regards


Never, ever use the default Java wrapper classes when working with tens of millions of data in memory that could as well be stored as primitives.

This is the surest way to shoot yourself in the foot: been there, done that.

new ArrayList <Integer>

can be trivially replaced with Trove's TIntArrayList:

new TIntArrayList

You download Trove and it's basically a one-line change that will save a lot of memory when doing the kind of things you're doing.

To help put things into perspective:

    final int n = 10000000;

    final List<Integer> l1 = new ArrayList<Integer>( n );
    for (int i = 0; i < n; i++) {
        l1.add( i );
    }

    final TIntArrayList l2 = new TIntArrayList( n );       
    for (int i = 0; i < n; i++) {
        l2.add( i );
    }

The first loop, using the non-sensical default Java wrapper around primitives to hold 10 millions integers takes 4320 milliseconds to execute on my machine.

The second one takes 41 milliseconds.

So that's two order of magnitudes faster.

The naive attitude would be to think: "both are O(1)".

Truth is: both are O(1) but I'm taking the version that runs two orders of magnitude faster any day.


Use the new ArrayList(capacity) constructor. In your case capacity = 200000

If you don't initialize the ArrayList with a predefined capacity, it will have to expand from time to time, which is effectively copying the existing array backing the list into a new, larger one.

If you specify the initial capacity, the ArrayList will be backed by a large-enough array from the beginning, and no copying will occur.

If you use that class in another location, where 200000 is not the desired size, you can call ArrayList.ensureCapacity(capacity) in another loop, which will do the same thing - create an array large enough to hold all the data without copying it over and over.

Finally, I think you should be able to avoid filling that structure completely. If it is really as predictable as in your simplified example, you can fill it on-demand.


First of all, learn to run a profiler against your programs to be SURE where the bottleencks are. One might be the jvisualvm in the Sun 6 JDK.

I believe you are right in assuming the problem is with the add().

I think that the problem is that the ArrayLists() need to grow. Try using the form new ArrayList(200) (or whatever a reasonable final size will be) and measure again.


Basically, you seem to be filling 1000 arrays with the numbers from 0 until 200000. I would fill up one ArrayList, then just use copy constructors to fill up the rest:

class MyClass{
  List<Integer> numberList;
  public MyClass(){ ... }
  // Copy constructor
  public MyClass(ArrayList<Integer> otherList){
    numberList= new ArrayList<Integer>(otherList);
  }
}

MyClass storage[][][] = new MyClass[10][10][10];
List<Integer> prototypeList= new ArrayList<Integer>(200000);

for(int i =0; i < 200000;i++){
  prototypeList.add(i);
}

for(int j = 0; j < 10; j++){
  for(int k = 0; k < 10; k++){
    for(int l = 0; l < 10; l++){            
      storage[j][k][l] = new MyClass(prototypeList);
    }
  }
}

This has the advantage of eliminating the resizing of lists, and working with consecutive chunks of memory (which eliminates memory cache hits - in your original loop, you access different list objects in rapid succession, which surely can not all fit into the same memory cache at once).


James,

Bearing in mind you've provided simplified code;

  1. The class needs to encapsulate the data member behind a method on MyClass. Make "numberList" private and add a method public void add( int i ) {...}. This means the type of numberList can freely change - encapsulation is golden rule # 1 for good OO.

  2. With the data member private, you could change the type of the data type from ArrayList<Integer> numberList; to int [] numberList. I am pretty sure that array accesses will be quicker than ArrayList. Initialise the array in the constructor. Of course this only works if the number of elements will always be 200,000 or some fixed size passed in the constructor. Also, having an array of int's will be far smaller (in RAM) than an ArrayList - each Integer is a full object with the baggage from Object. A smaller RAM footprint would (in your example case) reduce any disk-swapping that may be needed.

  3. Don't worry about unrolling any of the small inner loops, the JVM will do these low-level optimisations automatically, more efficiently and without polluting your code.


LinkedList would be the equivalent to list in C++

I think ArrayList is similiar to a std::vector.


You may want to allocate ArrayList with the expected capacity. This way it will only be allocated once.

In your case:

numberList= new ArrayList <Integer> ( 200000 );
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜