
Java Matrix processing time

I need a simple opinion from all the gurus!

I developed a program to do some matrix calculations. It works fine with small matrices. However, when I start calculating a BIG matrix with thousands of rows and columns, the speed drops dramatically.

I was thinking of processing one row at a time: process the first row, write the result to a file, free the memory, then process the second row and write it to the file, and so on.

Will it help improve the speed? I would have to make big changes to implement this, which is why I need your opinion first. What do you think?

Thanks

P.S.: I know about the Colt and Jama matrix packages. I cannot use them due to company rules.


Edited

In my program I store each matrix in a two-dimensional array, and when the matrix is small this is fine. However, when it has thousands of columns and rows, keeping the whole matrix in memory for the calculation causes performance issues. The matrices contain floating-point values. For processing, I read the whole matrix into memory, then start the calculation, and after calculating I write the result to a file.


Is memory really your bottleneck? Because if it isn't, then stopping to write things out to a file is always going to be much, much slower than the alternative. It sounds like you are probably running into a limitation of your algorithm.

Perhaps you should consider optimising the algorithm first.

And as I always say about any performance issue: asking people is one thing, but there is no substitute for trying it! Opinions don't matter when you can simply measure the real-world performance.


I suggest using profiling tools and timing statements in your code to work out exactly where your performance problem is before you start making changes.
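For example, a minimal timing sketch using System.nanoTime() (the workload here is just a placeholder; wrap your own matrix routine instead):

    // Minimal timing sketch: measure a suspected hot spot with System.nanoTime().
    public class Timing {
        static long timeMillis(Runnable work) {
            long start = System.nanoTime();
            work.run();
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) {
            double[][] m = new double[2000][2000];
            long ms = timeMillis(() -> {
                // Placeholder workload; substitute your real calculation here.
                for (double[] row : m) java.util.Arrays.fill(row, 1.0);
            });
            System.out.println("took " + ms + " ms");
        }
    }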

You could spend a long time 'fixing' something that isn't the problem. I suspect that the file I/O you suggest would actually slow your code down.

If your code effectively has a loop nested within another loop to process each element, then your speed will drop away quickly as you increase the size of the matrix. If so, one area to look at would be processing your data in parallel, allowing your code to take advantage of multiple CPUs/cores, as sketched below.
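A minimal sketch of one way to do this, splitting the outer loop across cores with Java 8's parallel streams (the answer doesn't prescribe a specific technique, so this is just one option):

    import java.util.stream.IntStream;

    // Parallelise the outer loop of a matrix multiply: each row of the
    // result is written by exactly one thread, so no locking is needed.
    public class ParallelMultiply {
        static double[][] multiply(double[][] a, double[][] b) {
            int n = a.length, k = b.length, m = b[0].length;
            double[][] c = new double[n][m];
            IntStream.range(0, n).parallel().forEach(i -> {
                for (int x = 0; x < k; x++) {
                    double aix = a[i][x];
                    for (int j = 0; j < m; j++) {
                        c[i][j] += aix * b[x][j];
                    }
                }
            });
            return c;
        }
    }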

Consider a more efficient representation, such as a sparse matrix data structure rather than a multidimensional array (if you are using one now and most of your entries are zero).
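A minimal sketch of one simple sparse representation (storing only the non-zero entries in a map; this is my own illustration, and it only pays off when most entries are zero):

    import java.util.HashMap;
    import java.util.Map;

    // A very simple sparse matrix: only non-zero entries are stored,
    // keyed by (row, col) packed into a single long.
    public class SparseMatrix {
        private final Map<Long, Double> entries = new HashMap<>();

        private static long key(int row, int col) {
            return ((long) row << 32) | (col & 0xFFFFFFFFL);
        }

        public void set(int row, int col, double value) {
            if (value == 0.0) entries.remove(key(row, col));
            else entries.put(key(row, col), value);
        }

        public double get(int row, int col) {
            return entries.getOrDefault(key(row, col), 0.0);
        }
    }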


You need to remember that multiplying an NxN matrix by an NxN matrix takes 2N^3 calculations (N multiplications and N additions for each of the N^2 entries in the result). Even so, it shouldn't take hours. You should get an improvement by transposing the second matrix first (about 30%, since the inner loop then reads memory sequentially), but it really shouldn't be taking hours.

So as you double N, you increase the time by 8x. Worse than that, a matrix which fits into your cache is very fast, but once it's more than a few MB the data has to come from main memory, which slows down your operations by another 2-5x.

Putting the data on disk will really slow down your calculation; I would only suggest doing this if your matrix doesn't fit in memory, and it will make things 10x-100x slower, so buying a little more memory is a better idea. (In your case your matrices should be small enough to fit into memory.)

I tried Jama, which is a very basic library that uses two-dimensional arrays instead of one-dimensional ones, and on a 4-year-old laptop it took 7 minutes. You should be able to halve that time just by using the latest hardware, and with multiple threads cut it to below one minute.
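To illustrate the transpose trick mentioned above together with a flat one-dimensional layout (rather than Jama's double[][]), here is a minimal sketch; the ~30% figure is this answerer's measurement, not a guarantee of this exact code:

    // Cache-friendlier multiply: matrices stored as flat row-major double[]
    // arrays, with b transposed first so both inner-loop reads are sequential.
    public class FlatMultiply {
        // a is n x k, b is k x m, both row-major; returns n x m row-major.
        static double[] multiply(double[] a, double[] b, int n, int k, int m) {
            double[] bt = new double[m * k]; // transpose of b (m x k)
            for (int x = 0; x < k; x++)
                for (int j = 0; j < m; j++)
                    bt[j * k + x] = b[x * m + j];

            double[] c = new double[n * m];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < m; j++) {
                    double sum = 0.0;
                    for (int x = 0; x < k; x++)
                        sum += a[i * k + x] * bt[j * k + x]; // sequential scans
                    c[i * m + j] = sum;
                }
            }
            return c;
        }
    }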

EDIT: Using a Xeon X5570, Jama multiplied two 5000x5000 matrices in 156 seconds. Using a parallel implementation I wrote, that time came down to 27 seconds.


Use the profiler in jvisualvm in the JDK to identify where the time is spent.

I would do some simple experiments to identify how your algorithm scales, because it sounds like you might be using one with a higher runtime complexity than you realise. If it runs in N^3 (which is common when you multiply two matrices the naive way) then doubling the input size will make the run time eight times longer.
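A minimal sketch of such a scaling experiment (time the same routine at N, 2N, 4N and compare; if each doubling multiplies the time by roughly eight, you are looking at a cubic algorithm):

    // Scaling experiment: if successive timings grow ~8x per doubling of N,
    // the algorithm is cubic in N.
    public class ScalingTest {
        public static void main(String[] args) {
            for (int n = 250; n <= 1000; n *= 2) {
                double[][] a = new double[n][n];
                double[][] b = new double[n][n];
                long start = System.nanoTime();
                multiply(a, b);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println("N=" + n + " took " + ms + " ms");
            }
        }

        // Naive triple loop, included only so the sketch runs on its own.
        static double[][] multiply(double[][] a, double[][] b) {
            int n = a.length;
            double[][] c = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    for (int x = 0; x < n; x++)
                        c[i][j] += a[i][x] * b[x][j];
            return c;
        }
    }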
