MATLAB matrix multiplication vs for loop for each column
When multiplying two matrices, I tried the following two options:
1)
res = X*A;
2)
for i = 1:size(A,2)
res(:,i) = X*A(:,i);
end
I preallocated memory for res in both. And surprisingly, I found option 2 to be faster.
Can someone explain how this is so?
edit: I tried
K=10000;
clear t1 t2
t1=zeros(K,1);
t2=zeros(K,1);
for k=1:K
cle开发者_高级运维ar res
x = rand(100,100);
a = rand(100,100);
tic
res = x*a;
t1(k) = toc;
end
for k=1:K
clear res2
res2 = zeros(100,100);
x = rand(100,100);
a = rand(100,100);
tic
for i = 1:100
res2(:,i) = x*a(:,i);
end
t2(k) = toc;
end
I run both codes in a loop 1000 times. In average (but not always) the first vectorized code was 3-4 times faster. I cleared the result variables and preallocated before starting timer.
x = rand(100,100);
a = rand(100,100);
K=1000;
clear t1 t2
t1=zeros(K,1);
t2=zeros(K,1);
for k=1:K
clear res
tic
res = x*a;
t1(k) = toc;
end
for k=1:K
clear res2
res2 = zeros(100,100);
tic
for i = 1:100
res2(:,i) = x*a(:,i);
end
t2(k) = toc;
end
So, never make a timing conclusion based on a single run.
I believe I can chime in on the variation in timings between the two methods, as well as why people are getting different relative speeds.
Before Matlab version 2008a (or a version near that release), for loops took a major hit in any Matlab code because the interpreter (a layer between the very readable script and a lower level implementation of the code) would have to re-interpret the code each time through the for loop.
Since that release, the interpreter has gotten progressively better so, when running a modern version of Matlab, the interpreter can look at your code and say "Ah ha! I know what he is doing, let me optimize it just a bit" and avoid the hit it would otherwise take by reinterpreting the code.
I would expect the two ways of performing matrix multiplies to evaluate in the same amount of time, why the for loop implementation runs faster is because of some detail in the optimizations of the interpreter that us mere mortals are not privy to know.
One broad lesson we should take from this, is not all versions are equal. I do work on a couple of bleeding edge cases using two Matlab add ons, the SimBiology and the Parallel Computing Toolboxes, both of which (especially if you want them to work together) are version dependent in speed of execution, and from time to time other stability issues. As such, I keep the three most recent releases of Matlab, will test that I get the same answers out of each version, and I'll occasionally roll back to an earlier version if I find issues with some features. This is probably overkill for most people, but gives you an idea of version differences.
Hope this helps.
Edits:
To clarify, code vectorization is still important. But given a script like:
x_slow = zeros(1,1e5);
x_fast = zeros(1,1e5);
tic;
for i=1:1e5
x_slow(i) = log(i);
end
time_slow = toc; % evaluates for me in .0132 seconds
tic;
x_fast = log(1:1e5);
time_fast = toc; % evaluates for me in .0055 seconds
The disparity between time_slow and time_fast has reduced in the past several versions based on improvements in the interpreter. The example I saw I believe was on 2000a vs. 2008b, but that's subject to my recollection.
There is something else that might be going on that was addressed by Oli and Yuk. There is often a difference between the time_1 and time_2 in:
tic; x = log(1:1e5); time_1 = toc
tic; x = log(1:1e5); time_2 = toc
So the test of one million evaluations vs. one evaluation is valuable, depending on where in memory x is (in cache or no).
Hope this helps again.
This may well be an effect of caching. a
is already in the cache by the time you do the second version, so it has an advantage. Try creating an independent set of inputs to make it fair. Also, it's probably better to measure the time of e.g. 1 million iterations of this, in order to eliminate typical variations due to outside effects.
It looks to me that you are not multiplying matrix properly, you need to sum all the products from ith row of X matrix and jth column of A matrix, that might be a reason. Look here to see how it's done.
精彩评论