Trouble Preallocating Arrays in a Complex For Loop in Matlab
I am fairly new to Matlab, but for my job I need to import an ENORMOUS data set and organize it in a certain way. I have written code that does this, but very inefficiently (it is only my third major piece of code, and it takes several hours to run). Matlab warns me that I could preallocate my variables (about fifty times, in fact), but I am having trouble seeing how, because I do not know in advance which matrix the data will be added to on each iteration of the for loop. The code itself probably explains this better than I can.
(This is just a small piece of it, but it will hopefully show my problem.)

```matlab
for x = 1:length(firstinSeq)
    for y = 1:length(littledataPassed)-1
        % records whose first two columns match the current firstinSeq row
        if firstinSeq(x,1) == littledataPassed(y,1) && firstinSeq(x,2) == littledataPassed(y,2)
            switch firstinSeq(x,3)
                case 0
                    for z = 0:1000
                        w = y+z;
                        if firstinSeq(x,4) == littledataPassed(w,4)
                            if littledataPassed(w,6) == 1 && firstinSeq(x,2) == littledataPassed(w,2) && littledataPassed(w,5) == 0
                                msgLength0 = [msgLength0; firstinSeq(x,:) littledataPassed(w,:)];
                                break
                            else
                                continue
                            end
                        else
                            msgLength0 = [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]];
                            break
                        end
                    end
                case 1
                    for z = 0:1000
                        w = y+z;
                        if firstinSeq(x,4) == littledataPassed(w,4) % if sequence not the same, terminate
                            if littledataPassed(w,6) == 1 && firstinSeq(x,2) == littledataPassed(w,2) && littledataPassed(w,5) == 0
                                msgLength1 = [msgLength1; firstinSeq(x,:) littledataPassed(w,:)];
                                break
                            else
                                continue
                            end
                        else
                            msgLength1 = [msgLength1; firstinSeq(x,:) [0 0 1 0 0 0]];
                            break
                        end
                    end
                case 2
                    % first pass: records with column 5 == 0
                    for z = 0:1000
                        w = y+z;
                        if firstinSeq(x,4) == littledataPassed(w,4)
                            if littledataPassed(w,6) == 1 && firstinSeq(x,2) == littledataPassed(w,2) && littledataPassed(w,5) == 0
                                msgLength2 = [msgLength2; firstinSeq(x,:) littledataPassed(w,:)];
                                break
                            else
                                continue
                            end
                        else
                            msgLength2 = [msgLength2; firstinSeq(x,:) [0 0 2 0 0 0]];
                            break
                        end
                    end
                    % second pass: records with column 5 == 1
                    for z = 0:1000
                        w = y+z;
                        if firstinSeq(x,4) == littledataPassed(w,4)
                            if littledataPassed(w,6) == 1 && firstinSeq(x,2) == littledataPassed(w,2) && littledataPassed(w,5) == 1
                                msgLength2 = [msgLength2; firstinSeq(x,:) littledataPassed(w,:)];
                                break
                            else
                                continue
                            end
                        else
                            msgLength2 = [msgLength2; firstinSeq(x,:) [0 0 2 0 1 0]];
                            break
                        end
                    end
                % ... remaining cases (eight in total) ...
            end
        end
    end
end
```
Any thoughts on how I could preallocate these variables (msgLength0, msgLength1, msgLength2, etc.)? Data is not added to them on every iteration of the loop, and I do not know their final sizes for a given run. There are currently eight cases in my switch, which makes this program very slow.
If I read your code correctly, one of the variables msgLengthN is extended on each trip through the innermost loop? If so, that prompts the thought that you might want to pre-allocate a single array, called msgLengthAll, and populate that as you go, making sure that each entry carries a value that distinguishes between cases 0, 1, 2, etc.
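A minimal sketch of that idea, assuming an upper bound on the total number of rows is known in advance. The names maxRows, nCols1, nCols2, nRows, labels and newRow are hypothetical, and the rows written are placeholders standing in for what the real loops would produce. One array is allocated once, a counter tracks how many rows are filled, and column 1 records which case produced each row:

```matlab
nCols1 = 4;                % assumed width of firstinSeq
nCols2 = 6;                % assumed width of littledataPassed
maxRows = 1000;            % assumed upper bound on rows from all cases together
msgLengthAll = zeros(maxRows, 1 + nCols1 + nCols2);   % column 1 = case label
nRows = 0;                 % number of rows filled so far

% Inside the real loops, instead of msgLength0 = [msgLength0; newRow]:
%   nRows = nRows + 1;
%   msgLengthAll(nRows, :) = [caseLabel, newRow];
% Simulated here with four made-up rows from cases 0, 1, 0 and 2:
labels = [0; 1; 0; 2];
for k = 1:numel(labels)
    newRow = [k * ones(1, nCols1), zeros(1, nCols2)];   % placeholder data
    nRows = nRows + 1;
    msgLengthAll(nRows, :) = [labels(k), newRow];
end

msgLengthAll = msgLengthAll(1:nRows, :);   % trim the unused preallocated rows

% Split by label afterwards, e.g. the rows that would have gone into msgLength0:
msgLength0 = msgLengthAll(msgLengthAll(:,1) == 0, 2:end);
msgLength2 = msgLengthAll(msgLengthAll(:,1) == 2, 2:end);
```

If the bound turns out to be too small, the indexed assignment past maxRows still works (Matlab enlarges the array automatically), but from then on you pay for reallocation again.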
If you don't know up front how much space to allocate for msgLengthAll, then you could either:
- Scan the input file once to determine how big this array, and any others, need to be. There's no disgrace in reading a large file more than once to process it, and it might save you a lot of time; or
- Indulge in a fancier allocation scheme: initially make a guess about how much space msgLengthAll will need, then, when it gets full, allocate more memory. There are various ways of deciding how much more to allocate at each expansion point: a fixed amount, or possibly as much as is already allocated (i.e. double the allocation at each expansion). This is, of course, potentially quite complicated.
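The doubling option might look like the following sketch. The names buffer, capacity, nRows and nCols, the initial guess of 16 rows, and the placeholder rows are all hypothetical. Because the allocation doubles each time it fills, adding N rows costs only about log2(N) reallocations instead of one per row:

```matlab
nCols = 10;                % assumed row width (e.g. 4 + 6 columns)
capacity = 16;             % initial guess at the number of rows
buffer = zeros(capacity, nCols);
nRows = 0;                 % rows actually used

for k = 1:100              % stand-in for the real loops; adds 100 rows
    newRow = k * ones(1, nCols);   % placeholder data
    if nRows == capacity
        % Buffer is full: double the allocation by appending a block of zeros.
        buffer = [buffer; zeros(capacity, nCols)];
        capacity = 2 * capacity;
    end
    nRows = nRows + 1;
    buffer(nRows, :) = newRow;
end

result = buffer(1:nRows, :);   % drop the unused tail
```

Growing by a fixed amount instead is a one-line change to the expansion step, but then the number of reallocations grows linearly with the data rather than logarithmically.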
Are you reading the file line by line and updating in-memory variables as you go? Or are you reading the whole file, then sorting things out in memory? How big is ENORMOUS? How much RAM do you have?
You can vectorize the processing in each switch case: instead of testing one record per iteration of the z loop, use find on the whole block of records that loop scans to locate the record(s) that meet your criteria (stopping where the sequence in column 4 changes), and then append to msgLength0 in one step. The following is a vectorized version of the case 0 code:
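```matlab
% Block of up to 1001 records scanned by the z loop (z = 0:1000), clipped at the end of the data
block = littledataPassed(y:min(y+1000, size(littledataPassed,1)), :);
% First record whose sequence (column 4) differs from firstinSeq(x,4)
indexStop = find(block(:,4) ~= firstinSeq(x,4), 1, 'first');
if isempty(indexStop)
    indexStop = size(block,1) + 1;   % the sequence never changes inside the block
end
% First record before the sequence change that meets the case 0 criteria
indexProcess = find(block(1:indexStop-1,6) == 1 & ...
                    block(1:indexStop-1,2) == firstinSeq(x,2) & ...
                    block(1:indexStop-1,5) == 0, 1, 'first');
if ~isempty(indexProcess)
    msgLength0 = [msgLength0; firstinSeq(x,:) block(indexProcess,:)];
elseif indexStop <= size(block,1)
    msgLength0 = [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]];
end
```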
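Before swapping a vectorized step into a long-running job, it is worth checking that it produces exactly the same rows as the loop it replaces. The sketch below runs the original case 0 loop and a find-based version side by side on a tiny made-up data set (all values hypothetical) and compares the results. It only exercises the inner z loop: x is fixed at 1, every record shares columns 1 and 2 with firstinSeq so the outer if is skipped, and z is bounded so that w never runs past the last record:

```matlab
% Made-up data; only columns 2, 4, 5 and 6 matter to case 0.
littledataPassed = [ ...
    1 7 0 5 1 1;    % same sequence, but column 5 is 1: no match
    1 7 0 5 0 0;    % same sequence, but column 6 is 0: no match
    1 7 0 5 0 1;    % same sequence and meets all three conditions
    1 7 0 6 0 1];   % the sequence (column 4) changes here
firstinSeq = [1 7 0 5];
x = 1;
nRec = size(littledataPassed, 1);
loopResult = zeros(0, 10);   % growing these is harmless for four rows
vecResult = zeros(0, 10);

for y = 1:nRec
    % Original case 0 loop, with z bounded by the number of records
    for z = 0:min(1000, nRec - y)
        w = y + z;
        if firstinSeq(x,4) == littledataPassed(w,4)
            if littledataPassed(w,6) == 1 && firstinSeq(x,2) == littledataPassed(w,2) && littledataPassed(w,5) == 0
                loopResult = [loopResult; firstinSeq(x,:) littledataPassed(w,:)];
                break
            end
        else
            loopResult = [loopResult; firstinSeq(x,:) [0 0 0 0 0 0]];
            break
        end
    end

    % find-based version of the same step
    block = littledataPassed(y:min(y + 1000, nRec), :);
    indexStop = find(block(:,4) ~= firstinSeq(x,4), 1, 'first');
    if isempty(indexStop)
        indexStop = size(block, 1) + 1;
    end
    indexProcess = find(block(1:indexStop-1,6) == 1 & ...
                        block(1:indexStop-1,2) == firstinSeq(x,2) & ...
                        block(1:indexStop-1,5) == 0, 1, 'first');
    if ~isempty(indexProcess)
        vecResult = [vecResult; firstinSeq(x,:) block(indexProcess,:)];
    elseif indexStop <= size(block, 1)
        vecResult = [vecResult; firstinSeq(x,:) [0 0 0 0 0 0]];
    end
end

disp(isequal(loopResult, vecResult))
```

Starting from y = 1, 2 and 3 both versions find the third record; starting from y = 4 the sequence changes immediately, so both append the zero-padded row.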
Vectorizing the outer loops would also do a lot to reduce the execution time. I don't know enough about your data to suggest a specific approach, but using the reshape and/or repmat functions to create arrays that you can operate on in vectorized form may be the way to go.
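As one illustration of that idea, the sketch below evaluates the outer if condition (columns 1 and 2 equal) for every (x, y) pair at once. It uses bsxfun rather than repmat: it performs the same expansion without materializing the replicated copies (in R2016b and later, implicit expansion such as `firstinSeq(:,1) == littledataPassed(:,1).'` does the same). The data are made up. Note the assumption that the nx-by-ny logical matrix fits in memory (one byte per element); for a truly enormous data set you would process firstinSeq in chunks of rows:

```matlab
% Made-up data; only columns 1 and 2 are compared here.
firstinSeq       = [1 7 0 5; 2 8 1 5; 3 9 0 5];
littledataPassed = [1 7 0 5 0 1; 2 8 0 5 0 1; 1 7 0 6 0 1; 4 4 0 5 0 1];

% match(x,y) is true when columns 1 and 2 of firstinSeq(x,:) and
% littledataPassed(y,:) agree: the outer if condition, for all pairs at once.
match = bsxfun(@eq, firstinSeq(:,1), littledataPassed(:,1).') & ...
        bsxfun(@eq, firstinSeq(:,2), littledataPassed(:,2).');

% find on the transpose lists the pairs ordered by x, then by y,
% which is the order in which the nested for loops would visit them.
[yIdx, xIdx] = find(match.');

% The switch statement then only needs to run for these pairs:
for k = 1:numel(xIdx)
    x = xIdx(k);
    y = yIdx(k);
    fprintf('x = %d, y = %d\n', x, y);   % the switch on firstinSeq(x,3) would go here
end
```

Here only three of the twelve possible pairs survive, so the switch body runs three times instead of being guarded by a comparison on every (x, y) combination.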