
Trouble Preallocating in MATLAB with a Complex For Loop

I am fairly new to MATLAB, but for my job I need to import an ENORMOUS data set and organize it in a certain way. I have written code that does this, but very inefficiently (it is only my third major piece of code, and it takes several hours to run). MATLAB is telling me that I can preallocate my variables (about fifty times, in fact), but I am having trouble seeing how to do that because I do not know which matrix the data will be added to on each iteration of the for loop. The code itself probably explains this better than I do.

(This is just a small piece of it, but it will hopefully illustrate my problem.)

for x= 1:length(firstinSeq)
            for y= 1:size(littledataPassed,1)-1
                if firstinSeq(x,1)== littledataPassed(y,1) && firstinSeq(x,2)== littledataPassed(y,2) 
                        switch firstinSeq(x,3)
                            case 0
                                for z= 0:1000
                                    w= y+z;  
                                    if firstinSeq(x,4)== littledataPassed(w,4) 
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0 
                                            msgLength0= [msgLength0; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength0= [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]];  
                                        break
                                    end
                                end
                            case 1
                                for z= 0:1000
                                    w= y+z; 
                                    if firstinSeq(x,4)== littledataPassed(w,4) %if sequence not the same, terminate
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0
                                            msgLength1= [msgLength1; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength1= [msgLength1; firstinSeq(x,:) [0 0 1 0 0 0]]; 
                                        break        
                                    end
                                end
                            case 2
                                for z= 0:1000
                                    w= y+z;
                                    if firstinSeq(x,4)== littledataPassed(w,4)
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0
                                            msgLength2= [msgLength2; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength2= [msgLength2; firstinSeq(x,:) [0 0 2 0 0 0]];
                                        break
                                    end
                                end
                                for z= 0:1000
                                    w= y+z;
                                    if firstinSeq(x,4)== littledataPassed(w,4)
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 1
                                            msgLength2= [msgLength2; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength2= [msgLength2; firstinSeq(x,:) [0 0 2 0 1 0]];  
                                        break
                                    end
                                end

Any thoughts on how I could preallocate these variables (msgLength0, msgLength1, msgLength2, etc.)? Data is not added to them on every pass through the loop, and I do not know their final sizes in advance. There are eight cases in my switch right now, which makes this program very slow.


If I read your code correctly, one of the variables msgLengthN is extended on each trip through the innermost loop. If so, that suggests pre-allocating a single array called msgLengthAll and populating it as you go, making sure each entry carries a value that distinguishes between cases 0, 1, 2, etc.
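A minimal sketch of that idea, assuming you can put an upper bound on the number of output rows (the hypothetical `maxRows` below) and that each output row is one 6-column row of `firstinSeq` followed by one 6-column row of `littledataPassed`. The toy data and the trivial "match" inside the loop are stand-ins for your own logic; only the allocate, fill, trim and split pattern is the point.

```matlab
% Toy stand-ins for the real data (6 columns each, as in the question)
firstinSeq = [1 2 0 7 0 0; 3 4 1 8 0 0; 5 6 2 9 0 0];
littledataPassed = magic(6);

maxRows = 2 * size(firstinSeq, 1);    % hypothetical upper bound on output rows
msgLengthAll = zeros(maxRows, 13);    % column 1 = case label, then 6 + 6 data columns
n = 0;                                % number of rows filled so far

for x = 1:size(firstinSeq, 1)
    label = firstinSeq(x, 3);         % which case (0, 1, 2, ...) this row belongs to
    % ... your matching logic goes here; this sketch pretends it matched row x
    n = n + 1;
    msgLengthAll(n, :) = [label, firstinSeq(x, :), littledataPassed(x, :)];
end

msgLengthAll = msgLengthAll(1:n, :);  % drop the unused preallocated rows

% Recover the per-case arrays with one logical index each, no growth in a loop
msgLength0 = msgLengthAll(msgLengthAll(:, 1) == 0, 2:end);
msgLength1 = msgLengthAll(msgLengthAll(:, 1) == 1, 2:end);
msgLength2 = msgLengthAll(msgLengthAll(:, 1) == 2, 2:end);
```

Because the label lives in its own column, the eight switch cases can all write into the same preallocated array, and the split into msgLength0 through msgLength7 happens once at the end.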

If you don't know up front how much space to allocate for msgLengthAll, you could either:

  • Scan the input file once to determine how big this array, and any others, need to be. There's no disgrace in reading a large file more than once to process it, and it might save you a lot of time. OR
  • Use a fancier allocation scheme: make an initial guess at how much space msgLengthAll will need, then allocate more memory whenever it fills up. There are several ways to decide how much more to allocate at each expansion point: a fixed amount, or as much as is already allocated (i.e., double the allocation at each expansion). This is, of course, potentially quite complicated.
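A minimal sketch of the doubling scheme, assuming every output row has the same known width (12 here, i.e. 6 + 6 columns). The loop over `k` and the hypothetical `newRow` stand in for whatever your real loops produce; the deliberately small starting capacity just makes the expansions visible.

```matlab
nCols = 12;                          % width of one output row (6 + 6 columns)
capacity = 4;                        % initial guess; deliberately small here
buffer = zeros(capacity, nCols);
n = 0;                               % rows actually used

for k = 1:10                         % stand-in for the rows your loops produce
    newRow = k * ones(1, nCols);     % hypothetical row to append
    if n == capacity
        % Full: double the allocation, keeping the rows already stored
        buffer = [buffer; zeros(capacity, nCols)];
        capacity = 2 * capacity;
    end
    n = n + 1;
    buffer(n, :) = newRow;
end

msgLength0 = buffer(1:n, :);         % trim the unused tail once at the end
```

With doubling, the number of reallocations grows with the logarithm of the final row count rather than once per appended row, which is where most of the time goes when an array is extended one row at a time.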

Are you reading the file line by line and updating in-memory variables as you go? Or are you reading the whole file and then sorting things out in memory? How big is ENORMOUS? How much RAM do you have?


You can vectorize the processing in each switch case by using find to locate the records within the block of up to 1001 rows (z = 0:1000) that meet your criteria, and then appending the result to msgLength0 in one fell swoop instead of stepping through the rows one at a time. The following is a vectorized version of the case 0 code:

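block = littledataPassed(y:min(y+1000, size(littledataPassed,1)), :);  % rows the z loop visits
% first row whose sequence number (column 4) differs
indexStop = find(block(:,4) ~= firstinSeq(x,4), 1, 'first');
if isempty(indexStop)
   indexStop = size(block,1) + 1;  % sequence never changes inside the block
end
% first row before the stop that meets the case 0 criteria
indexProcess = find(block(1:indexStop-1,6) == 1 & ...
   block(1:indexStop-1,2) == firstinSeq(x,2) & ...
   block(1:indexStop-1,5) == 0, 1, 'first');
if ~isempty(indexProcess)
   msgLength0 = [msgLength0; firstinSeq(x,:) block(indexProcess,:)];
elseif indexStop <= size(block,1)
   msgLength0 = [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]];  % sequence ended first
end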

Vectorizing the outer loops would also go a long way toward reducing the execution time. I don't know enough about your data to suggest a specific approach, but using the reshape and/or repmat functions to build arrays that you can operate on in vectorized form may be the way to go.
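One simpler step in that direction, sketched here as an assumption rather than the reshape/repmat approach itself: replace the scan over every y with a single logical mask per x, so that only the rows of littledataPassed whose columns 1 and 2 match are ever visited. The toy data are made up; the loop version is kept only to check that both produce the same (x, y) pairs.

```matlab
% Toy data: columns 1-2 are the key that the y loop compares
firstinSeq       = [1 2 0 7 0 0; 3 4 1 8 0 0];
littledataPassed = [1 2 0 7 1 0; 9 9 0 7 1 0; 3 4 0 8 1 0; 1 2 0 7 0 1];

% Loop version: visits every y for every x
pairsLoop = zeros(0, 2);
for x = 1:size(firstinSeq, 1)
    for y = 1:size(littledataPassed, 1)
        if firstinSeq(x,1) == littledataPassed(y,1) && firstinSeq(x,2) == littledataPassed(y,2)
            pairsLoop(end+1, :) = [x y]; %#ok<AGROW> growth is fine for this tiny check
        end
    end
end

% Masked version: one vectorized comparison per x replaces the whole y loop
pairsMask = zeros(0, 2);
for x = 1:size(firstinSeq, 1)
    ys = find(littledataPassed(:,1) == firstinSeq(x,1) & ...
              littledataPassed(:,2) == firstinSeq(x,2));
    pairsMask = [pairsMask; repmat(x, numel(ys), 1), ys]; %#ok<AGROW>
end
```

In the real code, the switch on firstinSeq(x,3) would then run only for the y values in ys, which skips every row that could never satisfy the first if test.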
