What is the alternative to preallocating arrays in MATLAB? [duplicate]
Possible Duplicate:
Growable data structure in MATLAB
So in my current MATLAB script, I have a very large array of indeterminate size that keeps growing. There is currently nothing I can do about that: if I actually preallocated for the worst case, it would take many, many times more memory than it should need (the maximum possible number of values is 640 per pixel, but usually it's something along the lines of 2-5).
Usually in a case like this, I'd be using a vector in C++ or something similar, which grows its capacity geometrically. But I suspect MATLAB matrices start fragmenting memory much faster than purpose-built C++ vectors do.
What do you guys think is the best alternative to something like this? Or should I just stick with normal arrays and hope that adding around 100k elements sequentially will work?
Thanks in advance.
Growing an array is generally a bad thing to do, at least if you will do it many times. There are a few tricks one can use. (I've tried them all, and written tools that do it for you automatically if you want.)
The problem is that repeatedly growing an array makes the total time an O(N^2) operation. It also tends to fragment your memory, leaving you with potential problems if you need to create a large array later on.
One approach is to preallocate your array to some fixed size, then append large blocks of zeros whenever the array size would be exceeded. Now use matrix indexing to insert the new values into the existing array. The idea is to double the size of the array whenever it needs to be reallocated. This causes only a few reallocations, but it may end up appending a lot of zeros that you don't need to fill. At the end, you dump the extraneous and unused array elements.
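To make that concrete, here is a rough sketch of the doubling idea (moreValuesAvailable and nextValue are just placeholders for wherever your values actually come from):
capacity = 1000;                        %# initial guess at the size
data = zeros(capacity, 1);              %# preallocated storage
n = 0;                                  %# number of elements actually used
while moreValuesAvailable()             %# placeholder for your data source
    n = n + 1;
    if n > capacity                     %# out of room: append a block of zeros
        data = [data; zeros(capacity, 1)];   %# doubles the allocation
        capacity = 2*capacity;
    end
    data(n) = nextValue();              %# cheap indexed assignment
end
data = data(1:n);                       %# dump the unused tail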
The second trick is to use a cell array. Every time you want to append a new element or block thereof, you simply append a new cell with those elements. Then at the end, you do a cell2mat operation. A problem with the cell trick is it is still an O(N^2) operation, because the cell array itself is essentially an array of pointers, and that list of pointers must be reallocated when you append new elements.
There is a combination trick you can do that I like, but it needs a tool to implement the operation. The idea is to create cells that contain blocks of say 10000 elements. Fill the first block with zeros. Then use matrix indexing to insert new elements as they are generated. The matrix indexing is fast of course. When you run out of room in that cell, you append a new large cell. Then at the end, you catenate all the cells together, carefully dropping the elements that were unused.
This last scheme involves relatively few new cell elements to be appended, and so can be most efficient overall if there may be millions of append steps.
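For illustration only, a bare-bones sketch of that combined scheme might look like this (this is not the growdata implementation itself; moreValuesAvailable and nextValue again stand in for your own data source):
BLOCK = 10000;                          %# elements per block
blocks = {zeros(BLOCK, 1)};             %# cell array of fixed-size blocks
ptr = 0;                                %# position within the current block
while moreValuesAvailable()             %# placeholder for your data source
    ptr = ptr + 1;
    if ptr > BLOCK                      %# current block is full: append a new cell
        blocks{end+1, 1} = zeros(BLOCK, 1);
        ptr = 1;
    end
    blocks{end}(ptr) = nextValue();     %# fast matrix indexing within the block
end
blocks{end} = blocks{end}(1:ptr);       %# drop the unused tail of the last block
result = vertcat(blocks{:});            %# catenate all the blocks at the end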
You can find each of these approaches embodied in several File Exchange submissions I've posted over the years. The first can be found as grow_array, which internally manages the append steps, worrying about the matrix indexing for you.
Later on, I built the last scheme using either function handles or persistent variables to maintain the stored information. The tools that contain those implementations are found as growdata and growdata2.
You could try what std::vector does when reallocating elements: double its capacity every time it fills up, which gives an amortized cost of O(1) per insertion.
Test:
function test_MAT_vector
    LIM = 5e4;

    %%# Normal
    tic;
    A = zeros(1,1);
    for i = 1:LIM
        A(end+1) = i;
    end
    toc

    %%# std::vector clone
    tic;
    B = zeros(1,1);
    for i = 1:LIM
        if(i > numel(B))
            B = [B; zeros(numel(B),1)];
        end
        B(i) = i;
    end
    toc
end
Output:
Elapsed time is 3.489820 seconds.
Elapsed time is 0.006710 seconds.
Using a cell array:
%%# cell
tic;
A = cell(1,1);
for i = 1:LIM
    A(end+1) = {i};
end
toc
Elapsed time is 2.740792 seconds.
In a previous question, I had posted a similar solution to that proposed by @Jacob.
I later compared the performance of most of the options available (which are very well summarized by @woodchips above). Here are the results I got on my machine:
NUM = 50000;

%%# ========== MINE: ~0.07sec ==========
tic
BLOCK_SIZE = 2000;                 %# initial & increment size
listSize = BLOCK_SIZE;             %# current list capacity
list = zeros(listSize, 2);         %# actual list
listPtr = 1;                       %# pointer to last free position
for i=1:NUM
    %# push items on list
    list(listPtr,:) = [rand rand];            %# store new item
    listPtr = listPtr + 1;                    %# increment position pointer
    %# add new block of memory if needed
    if( listPtr+(BLOCK_SIZE/10) > listSize )  %# less than 10%*BLOCK_SIZE free
        listSize = listSize + BLOCK_SIZE;     %# add new BLOCK_SIZE slots
        list(listPtr+1:listSize,:) = 0;
    end
end
list(listPtr:end,:) = [];          %# remove unused slots
toc

%%# ========== PREALLOCATION (matrix): ~0.05sec ==========
tic
list = zeros(NUM,2);
for i=1:NUM
    list(i,:) = [rand rand];
end
toc

%%# ========== PREALLOCATION (cell): ~1.1sec ==========
tic
list = cell(NUM,1);
for i=1:NUM
    list{i} = [rand rand];
end
list = vertcat( list{:} );
toc

%%# ============ NO-PREALLOCATION (grow cell): ~5sec ========
tic
list = {};
for i=1:NUM
    list{i} = [rand rand];
end
list = vertcat( list{:} );
toc

%%# ============ NO-PREALLOCATION (grow matrix): ~24sec ========
tic
list = [];
for i=1:NUM
    list(i,:) = [rand rand];
end
toc

%%# ========== GROWDATA (by John D'Errico): ~3.3sec =========
tic
growdata                       %# The initialization call
for i = 1:NUM
    growdata( [rand rand] )    %# push items
end
list = growdata;               %# unpacking step
toc
You can grow a cell array, which is much cheaper than growing a numeric array, and then use cell2mat to convert it to an array at the end (this requires twice the memory temporarily, but if you have it, it can be the easiest way).
If you know the number of pixels, you can preallocate a cell array of the right size, fill each cell with the array of actual values (up to 640 but usually 2-5), and then use cell2mat to convert it to a contiguous array.
If you are worried about fragmentation, you can call pack after filling the cell array and before the cell2mat conversion; it writes everything to disk and loads it back contiguously, which should defragment your memory.
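As a rough sketch, assuming you know the number of pixels in advance (computePixelValues is a placeholder for however you obtain the 2-640 values for one pixel):
nPixels = 100000;                       %# illustrative count
vals = cell(nPixels, 1);                %# one cell per pixel
for p = 1:nPixels
    vals{p} = computePixelValues(p);    %# placeholder: returns a column vector of values
end
%# optionally call "pack" here to defragment memory before the conversion
result = cell2mat(vals);                %# one contiguous numeric array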