What is the best way to store a 16 × (2^20) matrix in MATLAB?

2022-12-31 16:46 问答作者：

I am thinking of writing the data to a file. Does anyone have an example of how to write a big amount of data to a file?

Edit: Most elements in the matrix are zeroes, others are uin开发者_C百科t32. I guess the simplest save() and load() would work, as @Jonas suggested.

I guess nobody's seen the edit about the zeroes :)

If they're mostly zeroes, you should convert your matrix to its sparse representation and then save it. You can do that with the sparse function.

Code

z = zeros(10000,10000);
z(123,456) = 1;
whos z
z = sparse(z);
whos z

Output

Name          Size                   Bytes  Class     Attributes

  z         10000x10000            800000000  double  

Name          Size               Bytes  Class     Attributes

  z         10000x10000            40016  double    sparse

I don't think the sparse implementation is designed to handle uint32.

If you're concerned with keeping the size of the data file as small as possible, here are some suggestions:

Write the data to a binary file (i.e. using FWRITE) instead of to a text file (i.e. using FPRINTF).
If your data contains all integer values, convert it to or save it as a signed or unsigned integer type instead of the default double precision type MATLAB uses.
If your data contains floating point values, but you don't need the range or resolution of the default double precision type, convert it to or save it as a single precision type.
If your data is sufficiently sparse (i.e. there are many more zeroes than non-zeroes in your matrix), then you can use the FIND function to get the row and column indices of the non-zero values, then just save these to your file.

Here are a couple of examples to illustrate:

data = double(rand(16,2^20) <= 0.00001);  %# A large but very sparse matrix

%# Writing the values as type double:
fid = fopen('data_double.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');     %# Write the matrix size (2 values)
fwrite(fid,data,'double');           %# Write the data as type double
fclose(fid);                         %# Close the file

%# Writing the values as type uint8:
fid = fopen('data_uint8.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');    %# Write the matrix size (2 values)
fwrite(fid,data,'uint8');           %# Write the data as type uint8
fclose(fid);                        %# Close the file

%# Writing out only the non-zero values:
[rowIndex,columnIndex,values] = find(data);  %# Get the row and column indices
                                             %#   and the non-zero values
fid = fopen('data_sparse.dat','w');  %# Open the file
fwrite(fid,numel(values),'uint32');  %# Write the length of the vectors (1 value)
fwrite(fid,rowIndex,'uint32');       %# Write the row indices
fwrite(fid,columnIndex,'uint32');    %# Write the column indices
fwrite(fid,values,'uint8');          %# Write the non-zero values
fclose(fid);                         %# Close the file

The files created above will differ drastically in size. The file 'data_double.dat' will be about 131,073 KB, 'data_uint8.dat' will be about 16,385 KB, and 'data_sparse.dat' will be less than 2 KB.

Note that I also wrote the data\vector sizes to the files so that the data can be read back in (using FREAD) and reshaped properly. Note also that if I did not supply a 'double' or 'uint8' argument to FWRITE, MATLAB would be smart enough to figure out that it didn't need to use the default double precision and would only use 8 bits to write out the data values (since they are all 0 and 1).

How is the data generated? How do you need to access the data?

If I calculate correctly, the variable is less than 200MB if it's all double. Thus, you can easily save and load it as a single .mat file if you need to access it from Matlab only.

%# create data
data = zeros(16,2^20);

%# save data
save('myFile.mat','data');

%# clear data to test everything works
clear data

%# load data
load('myFile.mat')

继续阅读：bigdata file-io matrix

What is the best way to store a 16 × (2^20) matrix in MATLAB?

Code

Output

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Code

Output

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？