What is the fastest way to load data in Matlab
I have a vast quantity of data (>800Mb) that takes an age to load into Matlab mainly because it's split up into tiny files each <20kB. They are all in a prop开发者_如何学JAVArietary format which I can read and load into Matlab, its just that it takes so long.
I am thinking of reading the data in and writing it out to some sort of binary file which should make it quicker for subsequent reads (of which there may be many, hence me needing a speed-up).
So, my question is, what would be the best format to write them to disk to make reading them back again as quick as possible?
I guess I have the option of writing using fwrite, or just saving the variables from matlab. I think I'd prefer the fwrite option so if needed, I could read them from another package/language...
Look in to the HDF5 data format, used by recent versions of MATLAB as the underlying format for .mat files. You can manually create your own HDF5 files using the hdf5write
function, and this file can be accessed from any language that has HDF bindings (most common languages do, or at least offer a way to integrate C code that can call the HDF5 library).
If your data is numeric (and of the same datatype), you might find it hard to beat the performance of plain binary (fwrite).
Binary mat-files are the fastest. Just use
save myfile.mat <var_a> <var_b> ...
I achieved an amazing speed up in loading when I used the '-v6' option to save the .mat files like so:
save(matlabTrainingFile, 'Xtrain', 'ytrain', '-v6');
Here's the size of the matrices that I used in my test ...
Attr Name Size Bytes Class
==== ==== ==== ===== =====
g Xtest 1430x4000 45760000 double
g Xtrain 3411x4000 109152000 double
g Xval 1370x4000 43840000 double
g ytest 1430x1 11440 double
g ytrain 3411x1 27288 double
g yval 1370x1 10960 double
... and the performance improvements that we achieved:
Before the change:
time to load the training data: 78 SECONDS!!!
time to load validation data: 32
time to load the test data: 35
After the change:
time to load the training data: 0 SECONDS!!!
time to load validation data: 0
time to load the test data: 0
Apparently the reason the reason this works so well is that the old version 6 version used less compression the than new versions. So your file sizes will be bigger, but they will load WAY faster.
精彩评论