How should I store my large MATLAB data files during analysis?

2023-02-05 19:30 问答作者：

I am having issues with 'data overload' while processing point cloud data in MATLAB. This is what I am currently doing:

I begin with my raw data files, each in the order of ~30Mb each.
I then do initial processing on them to extract n individual objects and remove outlying points, which are all combined into a 1 x n structure, testset, saved into testset.mat (~100Mb).

So far so good. Now things become complicated:
For each point in each object in testset, I will compute one of a number of features, which ends up being a matrix of some size (for each point). The size of the matrix, and some other properties of the computation, are parameters of the calculations. I save these computed features in a 1 x n cell array, each cell of which contains an array of the matrices for each point.

I then save this cell array in a .mat file, where the name specified the parameters, the name of the test data used and the types of features extracted. For example:

testset_feature_type_A_5x5_0.2x0.2_alpha_3_beta_4.mat
Now for each of these files, I then do some further processing (using a classification algorithm). Again there are more parameters to set.

So now I am in a tricky situation, where each final piece of the initial data has come through some path,开发者_JS百科 but the path taken (and the parameters set along that path) are not intrinsically held with the data itself.

So my question is:

Is there a better way to do this? Can anyone who has experience in working with large datasets in MATLAB suggest a way to store the data and the parameter settings more efficiently, and more integrally?

Ideally, I would be able to look up a certain piece of data without having to use regex on the file strings—but there is also an incentive to keep individually processed files separate to save system memory when loading them in (and to help prevent corruption).

The time taken for each calculation (some ~2 hours) prohibits computing data 'on the fly'.

For a similar problem, I have created a class structure that does the following:

Each object is linked to a raw data file
For each processing step, there is a property
The set method of the properties saves the data to file (in a directory with the same name as the raw data file), stores the file name, and updates a "status" property to indicate that this step is done.
The get method of the properties loads the data if the file name has been stored and the status indicates "done".
Finally, the objects can be saved/loaded, so that I can do some processing now, save the object, later load it and I immediately know how far along the particular data set is in the processing pipeline.

Thus, the only data in memory is the data that is currently being worked on, and you can easily know which data set is at which processing stage. Furthermore, if you set up your methods to accept arrays of objects, you can do very convenient batch processing.

I'm not completely sure if this is what you need, but the save command allows you to store multiple variables inside a single .mat file. If your parameter settings are, for example, stored in an array, then you can save this together with the data set in a single .mat file. Upon loading the file, both the dataset and the array with parameters are restored.

Or do you want to be able to load the parameters without loading the file? Then I would personally opt for the cheap solution of having a second set of files with just the parameters (but similar filenames).

继续阅读：data-structures

How should I store my large MATLAB data files during analysis?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？