Efficient Rolling Window Identification in Matlab

2023-02-11 08:26 问答作者：

I have time-series data with some missing data for which I am running some estimation function over rolling windows. The windows are not uniform length and each variable has differing start and end dates. I want to remove any windows which have any missing data. The windows overlap, so a single missing observation will often remove a great many windows from consideration. What I want is a mapping from each date into the windows which contain it.

Currently, I have a logical matrix which has a row for every possible day and then each column represents one of the windows with true values for the days of that window. I can then subset that matrix to rows representing the missing data, and whichever columns contain any true values are the invalid windows. The problem is that the logical matrix gets large (10k x 10k ~100mb) and there can be开发者_StackOverflow社区 many. I can convert to sparse which solves the size problem, but the calculation of which windows to remove becomes very slow when the windows are long.

This doesn't smell like a problem that should be resource intensive (either memory or computation), is there a better way?

Edit: Let me add an example so this may be a little clearer. Say the full set of dates range from 1 to 100. Windows are 1:10, 2:11, 3:12 and so on through to 91:100 (these are uniform, but it doesn't matter for the example). I have a series that runs from 5 to 25, but has NaN for 17.

That one NaN knocks out ten windows (8:17 through to 17:26). I want an efficient mapping from observation 17 to windows 8:17. Clearly, it's pretty easy when the windows are uniform length, but what is an efficient method when the windows are irregular?

Logical comparisons cost little time and resources. Are you really sure that this is a bottleneck?

If the window creation is time-consuming, you may want do it in a while-loop, so that you can skip a few entries if you encounter a NaN inside a window.

If window creation is fast, which it would certainly be with uniform lengths, you can simply do

%# startEnd is created according to your example, but can be whatever quick method
startEnd = [(1:(100-windowSize+1))',(windowSize+1:100)'];
nanIdx = find(isnan(data))';  %'#

%# This line temporarily creates a logical array of size nWindows-by-numberOfNaNs
%# which is most likely smaller than nWindows-by-nWindows
badWindows = any(bsxfun(@le,startEnd(:,1),nanIdx) & bsxfun(@ge,startEnd(:,2),nanIdx),2);

startEnd(badWindows,:) = [];

I coded up Jonas' solution and it ended up being slower than what I already had in place. It did remove the need for the large array, however, and it got me thinking about the problem differently. I just needed to go from a window -> (obs start, obs end) mapping to an obs -> (indices of each window that obs falls into) approach.

So I build a cell-array that contains each set of window indices for each day (I could use a NumObsx2 matrix, but I want to allow for possibly complicated window definition). For each timeseries, I use the indices of each missing data point to subset the mapping to get the indices of all the windows that need to be removed. Then cell2mat pulls the indices out of the cell-array and I can remove the bad windows (Matlab doesn't care about repeated indices in assignments, thankfully).

In my timings, this method is about 10x my original method and 15x than Jonas' method. The indices in the map can be stored as uint16, so the memory required is far less than my orignal solution (but still more than Jonas' method).

As a bonus, I can use accumarray to count up the indices if I want to use more complicated criteria such as no more than 5 NaNs (cf no more than one as described).

Efficient Rolling Window Identification in Matlab

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

Easiest way to get words of one line from istream into a vector?

Infinite gtk warnings when I right click on the icon

Best solution for private video database [closed]

国内夏季避暑旅游胜地有哪些？