Identifying (and removing) sequences from a vector in Matlab/Octave
I'm trying to prune any sequence of 3 or more consecutive integers from a vector of numbers in Matlab (or Octave). For example, given the vector dataSet,
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
removing all sequences of length 3 or more would yield prunedDataSet:
prunedDataSet = [7 9 11 13 22 28 30 31 ];
I can brute-force a solution, but I suspect there is a more succinct (and perhaps more efficient) way to do it using vector/matrix operations. I always get confused about whether an operation yields an index or the value at that index. Suggestions?
Here's the brute force method I came up with:
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
benign = [];
for i = 1:size(dataSet,2)-2;
if (dataSet(i) == (dataSet(i+1)-1) && dataSet(i) == dataSet(i+2)-2);
benign = [benign i ] ;
end;
end;
remove = [];
for i = 1:size(benign,2);
remove = [remove benign(i) benign(i)+1 benign(i)+2 ];
end;
remove = unique(remove);
% delete by index; setdiff would also sort the result and drop duplicate values
prunedDataSet = dataSet;
prunedDataSet(remove) = [];
Here's a solution using DIFF and STRFIND:
%# define dataset
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
%# take the difference. Whatever is part of a sequence will have difference 1
dds = diff(dataSet);
%# sequences of 3 lead to two consecutive ones. Sequences of 4 are like two sequences of 3
seqIdx = strfind(dds,[1 1]);
%# remove start, start+1, start+2
dataSet(bsxfun(@plus,seqIdx,[0;1;2])) = []
dataSet =
7 9 11 13 22 28 30 31
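The same diff-based idea can be stretched to an arbitrary minimum run length. The following is only a sketch, not part of the answer above: the names minLen, isStart, runId, runLen and inLongRun are made up for illustration, and it assumes dataSet is a row vector. It labels each run of consecutive integers with cumsum, counts the run sizes with accumarray, and avoids the string-search function entirely.

```matlab
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
minLen  = 3;   % assumed parameter: shortest run that should be removed

% A new run starts wherever the step from the previous element is not 1.
isStart = [true, diff(dataSet) ~= 1];

% Give every element the number of the run it belongs to.
runId = cumsum(isStart);

% Count how many elements each run contains (row vector, one entry per run).
runLen = accumarray(runId(:), 1).';

% An element is removed if its run has at least minLen members.
inLongRun = runLen(runId) >= minLen;

prunedDataSet = dataSet(~inLongRun);
disp(prunedDataSet)   % expected: 7 9 11 13 22 28 30 31
```

Setting minLen to 4 would keep 1 2 3 and 24 25 26 and remove only 17 18 19 20.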
Here's an attempt using vector-matrix notation:
s1 = [(dataSet(1:end-1) == dataSet(2:end)-1), false];
s2 = [(dataSet(1:end-2) == dataSet(3:end)-2), false, false];
s3 = s1 & s2;
s = s3 | [false, s3(1:end-1)] | [false, false, s3(1:end-2)];
dataSet(~s)
The idea is: s1 is true for all positions where a number a appears immediately before a+1. s2 is true for all positions where a appears two positions before a+2. Then s3 is true where both of these conditions are met, that is, at the start of every 3-sequence. Next, we build s by propagating every true value in s3 to its two successors. Finally, dataSet(~s) keeps all the values for which s is false, that is, it keeps the numbers that are not part of a 3-sequence.
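To make the masks concrete, here is a small trace of the same expressions on a five-element vector. The vector x and the variable kept are chosen purely for illustration; the values in the comments were worked out by hand from the definitions above.

```matlab
x = [1 2 3 7 9];   % illustrative input: one 3-sequence followed by two loose values

s1 = [(x(1:end-1) == x(2:end)-1), false];          % [1 1 0 0 0]: x(i)+1 follows x(i)
s2 = [(x(1:end-2) == x(3:end)-2), false, false];   % [1 0 0 0 0]: x(i)+2 is two positions later
s3 = s1 & s2;                                      % [1 0 0 0 0]: start of a 3-sequence

% Spread each start flag onto the next two positions.
s = s3 | [false, s3(1:end-1)] | [false, false, s3(1:end-2)];   % [1 1 1 0 0]

kept = x(~s);   % [7 9]
disp(kept)
```

Note that s is a logical mask, so x(~s) returns values directly; find(~s) would return the corresponding indices instead.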