Removing duplicates only when the duplicates occur in sequence
I would like to do something similar to the following, except I would only like to remove 'g' and 'g' because they are the duplicates that occur one after the other. I would also like to keep the sequence the same.
Any help would be appreciated!!!
I have this cell array in MATLAB:
y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}
ans =
'd' 'f' 'a' 'w' 'a' 'h'
There was an error in my first answer (below) when used on multiple duplicates (thanks grantnz). Here's an updated version:
>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h' 'i' 'i' 'j'};
>> i = find(diff(char(y)) == 0);
>> y([i; i+1]) = []
y =
'd' 'f' 'a' 'w' 'a' 'j'
OLD ANSWER
If your "cell vector" always contains only single character elements you can do the following:
>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}
y =
'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'
>> y(find(diff(char(y)) == 0) + [0 1]) = []
y =
'd' 'f' 'a' 'w' 'a' 'h'
Look at it like this: you want to keep an element if and only if both of these hold: either (1) it's the first element or (2) its predecessor is different from it, and either (3) it's the last element or (4) its successor is different from it. So:
y([true ~strcmp(y(1:(end-1)),y(2:end))] & [~strcmp(y(1:(end-1)),y(2:end)) true])
or, perhaps better,
different = ~strcmp(y(1:(end-1)),y(2:end));
result = y([true different] & [different true]);
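Since strcmp compares whole strings, the same mask should also work when the cells hold more than one character each. Here is a minimal sketch of that; the sample data and the variable name keep are made up for illustration and are not from the question:

```matlab
% Sketch: the keep-mask applied to multi-character strings.
% The sample data below is invented for illustration.
y = {'foo' 'bar' 'bar' 'baz' 'foo' 'qux' 'qux'};

% different(k) is true when y{k} and y{k+1} are not the same string
different = ~strcmp(y(1:end-1), y(2:end));

% keep an element only if it differs from both of its neighbours;
% the padded "true" stands in for the missing neighbour of the
% first and last elements
keep = [true different] & [different true];
result = y(keep);
% result is {'foo' 'baz' 'foo'}
```

Both 'bar' entries and both 'qux' entries are dropped, while the two 'foo' entries stay because they are not next to each other.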
This should work:
y([ diff([y{:}]) ~= 0 true])
or slightly more compactly
y(diff([y{:}]) == 0) = []
Correction: the above won't remove both duplicates; it keeps one copy of each pair. This removes both:
ind = diff([y{:}]) == 0;
y([ind 0] | [0 ind]) = []
BTW, this works even if there are multiple duplicate sequences, e.g.:
y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h'};
ind = diff([y{:}]) == 0;
y([ind 0] | [0 ind]) = []
y =
'd' 'f' 'a' 'w' 'a'
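If a letter repeats more than twice in a row, the same mask appears to remove the whole run, because every adjacent equal pair marks both of its members. A small sketch with a made-up input (the name remove and the explicit double() conversion are my additions, and it still assumes single-character cells):

```matlab
% Sketch with an invented input containing a run of three 'b's.
y = {'a' 'b' 'b' 'b' 'c'};

% ind(k) is true when y{k} equals y{k+1}; double() makes the
% character codes explicit before taking differences
ind = diff(double([y{:}])) == 0;

% mark both members of every equal adjacent pair, then delete them
remove = [ind false] | [false ind];
y(remove) = [];
% y is now {'a' 'c'}
```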