Identify important minima and maxima in time-series w/ Mathematica
I need a way to identify local minima and maxima in time series data with Mathematica. This seems like it should be an easy thing to do, but it gets tricky. I posted this on the MathForum, but thought I might get some additional eyes on it here.
You can find a paper that discusses the problem at: http://www.cs.cmu.edu/~eugene/research/full/compress-series.pdf
I've tried this so far…
Get and format some data:
data = FinancialData["SPY", {"May 1, 2006", "Jan. 21, 2011"}][[All, 2]];
data = data/First@data;
data = Transpose[{Range[Length@data], data}];
Define 2 functions:
First method:
findMinimaMaxima[data_, window_] := With[{k = window},
data[[k + Flatten@Position[Partition[data[[All, 2]], 2 k + 1, 1], x_List /; x[[k + 1]] < Min[Delete[x, k + 1]] || x[[k + 1]] > Max[Delete[x, k + 1]]]]]]
Now another approach, although not as flexible:
findMinimaMaxima2[data_] := data[[Accumulate@(Length[#] & /@ Split[Prepend[Sign[Rest@data[[All, 2]] - Most@data[[All, 2]]], 0]])]]
Look at what each the functions does. First findMinimaMaxima2[]:
minmax = findMinimaMaxima2[data];
{Length@data, Length@minmax}
ListLinePlot@minmax
This selects all minima and maxima and results (in this instance) in about a 49% data compression, but it doesn't have the flexibility of expanding the window. This other method does. A window of 2, yields fewer and arguably more important extrema:
minmax2 = findMinimaMaxima[data, 2];
{Length@data, Length@minmax2}
ListLinePlot@minmax2
But look at what happens when we expand the window to 60:
minmax2 = findMinimaMaxima[data, 60];
ListLinePlot[{data, minmax2}]
Some of the minima and maxima no longer alternate. Applying findMinimaMaxima2[] to the output of findMinimaMaxima[] gives a workaround...
minmax3 = findMinimaMaxima2[minmax2];
ListLinePlot[{data, minmax2, minmax3}]
, but this seems like a clumsy way to address the problem.
So, the idea of using a fixed window to look left and right doesn't quite do everything one would like. I began thinking about an alternative that could use a range value R (e.g. a percent move up or down) that the function would need to meet or exceed to set the next minima or maxima. Here's my first try:
findMinimaMaxima3[data_, R_] := Module[{d, n, positions},
d = data[[All, 2]];
n = Transpose[{data[[All, 1]], Rest@FoldList[If[(#2 <= #1 + #1*R && #2 >= #1) || (#2 >= #1 - #1* R && #2 <= #1), #1, #2] &, d[[1]], d]}];
n = Sign[Rest@n[[All, 2]] - Most@n[[All, 2]]];
positions = Flatten@Rest[Most[Position[n, Except[0]]]];
data[[positions]]
]
minmax4 = findMinimaMaxima3[data, 0.1];
ListLinePlot[{data, minmax4}]
This too benefits from post processing with findMinimaMaxima2[]
ListLinePlot[{data, findMinimaMaxima2[minmax4]}]
But if you look closely, you see that it misses the extremes if they go beyond the R value in several positions - including the chart's absolute minimum and maximum as well as along the big moves up and down. Changing the R value shows how it misses the top and bottoms even more:
minmax4 = findMinimaMaxima3[data, 0.15];
ListLinePlot[{data, minmax4}]
So, I need to reconsider. Anyone can look at a plot of the data and easily identify the important minima and maxima. It seems harder to get an algorithm to do it. A window and/or an R value seem important to the solution, but neither on their own seems enough (at least not in the approaches above)开发者_JAVA技巧.
Can anyone extend any of the approaches shown or suggest an alternative to identifying the important minima and maxima?
Happy to forward a notebook with all of this code and discussion in it. Let me know if anyone needs it.
Thank you, Jagra
I suggest to use an iterative approach. The following functions are taken from this post, and while they can be written more concisely without Compile, they'll do the job:
localMinPositionsC =
Compile[{{pts, _Real, 1}},
Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0},
For[i = 2, i < Length[pts], i++,
If[pts[[i - 1]] > pts[[i]] && pts[[i + 1]] > pts[[i]],
result[[++ctr]] = i]];
Take[result, ctr]]];
localMaxPositionsC =
Compile[{{pts, _Real, 1}},
Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0},
For[i = 2, i < Length[pts], i++,
If[pts[[i - 1]] < pts[[i]] && pts[[i + 1]] < pts[[i]],
result[[++ctr]] = i]];
Take[result, ctr]]];
Here is your data plot:
dplot = ListLinePlot[data]
Here we plot the mins, which are obtained after 3 iterations:
mins = ListPlot[Nest[#[[localMinPositionsC[#[[All, 2]]]]] &, data, 3],
PlotStyle -> Directive[PointSize[0.015], Red]]
The same for maxima:
maxs = ListPlot[Nest[#[[localMaxPositionsC[#[[All, 2]]]]] &, data, 3],
PlotStyle -> Directive[PointSize[0.015], Green]]
And the resulting plot:
Show[{dplot, mins, maxs}]
You may vary the number of iterations, to get more coarse-grained or finer minima/maxima.
Edit:
actually, I just noticed that a couple of points were still missed by this method, both for the minima and maxima. So, I suggest it as a starting point, not as a complete solution. Perhaps, you could analyze minima/maxima, coming from different iterations, and sometimes include those from a "previous", more fine-grained one. Also, the only "physical reason" that this kind of works, is that the nature of the financial data appears to be fractal-like, with several distinctly different scales. Each iteration in the above Nest-s targets a particular scale. This would not work so well for an arbitrary signal.
精彩评论