Finding the min max of a stock chart
Are there any specific algorithms that will allow me to find the min and max points in the picture above?
I have the data in text format, so I don't need to extract it from the picture. The problem with stocks is that they have so many local mins and maxes that simple derivatives won't work.
I am thinking of using digital filters (z-domain) to smooth out the graph, but I am still left with too many localized minimums and maximums.
I also tried using a moving average to smooth out the graph, but again I have too many maxes and mins.
EDIT:
I read some of the comments; I just failed to circle some of the minimums and maximums by accident.
I think I came up with an algorithm that may work. First, find the overall minimum and maximum points (the high and low of the day). Then draw three lines: one from the open to whichever of the high or low comes first, one from the low to the high (or high to low), and finally one to the close. In each of these three regions, take the point furthest from its line as a new high or low, and repeat the loop.
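A minimal Python sketch of that idea (essentially the Ramer-Douglas-Peucker recursion); the tolerance tol and all names are mine, since the description above gives no stop criterion:
import numpy as np

def furthest_point(prices, i0, i1):
    """Index of the interior point with the largest vertical distance
    from the straight line joining prices[i0] and prices[i1]."""
    p = np.asarray(prices, dtype=float)
    x = np.arange(i0, i1 + 1)
    line = np.interp(x, [i0, i1], [p[i0], p[i1]])
    d = np.abs(p[i0:i1 + 1] - line)   # endpoints sit on the line (d = 0)
    k = int(d.argmax())
    return i0 + k, d[k]

def swing_points(prices, i0, i1, tol, out):
    """Recursively split each region at its furthest-from-the-line point;
    stop once the deviation falls below tol (my added stop criterion)."""
    if i1 - i0 < 2:
        return
    k, d = furthest_point(prices, i0, i1)
    if d < tol:
        return
    out.add(k)
    swing_points(prices, i0, k, tol, out)
    swing_points(prices, k, i1, tol, out)
Seeding it with swing_points(p, 0, len(p) - 1, tol, out) splits the chart at its most prominent swings first, which roughly matches the open/high/low/close regions described above.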
I usually use a combination of a Moving Average and an Exponential Moving Average. It has proved (empirically) to be well suited to the task (enough for my needs, at least). The result is tuned with only two parameters. Here is a sample:
Edit
In case it is useful for somebody, here is my Mathematica code:
f[sym_] := Module[{l},
  (*get data*)
  l = FinancialData[sym, "Jan. 1, 2010"][[All, 2]];
  (*perform averages*)
  l1 = ExponentialMovingAverage[MovingAverage[l, 10], .2];
  (*calculate max and min positions in the averaged list*)
  l2 = {#[[1]], l1[[#[[1]]]]} & /@
    MapIndexed[If[#1[[1]] < #1[[2]] > #1[[3]], #2, Sequence @@ {}] &,
      Partition[l1, 3, 1]];
  l3 = {#[[1]], l1[[#[[1]]]]} & /@
    MapIndexed[If[#1[[1]] > #1[[2]] < #1[[3]], #2, Sequence @@ {}] &,
      Partition[l1, 3, 1]];
  (*correlate with max and min positions in the original list*)
  maxs = First /@ (Ordering[-l[[#[[1]] ;; #[[2]]]]] + #[[1]] - 1 & /@
      ({4 + #[[1]] - 5, 4 + #[[1]] + 5} & /@ l2));
  mins = Last /@ (Ordering[-l[[#[[1]] ;; #[[2]]]]] + #[[1]] - 1 & /@
      ({4 + #[[1]] - 5, 4 + #[[1]] + 5} & /@ l3));
  (*Show the plots*)
  Show[{
    ListPlot[l, Joined -> True, PlotRange -> All,
      PlotLabel ->
        Style[Framed[sym], 16, Blue, Background -> Lighter[Yellow]]],
    ListLinePlot[ExponentialMovingAverage[MovingAverage[l, 10], .2]],
    ListPlot[{#, l[[#]]} & /@ maxs,
      PlotStyle -> Directive[PointSize[Large], Red]],
    ListPlot[{#, l[[#]]} & /@ mins,
      PlotStyle -> Directive[PointSize[Large], Black]]},
    ImageSize -> 400]
  ]
You'll notice that a lot of the answers go for derivatives combined with some sort of low-pass filtering: a moving average of some kind, if you will. The FFT, the square-window moving average, and the exponential moving average are all pretty similar on a fundamental level. But given the choice over all moving averages, which is the best one?
The answer: the Gaussian moving average, i.e. a kernel shaped like the normal distribution, with which you are familiar.
The reason: the Gaussian filter is the only filter that will never produce a "fake" maximum, i.e. a maximum that was not there to begin with. This has been proven theoretically for both continuous and discrete data (make sure you use the discrete Gaussian for discrete data, though!). As you increase the Gaussian's sigma, the local maxima and minima merge in a most intuitive way. Thus if you want no more than one local maximum per day, you set sigma to one, et cetera.
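As an illustration, a minimal sketch in Python, assuming SciPy; the sigma value and the use of argrelextrema are my choices, not part of the answer above:
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import argrelextrema

def gaussian_extrema(prices, sigma=5.0):
    """Smooth with a Gaussian kernel, then read off the local extrema.
    Increasing sigma merges nearby extrema, as described above.
    Note: SciPy samples the continuous Gaussian; for strict guarantees
    on discrete data the answer recommends the discrete Gaussian."""
    smooth = gaussian_filter1d(np.asarray(prices, dtype=float), sigma)
    maxima = argrelextrema(smooth, np.greater)[0]
    minima = argrelextrema(smooth, np.less)[0]
    return maxima, minima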
I don't know what you mean by "simple derivatives". I take it to mean that you have tried gradient descent and found it unsatisfactory because of the abundance of local extrema. If so, you want to look at simulated annealing:
Annealing is a metallurgical process used to temper metals through a heating and cooling treatment. (...). These irregularities are due to atoms being stuck in the wrong place of the structure. In the process of annealing, the metal is heated up and then allowed to cool down slowly. Heating up gives the atoms the energy they need to get un-stuck, and the slow cool-down period allows them to move to their correct location in the structure.
(...) However, in order to escape local optima, the algorithm will have a probability of taking a step in a bad direction: in other words, of taking a step that increases the value for a minimization problem or that decreases the value for a maximization problem. To simulate the annealing process, this probability will depend in part on a "temperature" parameter in the algorithm, which is initialized at a high value and decreased at each iteration. Consequently, the algorithm will initially have a high probability of moving away from a nearby (likely local) optimum. Over the iterations that probability will decrease and the algorithm will converge on the (hopefully global) optimum it did not have the chance to escape from. (source; cuts and emphasis mine)
I know that the local optima are precisely what the circles in your drawing represent, and hence what you want to find. But, as I interpret the quote "so many local mins and maxes simple derivatives won't work", they are also precisely what you find too much of. I assume the trouble is all the zig-zagging your curve does between two circled points.
All that seems to differentiate the optima you circled from the rest of the points on the curve is, precisely, their globality: to find a lower point than the first one you circled on the left, you have to move further along the x axis than you would for its close neighbors. That is what annealing gives you: through the temperature parameter, you control the size of the jumps you allow yourself to make. There has to be a value for which you catch the "big" local optima and yet miss the "small" ones. What I'm suggesting isn't revolutionary: there are several examples (e.g. 1 2) where people have obtained nice results from such noisy data.
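To make that concrete, a bare-bones simulated-annealing sketch in Python; the neighborhood width, cooling schedule, and all names are arbitrary choices of mine:
import math
import random

def anneal_minimum(prices, steps=10000, t0=1.0, cooling=0.999):
    """Walk over the chart's x axis; always accept downhill moves, and
    accept uphill moves with probability exp(-delta/T), where the
    temperature T decays each step. Returns the index found."""
    i = random.randrange(len(prices))
    t = t0
    for _ in range(steps):
        j = min(len(prices) - 1, max(0, i + random.randint(-10, 10)))
        delta = prices[j] - prices[i]  # minimization: lower is better
        if delta < 0 or random.random() < math.exp(-delta / t):
            i = j
        t *= cooling
    return i
This converges to one (hopefully the global) minimum; to collect several "big" optima, as circled in the question, one would run it over sub-intervals or restart it with different seeds.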
Simply define what you mean by minimum and maximum in a precise but tunable way, and then tune it until it finds the right number of minima and maxima. For example, you can first smooth the graph by replacing each value with the average of that value and the N values to its left and right. By increasing N, you reduce the number of minima and maxima found.
You can then define a minimum as a point where, if you skip A values to the left and right, the next B values on each side show a consistent increasing trend. By increasing B, you find fewer minima and maxima. By adjusting A, you tune how 'flat' a minimum or maximum is allowed to be.
Once you use a tunable algorithm, you can just tune it until it looks right.
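A sketch of such a tunable detector in Python; the function names and defaults are mine:
import numpy as np

def smooth(values, n):
    """Replace each value with the average of itself and the n values
    to its left and right (N in the description above)."""
    kernel = np.ones(2 * n + 1) / (2 * n + 1)
    return np.convolve(np.asarray(values, dtype=float), kernel, mode='same')

def is_minimum(values, i, a, b):
    """Minimum at i: after skipping a values on each side, the next b
    values rise consistently as you move away from i."""
    left = values[max(0, i - a - b):max(0, i - a)]
    right = values[i + a + 1:i + a + 1 + b]
    if len(left) < b or len(right) < b:
        return False
    return (np.diff(left) < 0).all() and (np.diff(right) > 0).all()
Increasing n reduces noise before detection, increasing b demands a longer trend, and a sets the allowed flatness; a maximum test is the mirror image (the np.diff signs reversed).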
You can use a spline method to fit a continuous approximating polynomial of the desired degree to your original function. Once you have this polynomial, look for local minima/maxima on it using basic calculus.
Note that the spline method gives you an approximating polynomial that is both 'smooth', so it is easy to find local mins/maxes, and as close as possible to the original function, so the local mins/maxes it yields should be very close to the true ones in the original function.
To improve accuracy, after finding the local mins/maxes of the generated polynomial, for each x0 that represents a local min/max you should look at all x such that x0 - delta < x < x0 + delta to find the real min/max this point represents.
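Here is one possible realization of this with SciPy's smoothing splines; the order k=4, the smoothing factor s, and delta are my choices (a quartic spline is used so that its derivative is a cubic, which SciPy can take the roots of):
import numpy as np
from scipy.interpolate import UnivariateSpline

def spline_extrema(prices, s=None, delta=3):
    """Fit a smoothing spline, take the zeros of its derivative as
    candidate extrema, then refine each one against the raw data
    within x0 - delta < x < x0 + delta, as described above."""
    y = np.asarray(prices, dtype=float)
    x = np.arange(len(y))
    spl = UnivariateSpline(x, y, k=4, s=s)
    candidates = spl.derivative().roots()   # roots() requires a cubic
    refined = []
    for x0 in candidates:
        lo = max(0, int(x0) - delta)
        hi = min(len(y), int(x0) + delta + 1)
        window = y[lo:hi]
        if spl.derivative(2)(x0) > 0:        # curving up: a minimum
            refined.append(lo + int(window.argmin()))
        else:                                # curving down: a maximum
            refined.append(lo + int(window.argmax()))
    return refined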
I have often found that the extrema perceived subjectively by humans (read: the only extrema in stock charts, which are mostly random noise) are often to be found after Fourier bandpass filtering. You might try this algorithm:
- Perform an FFT
- Do a bandpass in frequency space. Select the bandpass' parameters based on the range of data you want your extrema to look good on, that is, the timescale of interest.
- Perform an inverse FFT.
- Select the local maxima of the resulting curve.
The parameters of the second step seem pretty subjective, but again, subjectiveness is the very nature of stock chart analysis.
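For example, a rough Python sketch assuming NumPy/SciPy; the band edges low and high (in cycles over the whole window) are the subjective parameters mentioned above:
import numpy as np
from scipy.signal import argrelextrema

def bandpass_extrema(prices, low=2, high=20):
    """FFT, zero every coefficient outside [low, high), inverse FFT,
    then take the local extrema of the reconstructed curve."""
    y = np.asarray(prices, dtype=float)
    spectrum = np.fft.rfft(y - y.mean())
    mask = np.zeros(len(spectrum))
    mask[low:high] = 1.0
    filtered = np.fft.irfft(spectrum * mask, n=len(y)) + y.mean()
    maxima = argrelextrema(filtered, np.greater)[0]
    minima = argrelextrema(filtered, np.less)[0]
    return maxima, minima, filtered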
Here is @Dr. belisarius's code converted to Python:
import pandas as pd
import numpy as np

def emasma_maxmin(l: pd.Series, sma_period=10, ema_factor=.2):
    """
    Mathematica's ema(sma(sma_period), ema_factor) min/max in Python.
    :param l: prices as pd.Series
    :param sma_period: simple moving average window
    :param ema_factor: exponential moving average smoothing factor
    :return: a tuple that contains max and min index lists
    """
    l1 = l.rolling(window=sma_period).mean().ewm(com=int(1 / ema_factor) - 1).mean().dropna()
    l1_triples = [w.to_list() for w in l1.rolling(window=3) if len(w.to_list()) == 3]
    # utilize the fact that the original indexes are preserved in pd.Series;
    # each index corresponds to the first element of its triple
    l1_indexes = l1.index[:-2]
    l2 = [index for (index, (fst, snd, trd)) in zip(l1_indexes, l1_triples) if fst < snd > trd]
    l3 = [index for (index, (fst, snd, trd)) in zip(l1_indexes, l1_triples) if fst > snd < trd]
    # for each detected turn, rank the sma_period raw prices ending at that
    # index in descending order (argsort of the negated values)
    max_ranges = [(-np.array([l[i] for i in range(index - sma_period + 1, index + 1)])).argsort()
                  + (index - sma_period + 1) for index in l2]
    min_ranges = [(-np.array([l[i] for i in range(index - sma_period + 1, index + 1)])).argsort()
                  + (index - sma_period + 1) for index in l3]
    return (
        [r[0] for r in max_ranges if len(r) > 0],    # highest raw price near each max
        [r[-1] for r in min_ranges if len(r) > 0],   # lowest raw price near each min
    )
You can visualize the results with something like the following (using mplfinance):
import pandas as pd
import mplfinance as mpf

# read OHLC csv
msft = pd.read_csv("MSFT1440.csv", parse_dates=True)
df = msft.loc[msft['date'] >= '2021-01-01']
max_indexes, _ = emasma_maxmin(df.high)
_, min_indexes = emasma_maxmin(df.low)
maxs, mins = set(max_indexes), set(min_indexes)
# color the min/max candlesticks
mco = ['green' if index in maxs else 'red' if index in mins else None
       for index in df.index.values]
# generate vertical lines at the mins/maxes
vlines = [pd.to_datetime(str(df['date'][index]))
          for index in df.index.values if index in max_indexes or index in min_indexes]
vline_colors = ['green' if index in maxs else 'red'
                for index in df.index.values if index in max_indexes or index in min_indexes]
df.set_index(pd.DatetimeIndex(df['date']), inplace=True)
mpf.plot(df,
         type='candle',
         marketcolor_overrides=mco,
         vlines=dict(vlines=vlines, linewidths=0.25, colors=vline_colors))
which produces a candlestick chart with the detected maxima marked in green and the minima in red.
As mentioned by belisarius, the best methods seem to involve filtering the data smooth. With enough smoothing, looking for changes in the slope pinpoints the local mins and maxes (the derivative helps here). I would use a centered sliding window for a running median/average, or an ongoing EMA (or similar IIR filter); a minimal example of the former follows.
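A sketch of the centered-running-median variant in pandas, assuming a pd.Series input; the window length is my choice:
import numpy as np
import pandas as pd

def median_extrema(prices: pd.Series, window=11):
    """Centered running median, then flag the indices where the sign of
    the smoothed slope flips (the change-in-slope test above)."""
    smooth = prices.rolling(window, center=True).median()
    slope_sign = np.sign(smooth.diff())
    turns = slope_sign.diff()
    maxima = prices.index[turns == -2]   # slope went from +1 to -1
    minima = prices.index[turns == 2]    # slope went from -1 to +1
    return maxima, minima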
This Python code detects local extrema within a range of 5 bars; df should contain OHLC columns named O, H, L, C:
df['H_5'], df['L_5'] = df['H'].shift(-5), df['L'].shift(-5)
df['MAXH5'] = df['H'].rolling(window=5).max()     # highest high of the current and previous 4 bars
df['MINL5'] = df['L'].rolling(window=5).min()     # lowest low of the current and previous 4 bars
df['MAXH_5'] = df['H_5'].rolling(window=5).max()  # highest high of the next 5 bars
df['MINL_5'] = df['L_5'].rolling(window=5).min()  # lowest low of the next 5 bars
df.eval("maximum5 = (MAXH5 == H) & (MAXH_5 == H)", inplace=True)
df.eval("minimum5 = (MINL5 == L) & (MINL_5 == L)", inplace=True)
df.eval("is_extremum_range5 = maximum5 | minimum5", inplace=True)
The result is in the column is_extremum_range5 = {True | False}.
Fermat's theorem on stationary points will help you find the local minima and maxima: wherever a differentiable function has a local extremum, its derivative is zero.