
Rolling median in python

I have some stock data based on daily close values. I need to be able to insert these values into a Python list and get a median for the last 30 closes. Is there a Python library that does this?


In pure Python, having your data in a Python list a, you could do

median = sum(sorted(a[-30:])[14:16]) / 2.0

(This assumes a has at least 30 items.)
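For windows that are not exactly 30 values long, a slightly more general sketch uses the standard-library statistics module instead of hard-coded indices. statistics.median averages the two middle values when the count is even, and it also works when fewer than 30 closes exist (it raises StatisticsError on an empty list). The function name median_of_last and the sample data are hypothetical.

```python
import statistics

def median_of_last(closes, n=30):
    # Hypothetical helper: median of the most recent n closes.
    # statistics.median averages the two middle values for an even count
    # and simply uses whatever is available if there are fewer than n values.
    return statistics.median(closes[-n:])

closes = list(range(1, 41))  # hypothetical data: 40 daily closes 1..40
print(median_of_last(closes))  # last 30 are 11..40, so (25 + 26) / 2 = 25.5
```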

Using the NumPy package, you could use

import numpy
median = numpy.median(a[-30:])
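If you want the rolling median at every position rather than only for the latest 30 closes, here is a sketch assuming NumPy 1.20 or newer, which provides numpy.lib.stride_tricks.sliding_window_view. The helper name rolling_median and the sample closes are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_median(a, window=30):
    # Hypothetical helper. Each row of `windows` is one complete window
    # (a read-only view, no copying); np.median then reduces every row.
    windows = sliding_window_view(np.asarray(a, dtype=float), window)
    return np.median(windows, axis=1)

closes = np.arange(1, 41)  # hypothetical data: 40 closes
medians = rolling_median(closes, 30)
print(len(medians))  # 11 complete windows
print(medians[-1])   # 25.5, same as numpy.median(closes[-30:])
```

Only complete windows are produced, so the result has len(a) - window + 1 entries, and sliding_window_view raises ValueError if the window is longer than the data.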


Have you considered pandas? It is built on NumPy, can automatically associate timestamps with your data, and skips any missing dates as long as you fill them with numpy.nan. It also offers some rather powerful plotting via matplotlib.

Basically, it was designed for financial analysis in Python.
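As a sketch of what that looks like, assuming a pandas Series of closes indexed by hypothetical business-day dates, Series.rolling(...).median() gives the trailing 30-value median at every date.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 40 business days of closes indexed by date
dates = pd.bdate_range("2024-01-01", periods=40)
closes = pd.Series(np.arange(1.0, 41.0), index=dates)

# Median of each trailing 30-value window. With an integer window,
# min_periods defaults to the window size, so the first 29 results are NaN.
rolling = closes.rolling(window=30).median()
print(rolling.iloc[-1])  # 25.5

# Allow partial windows at the start instead of NaN
partial = closes.rolling(window=30, min_periods=1).median()
```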


Isn't the median just the middle value in a sorted range?

So, assuming your list is stock_data:

last_thirty = stock_data[-30:]
median = sorted(last_thirty)[15]

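Now you just need to find and fix the off-by-one errors (with 30 values there are two middle elements, sorted(last_thirty)[14] and sorted(last_thirty)[15], and the median is their average) and also handle the case where stock_data has fewer than 30 elements...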

Let's try that here:

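def rolling_median(data, window):
    # Take at most the last `window` values (all of them if fewer exist)
    if len(data) < window:
        subject = data[:]
    else:
        subject = data[-window:]
    ordered = sorted(subject)
    mid = len(ordered) // 2
    # Odd count: the middle value; even count: the mean of the two middle values
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2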
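Since the question is about inserting closes into a list one at a time, here is a sketch of a streaming variant (a different technique from the function above, not the answerer's code): it keeps the last `window` values both in arrival order and in sorted order using the standard bisect module, so each new close costs one sorted insert and one removal instead of a full re-sort. The class name RollingMedian is hypothetical; calling median() before any value has been added raises IndexError.

```python
import bisect
from collections import deque

class RollingMedian:
    """Hypothetical helper: median of the last `window` values added."""

    def __init__(self, window=30):
        self.window = window
        self.arrivals = deque()  # values in the order they were added
        self.sorted_vals = []    # the same values, kept sorted

    def add(self, value):
        self.arrivals.append(value)
        bisect.insort(self.sorted_vals, value)
        if len(self.arrivals) > self.window:
            # Drop the oldest value from both containers
            oldest = self.arrivals.popleft()
            del self.sorted_vals[bisect.bisect_left(self.sorted_vals, oldest)]

    def median(self):
        n = len(self.sorted_vals)
        mid = n // 2
        if n % 2:
            return self.sorted_vals[mid]
        return (self.sorted_vals[mid - 1] + self.sorted_vals[mid]) / 2

rm = RollingMedian(window=3)
for close in [5, 1, 4, 2, 8]:
    rm.add(close)
print(rm.median())  # window is now [4, 2, 8], so the median is 4
```

The list insert and delete are still O(window) element moves, but for a 30-value window that is cheap compared with sorting a fresh copy each day.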


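Found this helpful (note that it computes an expanding median, i.e. the median of all values up to each point, rather than a fixed 30-value window):

import numpy as np

values = [10, 20, 30, 40, 50]

# Median of values[0..j] for every j
med = []
for j in range(len(values)):
    sub_set = values[:j + 1]
    med.append(np.median(sub_set))
print(med)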
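The loop above re-sorts a growing prefix on every step. As an alternative sketch (a swapped-in technique, not part of the original answer), the classic two-heap method computes the same expanding median incrementally with the standard heapq module: a max-heap holds the lower half of the values and a min-heap the upper half. The function name running_median is hypothetical.

```python
import heapq

def running_median(values):
    """Hypothetical helper: median of values[:j + 1] for every j."""
    low = []   # max-heap of the lower half (stored as negatives)
    high = []  # min-heap of the upper half
    result = []
    for v in values:
        if low and v > -low[0]:
            heapq.heappush(high, v)
        else:
            heapq.heappush(low, -v)
        # Rebalance so that len(low) == len(high) or len(low) == len(high) + 1
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))
        if len(low) > len(high):
            result.append(float(-low[0]))
        else:
            result.append((-low[0] + high[0]) / 2)
    return result

print(running_median([10, 20, 30, 40, 50]))  # [10.0, 15.0, 20.0, 25.0, 30.0]
```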


Here is a much faster method with O(w·|x|) space complexity, where w is the window width and |x| is the length of the input:

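import numpy as np

def moving_median(x, w):
    # Requires w >= 2. Row r of `shifted` holds x[r], x[r-1], ..., x[r-w+1],
    # with NaN wherever the index falls outside x.
    shifted = np.full((len(x) + w - 1, w), np.nan)
    for idx in range(w - 1):
        shifted[idx:-w + idx + 1, idx] = x
    shifted[idx + 1:, idx + 1] = x
    # Full rows: one vectorized median call (rows containing NaN give NaN here)
    medians = np.median(shifted, axis=1)
    # Partial rows at both ends: median of the non-NaN entries only
    for idx in range(w - 1):
        medians[idx] = np.median(shifted[idx, :idx + 1])
        medians[-idx - 1] = np.median(shifted[-idx - 1, -idx - 1:])
    # Trim the ends so the output has the same length as x
    return medians[(w - 1) // 2:-(w - 1) // 2]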

moving_median(np.arange(10), 4)
# Output
array([0.5, 1. , 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8. ])

The output has the same length as the input vector. Rows in which fewer than half of the entries are valid are dropped. For an even window width, there are two rows (one at each end) in which exactly half of the entries are NaN; only the first of these is returned. Here is the shifted matrix from above with the respective median values:

[[ 0. nan nan nan] -> -
 [ 1.  0. nan nan] -> 0.5
 [ 2.  1.  0. nan] -> 1.0
 [ 3.  2.  1.  0.] -> 1.5
 [ 4.  3.  2.  1.] -> 2.5
 [ 5.  4.  3.  2.] -> 3.5
 [ 6.  5.  4.  3.] -> 4.5
 [ 7.  6.  5.  4.] -> 5.5
 [ 8.  7.  6.  5.] -> 6.5
 [ 9.  8.  7.  6.] -> 7.5
 [nan  9.  8.  7.] -> 8.0
 [nan nan  9.  8.] -> -
 [nan nan nan  9.]]-> -

The behaviour can be changed by adapting the final slice medians[(w-1)//2:-(w-1)//2].
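A more compact sketch of the same centred moving median, assuming NumPy 1.20+ for sliding_window_view: instead of fixing up the edge rows by hand, pad x with NaN and let np.nanmedian skip the padding. The padding split below is chosen to reproduce the alignment of the final slice medians[(w-1)//2:-(w-1)//2]; the name moving_median_nan is hypothetical, and this is a swapped-in formulation rather than the answer's own code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_median_nan(x, w):
    # Hypothetical helper. Output i uses x[i - left : i + right + 1];
    # positions outside x are NaN padding that np.nanmedian ignores.
    right = (w - 1) // 2
    left = w - 1 - right
    padded = np.pad(np.asarray(x, dtype=float), (left, right),
                    constant_values=np.nan)
    return np.nanmedian(sliding_window_view(padded, w), axis=1)

print(moving_median_nan(np.arange(10), 4))  # should match moving_median(np.arange(10), 4)
```

Every window contains x[i] itself, so no window is all-NaN, and this version also handles w = 1.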

Benchmark:

%%timeit
moving_median(np.arange(1000), 4)
# 267 µs ± 759 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Alternative approach (a simple loop; the results are shifted because each window starts at position j instead of being centred on it):

def moving_median_list(x, w):
    medians = np.zeros(len(x))
    for j in range(len(x)):
        medians[j] = np.median(x[j:j+w])
    return medians

%%timeit
moving_median_list(np.arange(1000), 4)
# 15.7 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

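Both algorithms have linear time complexity in the length of the input for a fixed window width, so the difference is a constant factor: moving_median computes most medians in a single vectorized np.median call, while moving_median_list makes one Python-level np.median call per element. That is why moving_median is the faster option.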

