Detecting periodic repetitions in the data stream
Let's say I have an array of zeros:
a = numpy.zeros(1000)
I then introduce some repetitive 'events':
a[range(0, 1000, 30)] = 1
Question is, how do I detect the 'signal' there? Because it's far from the ideal signal if I do the 'regular' FFT I don't get a clear indication of where my 'true' signal is:
f = abs(numpy.fft.rfft(a))
Is there a method to detect these repetiti开发者_运维技巧ons with some degree of certainty? Especially if I have few of those mixed in, for example here:
a[range(0, 1000, 30)] = 1
a[range(0, 1000, 110)] = 1
a[range(0, 1000, 48)] = 1
I'd like to get three 'spikes' on the resulting data...
Have you considered using autocorrelation?
As an analytical technique acf/pacf/ccf are used to identify periodicity in time-dependent signals, hence the correlogram, the graphical display of acf or pacf, displays self-similarity in a signal as a function of different lags. (So for instance, if you see values on the y axis peak at a lag of 12, and if your date is in months, that's evidence of annual periodicity.)
To calculate and plot 'similarity' versus lag, if you don't want to roll your own, i am not aware of a native Numpy/Scipy option; I also couldn't find one in the 'time series' scikit (one of the libraries in the Scipy 'Scikits', domain-specific modules not included in the standard Scipy distribution) but it's worth checking again. The other option is to install Python bindings to R (RPy2, available on SourceForge) which will allow you to access the relevant R functions, including 'acf' which will calculate and plot the correlogram just by passing in your time series and calling the function.
On the other hand, if you want to identify continuous (unbroken) streams of a given type in your signal, then "run-length encoding" is probably what you want:
import numpy as NP
signal = NP.array([3,3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0,0,0,7,7,7,7,7,4,4,1,1,1,1,1,1,1])
px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) ]
# returns [(0, 9, 3), (9, 19, 0), (19, 24, 7), (24, 26, 4), (26, 33, 1)]
# so w/r/t first 3-tuple: '3' occurs continuously in the half-open interval 0 and 9, and so forth
精彩评论