Objective C - Cross-correlation for audio delay estimation
I would like to know if anyone knows how to perform a cross-correlation between two audio signals on iOS.
I would like to align the FFT windows that I get at the receiver (I am receiving the signal from the mic) with the ones at the transmitter (which is playing the audio track), i.e. make sure that the first sample of each window at the transmitter (apart from a "sync" period) is also the first sample of the corresponding window at the receiver.
I injected a known waveform (in the frequency domain) into every chunk of the transmitted audio. I want to estimate the delay through cross-correlation between the known waveform and the received signal (over several consecutive chunks), but I don't know how to do it.
It looks like the function vDSP_convD can do it, but I have no idea how to use it, or whether I first have to perform a real FFT of the samples (probably yes, because I have to pass double[]).
void vDSP_convD (
    const double __vDSP_signal[],
    vDSP_Stride __vDSP_signalStride,
    const double __vDSP_filter[],
    vDSP_Stride __vDSP_strideFilter,
    double __vDSP_result[],
    vDSP_Stride __vDSP_strideResult,
    vDSP_Length __vDSP_lenResult,
    vDSP_Length __vDSP_lenFilter
);
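The vDSP_convD() function computes either the correlation or the convolution of the two input vectors, depending on the sign of the filter stride: a positive stride gives correlation, which is what you want here, while a negative stride gives convolution (in which case the filter pointer must point at the last filter element). The signal vector must contain at least lenResult + lenFilter - 1 elements. It's unlikely that you want to work in the frequency domain, since you are looking for a time-domain result. If you already have FFTs for some other reason, you could multiply one spectrum by the complex conjugate of the other rather than correlating the time-domain sequences, but you would then need an inverse DFT to get back to the time domain.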
Assuming, of course, I understand you correctly.
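To make the operation concrete, here is a minimal plain-C sketch of the sum that vDSP_convD evaluates when all strides are 1 (that is, in correlation mode). The function name correlate_ref is made up for illustration and is not part of Accelerate; it only shows what the library call computes and how large the buffers need to be.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an Accelerate function).
 * Computes result[n] = sum over p of signal[n + p] * filter[p]
 * for n in [0, len_result), which is correlation with unit strides.
 * signal must hold at least len_result + len_filter - 1 samples. */
static void correlate_ref(const double *signal, const double *filter,
                          double *result,
                          size_t len_result, size_t len_filter)
{
    assert(signal != NULL && filter != NULL && result != NULL);
    for (size_t n = 0; n < len_result; ++n) {
        double acc = 0.0;
        for (size_t p = 0; p < len_filter; ++p) {
            acc += signal[n + p] * filter[p];
        }
        result[n] = acc;
    }
}
```

With Accelerate available, the corresponding call should be vDSP_convD(signal, 1, filter, 1, result, 1, len_result, len_filter), operating directly on time-domain samples converted to double. No FFT is needed beforehand.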
Then, once you have the result from vDSP_convD(), look for the highest value, which tells you where the signals are most strongly correlated; its index is the delay in samples. You might also need to cope with the case where the input signal does not contain enough of your reference signal; in that case you may wish to (for example) ignore values in the result vector below a certain level.
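The peak search with a rejection level could be sketched as follows. The helper name find_delay and the min_peak parameter are assumptions for illustration; a suitable threshold depends on your signal levels and has to be tuned experimentally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the largest value in
 * result[] (the estimated delay in samples), or -1 when no value
 * reaches min_peak, i.e. the reference was not found reliably.
 * On ties, the earliest index wins. */
static long find_delay(const double *result, size_t len_result,
                       double min_peak)
{
    assert(result != NULL || len_result == 0);
    long best = -1;
    double best_val = 0.0;
    for (size_t n = 0; n < len_result; ++n) {
        if (result[n] >= min_peak && (best < 0 || result[n] > best_val)) {
            best = (long)n;
            best_val = result[n];
        }
    }
    return best;
}
```

A return value of -1 would mean "skip this chunk and try the next one". If the recording chain might invert the signal polarity, comparing absolute values instead would be a reasonable variation.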
Cross-correlation is the solution, yes, but there are many obstacles you need to handle. If you read samples from audio files, they contain padding, which a cross-correlation function does not handle well. It is also very inefficient to correlate all of those samples; it takes a huge amount of time. I have written sample code that demonstrates the time shift between two audio files. If you are interested in the sample, look at my GitHub project.