Python find audio frequency and amplitude over time
Here is what I would like to do: find the audio frequency and amplitude of a .wav file at, say, every 1 ms of that file, and save the results to a file. I have graphed frequency vs. amplitude and amplitude over time, but I cannot figure out frequency over time. My end goal is to read the file back and use the amplitude to adjust variables and the frequency to select which variables are being used; that seems to be the easy part. I have been using numpy, audiolab, matplotlib, etc. with FFTs, but I just cannot figure this one out. Any help is appreciated! Thank you!
Use an STFT with overlapping windows to estimate the spectrogram. To save yourself the trouble of rolling your own, you can use the specgram function in Matplotlib's mlab module. It's important to use a window short enough that the audio is approximately stationary within it, and the buffer size should be a power of 2 so that a common radix-2 FFT can be used efficiently. 512 samples (about 10.67 ms at 48 ksps, or 93.75 Hz per bin) should suffice. For a sampling rate of 48 ksps, overlap by 464 samples to evaluate a sliding window every 1 ms (i.e. shift by 48 samples).
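The arithmetic behind those numbers can be sketched in a few lines. This is only an illustration, assuming Python 3; the helper name stft_params and its hop_ms argument are made up for this example and are not part of any library.

```python
def stft_params(fs, nfft, hop_ms=1.0):
    """Derive STFT settings from the sample rate and window length.

    Hypothetical helper: returns (hop, noverlap, window_ms, bin_hz).
    """
    hop = int(round(fs * hop_ms / 1000.0))  # samples to shift per frame
    noverlap = nfft - hop                   # samples shared by adjacent frames
    window_ms = 1000.0 * nfft / fs          # duration of one window
    bin_hz = fs / nfft                      # spacing between FFT bins
    return hop, noverlap, window_ms, bin_hz


hop, noverlap, window_ms, bin_hz = stft_params(48000, 512)
# For 48 ksps and 512 samples: hop 48, overlap 464,
# window about 10.67 ms, bin spacing 93.75 Hz.
print(hop, noverlap, round(window_ms, 2), bin_hz)
```

For a different sampling rate (44.1 ksps, for instance) the 1 ms hop is not a whole number of samples, so the rounded hop only approximates 1 ms.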
Edit:
Here's an example that uses mlab.specgram on an 8-second signal that has one tone per second, from 2 kHz up to 16 kHz. Note the response at the transients. I've zoomed in at 4 seconds to show the response in more detail. The frequency shifts at precisely 4 seconds, but it takes a buffer length (512 samples; approx. +/- 5 ms) for the transient to pass. This illustrates the kind of spectral/temporal smearing caused by non-stationary transitions as they pass through the buffer. Additionally, you can see that even when the signal is stationary there's the problem of spectral leakage caused by windowing the data. A Hamming window function was used to minimize the side lobes of the leakage, but this also widens the main lobe.
import numpy as np
from matplotlib import mlab, pyplot
#Python 2.x:
#from __future__ import division
Fs = 48000
N = 512
f = np.arange(1, 9) * 2000
t = np.arange(8 * Fs) / Fs
x = np.empty(t.shape)
for i in range(8):
    x[i*Fs:(i+1)*Fs] = np.cos(2*np.pi * f[i] * t[i*Fs:(i+1)*Fs])
w = np.hamming(N)
ov = N - Fs // 1000 # e.g. 512 - 48000 // 1000 == 464
Pxx, freqs, bins = mlab.specgram(x, NFFT=N, Fs=Fs, window=w,
                                 noverlap=ov)
#plot the spectrogram in dB (Pxx is a power quantity, so dB = 10*log10)
Pxx_dB = 10 * np.log10(Pxx)
pyplot.subplots_adjust(hspace=0.4)
pyplot.subplot(211)
ex1 = bins[0], bins[-1], freqs[0], freqs[-1]
pyplot.imshow(np.flipud(Pxx_dB), extent=ex1)
pyplot.axis('auto')
pyplot.axis(ex1)
pyplot.xlabel('time (s)')
pyplot.ylabel('freq (Hz)')
#zoom in at t=4s to show transient
pyplot.subplot(212)
n1, n2 = int(3.991/8*len(bins)), int(4.009/8*len(bins))
ex2 = bins[n1], bins[n2], freqs[0], freqs[-1]
pyplot.imshow(np.flipud(Pxx_dB[:,n1:n2]), extent=ex2)
pyplot.axis('auto')
pyplot.axis(ex2)
pyplot.xlabel('time (s)')
pyplot.ylabel('freq (Hz)')
pyplot.show()
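To get from a spectrogram to the "frequency and amplitude every 1 ms, saved to a file" that the question asks for, one option is to take the strongest bin of each frame. The sketch below is a plain-numpy STFT rather than mlab.specgram, so it is a stand-in for the approach above, not the same code path. The function name dominant_track, the two-tone test signal, and the output file name freq_amp.csv are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def dominant_track(x, fs, nfft=512, hop=48):
    """Return (times, peak_freqs, peak_amps), one entry per STFT frame."""
    w = np.hamming(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    # Row i holds samples [i*hop, i*hop + nfft) of the signal.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(nfft)[None, :]
    spec = np.abs(np.fft.rfft(x[idx] * w, axis=1))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    k = spec.argmax(axis=1)  # strongest bin in each frame
    # Scale so a steady sinusoid of amplitude A reads roughly A.
    amps = 2.0 * spec[np.arange(n_frames), k] / w.sum()
    # Time-stamp each frame at its center.
    times = (hop * np.arange(n_frames) + nfft / 2.0) / fs
    return times, freqs[k], amps


# Hypothetical test signal: 0.5 s at 2 kHz (amplitude 1.0),
# then 0.5 s at 6 kHz (amplitude 0.5), sampled at 48 ksps.
fs = 48000
t = np.arange(fs) / fs
x = np.where(t < 0.5,
             1.0 * np.cos(2 * np.pi * 2000 * t),
             0.5 * np.cos(2 * np.pi * 6000 * t))

times, peak_freqs, peak_amps = dominant_track(x, fs)

# Save one row per frame: time (s), frequency (Hz), amplitude.
out_path = os.path.join(tempfile.gettempdir(), "freq_amp.csv")
np.savetxt(out_path, np.column_stack([times, peak_freqs, peak_amps]),
           delimiter=",", header="time_s,freq_hz,amplitude", comments="")
```

The reported frequency is quantized to the bin spacing (93.75 Hz here), and the amplitude reads somewhat low when a tone falls between two bins, so treat both as estimates. Frames that straddle a transition mix the two tones, as the zoomed plot shows. For a real recording you would load x and fs from the .wav file instead of synthesizing them.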
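The windowing trade-off mentioned in the answer (lower side lobes at the cost of a wider main lobe) can be checked numerically. The following is a rough sketch that compares a rectangular window with np.hamming by zero-padding the window's FFT; the helper name window_lobes and the padding factor are assumptions chosen for illustration.

```python
import numpy as np


def window_lobes(w, pad=16):
    """Return (main-lobe half-width in bins, peak side-lobe level in dB)."""
    n = len(w)
    # Zero-pad so the response is sampled `pad` times per FFT bin.
    mag = np.abs(np.fft.rfft(w, n * pad))
    mag_db = 20 * np.log10(mag / mag.max() + 1e-12)
    # The main lobe ends at the first local minimum of the response.
    first_min = int(np.argmax(np.diff(mag_db) > 0))
    return first_min / pad, mag_db[first_min:].max()


N = 512
rect_width, rect_side = window_lobes(np.ones(N))
hamm_width, hamm_side = window_lobes(np.hamming(N))
print("rectangular: half-width %.2f bins, side lobes %.1f dB"
      % (rect_width, rect_side))
print("Hamming:     half-width %.2f bins, side lobes %.1f dB"
      % (hamm_width, hamm_side))
```

With these settings the rectangular window should show side lobes around -13 dB and a main lobe about one bin wide on each side, while the Hamming window should show side lobes below -40 dB with a main lobe about twice as wide.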