Python Audio Frame Pitch Change
I'm attempting to use pyaudio to make a voice masker. With the way I have it set up right now, the only thing I have to do is input the sound, change the pitch on the fly, and chunk it right back out. The first and last part are working, and I think I'm getting close to changing pitch... emphasis on the "think".
Unfortunately, I'm not too familiar with the type of data I'm working with and how exactly to manipulate it the way I want. I've gone through the audioop documentation and haven't found what I needed (though there are some things I could definitely use in there). I guess what I'm asking is...
How is the data formatted in these audio frames?
How can I change the pitch of a frame (if I can), or is it even close to working like that?
import pyaudio
import sys
import numpy as np
import wave
import audioop
import struct
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 41000
RECORD_SECONDS = 5
p = pyaudio.PyAudio()
stream = p.open(format = FORMAT,
                channels = CHANNELS,
                rate = RATE,
                input = True,
                output = True,
                frames_per_buffer = chunk)
swidth = 2
print "* recording"
while(True):
    data = stream.read(chunk)
    data = np.array(wave.struct.unpack("%dh"%(len(data)/swidth), data))*2
    data = np.fft.rfft(data)
    #MANipulation
    data = np.fft.irfft(data)
    stream.write(data3, chunk)
print "* done"
stream.stop_stream()
stream.close()
p.terminate()
After the irfft line, and before the stream.write line, you need to convert the data back into 16-bit integers with a struct.pack call.
data = np.fft.irfft(data)
dataout = np.array(data*0.5, dtype='int16') #undo the *2 that was done at reading
chunkout = struct.pack("%dh"%(len(dataout)), *list(dataout)) #convert back to 16-bit data
stream.write(chunkout)
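On the question of how the data is formatted: with paInt16 and one channel, each frame is a single signed 16-bit sample, so a chunk of 1024 frames arrives as 2048 raw bytes. The sketch below shows the same round trip as the struct calls above using NumPy alone. The helper names bytes_to_samples and samples_to_bytes are made up for illustration, and the clipping step is an added assumption to avoid integer wrap-around when a processed value leaves the 16-bit range.

```python
import numpy as np

def bytes_to_samples(raw):
    """Interpret raw paInt16 bytes as a 1-D array of signed 16-bit samples."""
    return np.frombuffer(raw, dtype=np.int16)

def samples_to_bytes(samples):
    """Round, clip to the int16 range, and pack samples back into raw bytes."""
    clipped = np.clip(np.rint(samples), -32768, 32767)
    return clipped.astype(np.int16).tobytes()
```

In the loop above, bytes_to_samples(stream.read(chunk)) would stand in for the unpack call, and stream.write(samples_to_bytes(data)) for the pack call.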
To change the pitch, you'll have to perform an FFT on a number of frames and then shift the data in frequency (move the data to different frequency bins) and perform an inverse FFT.
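As a rough sketch of the bin shifting described above, the function below moves every rfft bin up or down by a whole number of bins and transforms back. The name shift_bins and its shift parameter are hypothetical. Note that this moves all frequencies by the same number of hertz (each bin spans RATE / chunk Hz), which is not the same as a musically correct pitch change that multiplies every frequency, and processing each chunk independently will likely leave audible discontinuities at chunk boundaries.

```python
import numpy as np

def shift_bins(samples, shift):
    """Shift the spectrum of one chunk by `shift` bins (positive = higher)."""
    spectrum = np.fft.rfft(samples)
    shifted = np.zeros_like(spectrum)
    if abs(shift) >= len(spectrum):
        # Everything is shifted out of range; the result is silence.
        return np.zeros(len(samples))
    if shift >= 0:
        # Move bins upward; the lowest `shift` bins are left empty.
        shifted[shift:] = spectrum[:len(spectrum) - shift]
    else:
        # Move bins downward; the highest bins are left empty.
        shifted[:shift] = spectrum[-shift:]
    # Pass n explicitly so the output has the same length as the input.
    return np.fft.irfft(shifted, n=len(samples))
```

In the question's loop this would replace the rfft, manipulation, and irfft lines with a single call such as data = shift_bins(data, 5).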
If you don't mind the sound fragment getting longer when you lower the pitch (or shorter when you raise it), you could resample the frames instead. For instance, you could double each sample (insert a copy of each sample into the stream), which halves the playback speed and lowers the pitch by an octave. You can then improve the audio quality by using a resampling algorithm with some form of interpolation and/or filtering.
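A minimal sketch of the resampling approach with linear interpolation, assuming a non-empty chunk of samples. The name resample_linear and its factor parameter are made up for illustration: a factor of 0.5 doubles the length and lowers the pitch by an octave, while a factor of 2.0 halves the length and raises it by an octave. No anti-aliasing filter is applied here, so raising the pitch this way can introduce aliasing.

```python
import numpy as np

def resample_linear(samples, factor):
    """Resample by `factor`; > 1 shortens (higher pitch), < 1 lengthens (lower pitch)."""
    samples = np.asarray(samples, dtype=np.float64)
    n_out = max(1, int(round(len(samples) / factor)))
    # Fractional read positions into the original chunk.
    positions = np.arange(n_out) * factor
    # np.interp interpolates linearly and clamps positions past the last sample.
    return np.interp(positions, np.arange(len(samples)), samples)
```

Because the output length differs from the input length, a live stream that reads and writes at the same rate will either fall behind or run out of data, so this suits recorded fragments better than real-time masking.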