What values should I calculate from the FFT of two audio files to show that they are equal?
I want to compare two audio files (voice recordings) and find out whether they are identical, at least to some extent. I came across FFT (OouraFFT), integrated the code, and passed my audio file as input; "calculateWelchPeriodogramWithNewSignalSegment" is called. That method uses a term called spectrum data. What should I now use to compare the two audio files? Could anyone please explain the concept of using FFT to compare two audio signals (speech signals)? What should I do next? Any information would be helpful. Thanks in advance.
EDIT:
MyAudioFile *audioFile = [[MyAudioFile alloc] init];
OSStatus result = [audioFile open:var ofType:@"wav"];
int numFrequencies = 16384;
int kNumFFTWindows = 10;
OouraFFT *myFFT = [[OouraFFT alloc] initForSignalsOfLength:numFrequencies*2 andNumWindows:kNumFFTWindows];
// copy the 16-bit samples into the FFT input buffer
for (long i = 0; i < myFFT.dataLength; i++)
{
    myFFT.inputData[i] = (double)audioFile.audioData[i];
}
[myFFT calculateWelchPeriodogramWithNewSignalSegment];
NSLog(@"the spectrum data 1 is %f", myFFT.spectrumData[1]);
NSLog(@"the spectrum data 2 is %f", myFFT.spectrumData[2]);
NSLog(@"the spectrum data 8192 is %f", myFFT.spectrumData[8192]);
I have created a MyAudioFile class, which contains:
-(OSStatus)open:(NSString *)fileName ofType:(NSString *)fileType {
    OSStatus result = -1;
    CFStringRef filePath = (CFStringRef)fileName;
    CFURLRef audioFileURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, filePath, kCFURLPOSIXPathStyle, false);
    //open audio file
    result = AudioFileOpenURL(audioFileURL, kAudioFileReadPermission, 0, &mAudioFile);
    if (result == noErr) {
        //get format info
        UInt32 size = sizeof(mASBD);
        result = AudioFileGetProperty(mAudioFile, kAudioFilePropertyDataFormat, &size, &mASBD);
        UInt32 dataSize = sizeof packetCount;
        result = AudioFileGetProperty(mAudioFile, kAudioFilePropertyAudioDataPacketCount, &dataSize, &packetCount);
        NSLog(@"File Opened, packet Count: %llu", (unsigned long long)packetCount);
        UInt32 packetsRead = packetCount;
        UInt32 numBytesRead = -1;
        if (packetCount > 0) {
            //allocate buffer (2 bytes per packet, i.e. this assumes 16-bit mono PCM)
            audioData = (SInt16*)malloc(2 * packetCount);
            //read the packets
            result = AudioFileReadPackets(mAudioFile, false, &numBytesRead, NULL, 0, &packetsRead, audioData);
            NSLog(@"Read %u bytes, %u packets", (unsigned)numBytesRead, (unsigned)packetsRead);
        }
    }
    else
        NSLog(@"Could not open file: %@", fileName);
    CFRelease(audioFileURL);
    return result;
}
I think I am now done with the FFT: myFFT.spectrumData[i] holds the output values for different values of i.
Should I stop here and integrate the Accelerate framework to do the FFT instead? I am confused. Please tell me which one to use.
This is actually a pretty tough problem, but I would say that working in frequency space is useful. Also, as the author of the OouraFFT library (the ObjC wrapper around Prof. Ooura's pretty old FFT implementation), I would recommend NOT using it if you can adopt Apple's Accelerate library instead. It's much faster, more accurate, and will be actively maintained. My library will not be; I've switched entirely to Accelerate for my own work.
Anyhoo, it's useful to work in frequency space, because any small offset in the time domain will cause you a lot of headaches when working with cross-correlations. If you instead do a short-time Fourier transform, you can apply the methods published by the engineers of the Shazam iPhone app, which, at first glance, seem to be robust to this problem. Best of luck, you've got a lot of work ahead of you.
I am not sure that the FFT is what you want in this scenario. The FFT (through its squared magnitude) gives you an estimate of the power spectral density (PSD) of the signal, that is, a plot of signal power versus frequency. Notice that there is no time in there. In other words, you would only be able to tell whether two signals have the same frequency distribution, not whether their time-domain signals are identical. For that, I think you would want something more along the lines of a cross-correlation, which measures the similarity of two waveforms as a function of the time shift between them and gives you a value for how similar they are. There may be more sophisticated ways of doing this, but this is off the top of my head.
-Eric
You're going to run into problems doing any sort of direct comparison between those two wave files: noise, different voices, etc. will all make that difficult. I'd probably try to run a cross-correlation in the frequency spectrum (i.e. after running the FFT), looking for patterns of frequency peaks (since they won't be identical; different people have different pitches of voice and speak at different rates).
So, to elaborate: get the magnitude of your FFT (I'm afraid I'm not familiar with OouraFFT, so I'm not sure how the complex values are stored). Run a cross-correlation between the two magnitude spectra. If the maximum correlation is greater than some threshold parameter, it's a match.