How do I compare two voice samples on iOS?

Posted: 2023-02-22 11:55

First of all, I'd like to state that my question is not, per se, about the "classic" definition of voice recognition.

What we are trying to do is somewhat different:

The user records a command.
Later, when the user speaks the pre-recorded command, a certain action occurs.

For example, I record a voice command for calling my mom, so I click on her and say "Mom". Then when I use the program and say "Mom", it will automatically call her.

How would I perform the comparison of a spoken command to a saved voice sample?

EDIT: We have no need for any "text-to-speech" abilities, solely a comparison of sound signals. Obviously we're looking for some sort of off-the-shelf product or framework.

One way this is done for music recognition is to take a time sequence of frequency spectra (windowed short-time FFTs, i.e. an STFT) for the two sounds in question, map the locations of the frequency peaks over the time axis, and cross-correlate the two 2D time-frequency peak maps for a match. This is far more robust than just cross-correlating the two raw sound samples, because the peaks change far less than all the spectral "cruft" between them. This method works better if the speaking rate and pitch of the two utterances haven't changed too much.

In iOS 4.x, you can use the Accelerate framework for the FFTs and maybe the 2D cross correlations as well.
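To make the peak-map idea concrete, here is a rough sketch in plain Python rather than Accelerate, so it is only an illustration of the steps, not iOS code. The function names, the frame and hop sizes, the number of peaks kept per frame, and the `min_mag` silence threshold are all assumptions chosen for the example. It also only slides the two maps along the time axis, so it assumes the pitch is unchanged; a full 2D correlation would try frequency shifts as well. The naive DFT loop is where an FFT routine would go in practice.

```python
import cmath
import math

def stft_peak_map(samples, frame=64, hop=32, peaks_per_frame=2, min_mag=1.0):
    """Return one set per frame holding the strongest frequency bins in it."""
    # Hann window reduces leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame - 1))
              for n in range(frame)]
    peak_map = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame)]
        mags = []
        # Naive DFT over the positive-frequency bins (an FFT would replace this).
        for k in range(1, frame // 2):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame)
                      for n in range(frame))
            mags.append((abs(acc), k))
        strongest = sorted(mags, reverse=True)[:peaks_per_frame]
        # min_mag (an assumed threshold) keeps silent frames from yielding peaks.
        peak_map.append({k for mag, k in strongest if mag >= min_mag})
    return peak_map

def peak_map_similarity(a, b, max_lag=4):
    """Best fraction of a's peaks that also appear in b, over time shifts."""
    total = sum(len(peaks) for peaks in a)
    if total == 0:
        return 0.0
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        shared = 0
        for i, peaks in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                shared += len(peaks & b[j])
        best = max(best, shared / total)
    return best
```

A score near 1 means almost every peak of the stored command lines up with the new utterance at some time offset; where to put the accept/reject threshold would have to be tuned on real recordings.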

Try using a third-party library, like OpenEars for iOS applications. You could have users record a voice sample and save it as transcribed text, or just let them enter text for recognition.

I think you'd have to perform some sort of cross-correlation to determine how similar these two signals are (assuming it'll be the same user speaking, of course). I'm just typing this answer out to see if it helps, but I'd wait for a better answer from someone else. My signal processing skills are close to zero.
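For what a direct cross-correlation of two recordings might look like, here is a minimal sketch in plain Python. The function name and the `max_lag` parameter are made up for the example, and it works on raw sample lists, which is the least robust variant: it will be sensitive to changes in speed, pitch and background noise.

```python
import math

def normalized_cross_correlation(x, y, max_lag):
    """Peak normalized cross-correlation of x and y for lags in [-max_lag, max_lag]."""
    # Dividing by the product of the signal energies keeps the score in [-1, 1].
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 0.0
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, xv in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):
                acc += xv * y[j]
        best = max(best, acc / norm)
    return best
```

Searching over a range of lags lets the two recordings start at slightly different times; a score close to 1 means the waveforms nearly coincide at some offset.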

I'm not sure if your question is about the DSP or about how to do it on the iPhone. If it is the latter, I would start with the SpeakHere sample project that Apple provides. That way you already have the interface for recording the voice to a file. It will save you a lot of trouble.

I'm using ViSQOL for this purpose. The docs say it works best with short samples, ideally 5-10 seconds. You also need to prepare the files: they must be .wav files at the expected sample rate. You can easily convert your files to the desired format with ffmpeg. https://github.com/google/visqol
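The conversion step could be scripted roughly like this. The helper names and file names are hypothetical, and the 16 kHz mono default is an assumption for speech input; check the ViSQOL README for the sample rate the mode you use actually expects.

```python
import shutil
import subprocess

def ffmpeg_wav_command(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg command that resamples src and writes it to dst."""
    # -y overwrites dst, -ar sets the sample rate, -ac sets the channel count.
    # The .wav container is selected by ffmpeg from the dst file extension.
    return ["ffmpeg", "-y", "-i", src,
            "-ar", str(sample_rate), "-ac", str(channels), dst]

def convert_to_wav(src, dst, sample_rate=16000, channels=1):
    """Run the conversion; requires the ffmpeg executable on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_wav_command(src, dst, sample_rate, channels), check=True)
```

For example, `convert_to_wav("command.m4a", "command.wav")` would produce a file that can then be passed to ViSQOL alongside the reference recording.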
