C/C++/Obj-C Real-time algorithm to ascertain Note (not Pitch) from Vocal Input

2023-01-22 13:18 问答作者：

I want to detect not the pitch, but the pitch class of a sung note.

So, whether it is C4 or C5 is not important: they must both be detected as C.

Imagine the 12 semitones arranged on a clock face, with the needle pointin开发者_运维技巧g to the pitch class. That's what I'm after! ideally I would like to be able to tell whether the sung note is spot-on or slightly off.

This is not a duplicate of previously asked questions, as it introduces the constraints that:

the sound source is a single human voice, hopefully with negligible background interference (although I may need to deal with this)
the octave is not important, only the pitch class

EDIT -- Links:

Real time pitch detection

Using the Apple FFT and Accelerate Framework

See my answer here for getting smooth FREQUENCY detection: https://stackoverflow.com/a/11042551/1457445

As far as snapping this frequency to the nearest note -- here is a method I created for my tuner app:

- (int) snapFreqToMIDI: (float) frequencyy {

    int midiNote = (12*(log10(frequencyy/referenceA)/log10(2)) + 57) + 0.5;
    return midiNote;
}

This will return the MIDI note value (http://www.phys.unsw.edu.au/jw/notes.html)

In order to get a string from this MIDI note value:

- (NSString*) midiToString: (int) midiNote {
    NSArray *noteStrings = [[NSArray alloc] initWithObjects:@"C", @"C#", @"D", @"D#", @"E", @"F", @"F#", @"G", @"G#", @"A", @"A#", @"B", nil];
    return [noteStrings objectAtIndex:midiNote%12];
}

For an example implementation of the pitch detection with output smoothing, look at musicianskit.com/developer.php

Pitch is a human psycho-perceptual phenomena. Peak frequency content is not the same as either pitch or pitch class. FFT and DFT methods will not directly provide pitch, only frequency. Neither will zero crossing measurements work well for human voice sources. Try AMDF, ASDF, autocorrelation or cepstral methods. There are also plenty of academic papers on the subject of pitch estimation.

There is another long list of pitch estimation algorithms here.

Edited addition: Apple's SpeakHere and aurioTouch sample apps (available from their iOS dev center) contain example source code for getting PCM sample blocks from the iPhone's mic.

Most of the frequency detection algorithms cited in other answers don't work well for voice. To see why this is so intuitively, consider that all the vowels in a language can be sung at one particular note. Even though all those vowels have very different frequency content, they would all have to be detected as the same note. Any note detection algorithm for voices must take this into account somehow. Furthermore, human speech and song contains many fricatives, many of which have no implicit pitch in them.

In the generic (non voice case) the feature you are looking for is called the chroma feature and there is a fairly large body of work on the subject. It is equivalently known as the harmonic pitch class profile. The original reference paper on the concept is Tayuka Fujishima's "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music". The Wikipedia entry has an overview of a more modern variant of the algorithm. There are a bunch of free papers and MATLAB implementations of chroma feature detection.

However, since you are focusing on the human voice only, and since the human voice naturally contains tons of overtones, what you are practically looking for in this specific scenario is a fundamental frequency detection algorithm, or f0 detection algorithm. There are several such algorithms explicitly tuned for voice. Also, here is a widely cited algorithm that works on multiple voices at once. You'd then check the detected frequency against the equal-tempered scale and then find the closest match.

Since I suspect that you're trying to build a pitch detector and/or corrector a la Autotune, you may want to use M. Morise's excellent WORLD implementation, which permits fast and good quality detection and modification of f0 on voice streams.

Lastly, be aware that there are only a few vocal pitch detectors that work well within the vocal fry register. Almost all of them, including WORLD, fail on vocal fry as well as very low voices. A number of papers refer to vocal fry as "creaky voice" and have developed specific algorithms to help with that type of voice input specifically.

If you are looking for the pitch class you should have a look at the chromagram (http://labrosa.ee.columbia.edu/matlab/chroma-ansyn/)

You can also simply dectect the f0 (using something like YIN algorithm) and return the appropriate semitone, most of fundamental frequency estimation algorithms suffer from octave error

Perform a Discrete Fourier Transform on samples from your input waveform, then sum values that correspond to equivalent notes in different octaves. Take the largest value as the dominant frequency.

You can likely find some existing DFT code in Objective C that suits your needs.

Putting up information as I find it...

Pitch detection algorithm on Wikipedia is a good place to start. It lists a few methods that fail for determining octave, which is okay for my purpose.

A good explanation of autocorrelation can be found here (why can't Wikipedia put things simply like that??).

Finally I have closure on this one, thanks to this article from DSP Dimension

The article contains source code.

Basically he performs an FFT. then he explains that frequencies that don't coincide spot on with the centre of the bin they fall in will smear over nearby bins in a sort of bell shaped curve. and he explains how to extract the exact frequency from this data in a second pass (FFT being the first pass).

the article then goes further to pitch shift; I can simply delete the code.

note that they supply a commercial library that does the same thing (and far more) only super optimised. there is a free version of the library that would probably do everything I need, although since I have worked through the iOS audio subsystem, I might as well just implement it myself.

for the record, I found an alternative way to extract the exact frequency by approximating a quadratic curve over the bin and its two neighbours here. I have no idea what is the relative accuracy between these two approaches.

As others have mentioned you should use a pitch detection algorithm. Since that ground is well-covered I will address a few particulars of your question. You said that you are looking for the pitch class of the note. However, the way to find this is to calculate the frequency of the note and then use a table to convert it to the pitch class, octave, and cents. I don't know of any way to obtain the pitch class without finding the fundamental frequency.

You will need a real-time pitch detection algorithm. In evaluating algorithms pay attention to the latency implied by each algorithm, compared with the accuracy you desire. Although some algorithms are better than others, fundamentally you must trade one for the other and cannot know both with certainty -- sort of like the Heisenberg uncertainty principle. (How can you know the note is C4 when only a fraction of a cycle has been heard?)

Your "smoothing" approach is equivalent to a digital filter, which will alter the frequency characteristics of the voice. In short, it may interfere with your attempts to estimate the pitch. If you have an interest in digital audio, digital filters are fundamental and useful tools in that field, and a fascinating subject besides. It helps to have a strong math background in understanding them, but you don't necessarily need that to get the basic idea.

Also, your zero crossing method is a basic technique to estimate the period of a waveform and thus the pitch. It can be done this way, but only with a lot of heuristics and fine-tuning. (Essentially, develop a number of "candidate" pitches and try to infer the dominant one. A lot of special cases will emerge that will confuse this. A quick one is the less 's'.) You'll find it much easier to begin with a frequency domain pitch detection algorithm.

if you re beginner this may be very helpful. It is available both on Java and IOS.

dywapitchtrack for ios

dywapitchtrack for java

继续阅读：algorithm detection pitch signal-processing

C/C++/Obj-C Real-time algorithm to ascertain Note (not Pitch) from Vocal Input

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？