
Given MP3, is it possible to break out different instruments using Fast Fourier transform (FFT)?

I am working on a music visualizer and I'd like to display a different visual element for each instrument. For example, a blue bar representing vocals, a red bar for guitar, a yellow bar for drums, etc.

Is there a way to analyze the results of FFT to get this information?

Thanks.


This is a challenge that's an active area of research in music technology.

It's possible, to an extent, but it's certainly not easy. It will be especially difficult using MP3, as a lot of important information is lost in compression.

What you're trying to do is known as Audio Source Separation, or Sound Source Separation: the task of separating an audio recording into its constituent elements.

These elements could be speech (several people talking at the same time - the 'cocktail party problem') or instruments (separating one instrument from another in a recording 'blind demixing').

There are various approaches you could take; some are based on the frequency-domain characteristics of sound, and others on spatial properties.

The frequency-domain approach might appear fairly straightforward if you're trying to separate a bass drum and a flute (i.e. the low-frequency bins of your FFT would belong to the bass drum and the higher-frequency bins to the flute). In reality, however, sounds are rarely neatly segregated into useful frequency regions; the bass drum, for example, will have harmonic content right the way up the frequency spectrum. These types of solutions are hence very mathematically complicated and often involve statistical modeling. Heavy stuff.
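To make the naive frequency-domain idea concrete, here's a toy sketch (all frequencies and parameters are invented for illustration): a synthetic "bass drum" tone and "flute" tone are mixed, then split with a hard cutoff in the FFT. This only works because the toy signals really do live in separate bands, which, as noted above, real instruments rarely do.

```python
import numpy as np

# Toy mix: a 60 Hz "bass drum" tone plus an 880 Hz "flute" tone.
# These numbers are invented; real instruments overlap in frequency.
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
mix = np.sin(2 * np.pi * 60 * t) + np.sin(2 * np.pi * 880 * t)

# Forward FFT of the real-valued mix, plus the frequency of each bin.
spectrum = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), 1 / fs)

# Naive separation: everything below 300 Hz is "bass drum",
# everything above is "flute".
low = spectrum.copy()
low[freqs > 300] = 0
high = spectrum - low

bass = np.fft.irfft(low, n=len(mix))
flute = np.fft.irfft(high, n=len(mix))
```

On this toy input the split is essentially perfect; on a real recording, the bass drum's upper harmonics would land in the "flute" band and vice versa.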

Separation based on spatial properties of sound often relies on some prior knowledge of where each source was before recording (this is 'non-blind'). It's usually necessary to have more than one microphone (a stereo recording at least). Using some clever maths, it's possible to separate the sources based on knowledge of where each source is in space, derived from the relationship between the signals at each microphone. This is also the basis for a technique called beamforming, by which the position of a source can be determined using an array of microphones.
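The relationship between the microphone signals that such methods exploit can be sketched very simply: a source arrives at each microphone with a different delay, and cross-correlating the two signals recovers that delay. Everything below (signal, delay, sample count) is a made-up illustration, not any particular beamforming algorithm.

```python
import numpy as np

# A noise-like source signal (invented for illustration).
rng = np.random.default_rng(0)
source = rng.standard_normal(800)

# Pretend the source reaches mic2 five samples later than mic1.
delay = 5
mic1 = source
mic2 = np.concatenate([np.zeros(delay), source[:-delay]])

# The lag that maximizes the cross-correlation estimates the delay,
# which in turn constrains where the source sits relative to the mics.
corr = np.correlate(mic2, mic1, mode="full")
lag = int(np.argmax(corr)) - (len(mic1) - 1)
print(lag)  # -> 5
```

Real systems estimate such delays for several sources at once, in noise, which is where the clever maths comes in.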

So, back on track. People are trying to do it, but it's complicated, and using MP3 will make your life difficult!

I'm afraid I don't really know enough to explain the approaches better, but I can find a few references to get you started:

http://www.cs.tut.fi/~tuomasv/demopage.html

http://www.cs.northwestern.edu/~pardo/courses/eecs352/lectures/source%20separation.pdf (pdf warning!)

Good luck!


For the vocals and bass you can use the fact that they are usually in the center of the stereo mix, which means they have the exact same waveform in the left and right channels. If you subtract one channel from the other, you end up with a new channel that will often be without vocals and bass.

Something like:

sound = LoadMP3(...)
length = sound.SampleCount
left = sound.Channels[LEFT]
right = sound.Channels[RIGHT]
difference = new array[length]
for i = 0 to length - 1
    difference[i] = left[i] - right[i]

Now you can look at clever ways to visualize FFT(left), FFT(right) and FFT(difference).
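As a sketch of what the difference channel buys you, here's a toy stereo signal (frequencies and panning invented for illustration): a centered "vocal" tone cancels in the difference, while a left-panned "guitar" tone survives and dominates its spectrum.

```python
import numpy as np

# Toy stereo mix: a 220 Hz "vocal" panned center (identical in both
# channels) and a 660 Hz "guitar" present only on the left.
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
vocal = np.sin(2 * np.pi * 220 * t)   # centered, so it cancels
guitar = np.sin(2 * np.pi * 660 * t)  # left only, so it survives

left = vocal + guitar
right = vocal
difference = left - right

# Magnitude spectrum of the difference channel: the 220 Hz "vocal"
# is gone, and the strongest bin sits at 660 Hz.
freqs = np.fft.rfftfreq(len(t), 1 / fs)
mag_diff = np.abs(np.fft.rfft(difference))
print(freqs[np.argmax(mag_diff)])  # -> 660.0
```

The same magnitude-spectrum computation applied to `left` and `right` gives you the other two spectra to visualize.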

Maybe this will take a small step towards the effect that you are after?
