Most simple and fast method for audio activity detection?

2023-01-05 21:21 问答作者：

Given is an array of 320 elements (int16), which represent an audio signal (16-bit LPCM) of 20 ms duration. I am looking for a most simple and very fast method which should decide whether this array contains active audio (like speech or music), but not noise or silence. I don't need a very high quality of the decision, but it must be very fast.

It occurred to me first to add all squares or absolute values of the elements and compare their sum with a threshold, but such a method is very slow on my system, even if it is O(开发者_JS百科n).

You're not going to get much faster than a sum-of-squares approach.

One optimization that you may not be doing so far is to use a running total. That is, in each time step, instead of summing the squares of the last n samples, keep a running total and update that with the square of the most recent sample. To avoid your running total from growing and growing over time, add an exponential decay. In pseudocode:

decay_constant=0.999;  // Some suitable value smaller than 1
total=0;
for t=1,...
    // Exponential decay
    total=total*decay_constant;

    // Add in latest sample
    total+=current_sample;

    if total>threshold
        // do something
    end
end

Of course, you'll have to tune the decay constant and threshold to suit your application. If this isn't fast enough to run in real time, you have a seriously underpowered DSP...

You might try calculating two simple "statistics" - first would be spread (max-min). Silence will have very low spread. Second would be variety - divide the range of possible values into say 16 brackets (= value range) and as you go through the elements, determine in which bracket that element goes. Noise will have similar numbers for all brackets, whereas music or speech should prefer some of them while neglecting others.

This should be possible to do in just one pass through the array and you do not need complicated arithmetics, just some addition and comparison of values.

Also consider some approximation, for example take only each fourth value, thus reducing the number of checked elements to 80. For audio signal, this should be okay.

I did something like this a while back. After some experimentation I arrived at a solution that worked sufficiently well in my case.

I used the rate of change in the cube of the running average over about 120ms. When there is silence (only noise that is) the expression should be hovering around zero. As soon as the rate starts increasing over a couple of runs, you probably have some action going on.


rate = cur_avg^3 - prev_avg^3

I used a cube because the square just wasn't agressive enough. If the cube is to slow for you, try using the square and a bitshift instead. Hope this helps.

It is clearly that the complexity should be at least O(n). Probably some simple algorithms that calculate some value range are good for the moment but I would look for Voice Activity Detection on web and for related code samples.

继续阅读：algorithm audio c embedded signal-processing

Most simple and fast method for audio activity detection?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？