Intelligent Voice Recording: Request for Ideas

2022-12-11 11:42

Say you have a conference room where meetings take place at arbitrary, impromptu times. You would like to keep an audio record of all meetings. To make it as easy to use as possible, no action would be required on the part of meeting attendees; they just know that when they have a meeting in a specific room, they will have a record of it.

Obviously just recording nonstop would be inefficient as it would be a waste of data storage and a pain to sift through.

I figure there are two basic ways to go about it.

1. Recording simply starts and stops according to sound level thresholds.
2. Recording is continuous, but split into X minute blocks. Blocks found to contain no content are discarded.

I like the second way better because I feel there is less risk of losing data because of late starts or failed triggers.
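To make the second approach concrete, here is a minimal sketch that works on audio already captured as a list or array of PCM samples. The helper names `rms` and `keep_nonsilent_blocks`, and the use of an RMS threshold as the test for "no content", are assumptions made for illustration; capturing audio from the microphone is not shown and would need a separate recording library.

```python
import math
from array import array


def rms(samples):
    """Root-mean-square amplitude of a sequence of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def keep_nonsilent_blocks(samples, sample_rate, block_seconds, rms_threshold):
    """Split samples into fixed-length blocks and keep the ones with content.

    Returns a list of (block_index, block_samples) pairs. A block is kept
    when its RMS amplitude reaches rms_threshold (an assumed criterion).
    """
    block_len = int(sample_rate * block_seconds)
    if block_len <= 0:
        raise ValueError("a block must contain at least one sample")
    kept = []
    for index, start in enumerate(range(0, len(samples), block_len)):
        block = samples[start:start + block_len]
        if rms(block) >= rms_threshold:
            kept.append((index, block))
    return kept


if __name__ == "__main__":
    rate = 8000
    # Synthetic test signal: one second of silence, one of tone, one of silence.
    quiet = array("h", [0] * rate)
    tone = array("h", [int(10000 * math.sin(2 * math.pi * 440 * n / rate))
                       for n in range(rate)])
    stream = quiet + tone + quiet
    kept = keep_nonsilent_blocks(stream, rate, block_seconds=1, rms_threshold=500)
    print([index for index, _ in kept])
```

In a real setup the blocks would be minutes long rather than one second, and the threshold would have to be tuned against the room's background noise.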

I would like to implement in Python, and on Windows if possible.

Implementation suggestions?

Bonus considerations that probably deserve their own questions:

best audio format and compression for this purpose
any way of determining how many speakers are present, assuming identification is unrealistic

This is one of those projects where the path is going to be defined more by what's on hand for ready reuse.

You'll probably find it easier to record continuously and save the data off in chunks (for example, hour-long pieces).

Format is going to depend on what you have in the form of recording tools and audio processing libraries. You may even find that you use two: one format, like PCM-encoded WAV, for recording and processing, and compressed MP3 for storage.
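For the WAV side, Python's standard `wave` module may be enough. The sketch below assumes 16-bit mono audio, and `write_wav` and `read_wav` are hypothetical helper names, not part of any library. Compressing to MP3 for storage is not shown, since the standard library has no MP3 encoder and that step would depend on whatever external tool you have on hand.

```python
import io
import struct
import wave


def write_wav(target, samples, sample_rate=16000):
    """Write 16-bit mono PCM samples to a path or binary file object as WAV."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)           # mono (assumption)
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(source):
    """Read a 16-bit mono WAV back as (sample_rate, list of int samples)."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
    return rate, samples


if __name__ == "__main__":
    # Round trip through an in-memory buffer instead of a file on disk.
    buffer = io.BytesIO()
    write_wav(buffer, [0, 1000, -1000, 2000], sample_rate=8000)
    buffer.seek(0)
    print(read_wav(buffer))
```

Passing a filename instead of the in-memory buffer writes a regular .wav file that other tools can then compress.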

Once you have an audio stream, you'll need to access it in PCM form (a list of amplitude values). A simple averaging approach will probably be good enough to detect when there is a conversation. Typical tuning attributes:

* Average energy level to trigger
* Amount of time you need to be above or below the energy level to identify start and stop (I recommend two different values)
* Size of analysis window for averaging
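A rough sketch of that averaging approach, with the three tuning attributes as parameters. The names `window_energy` and `find_segments` are made up for illustration, and the start and stop durations are expressed as counts of analysis windows, which is one possible reading of "amount of time".

```python
def window_energy(samples):
    """Average energy (mean squared amplitude) of one analysis window."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def find_segments(samples, window_size, energy_threshold,
                  start_windows, stop_windows):
    """Return (start_sample, end_sample) pairs that look like conversation.

    A segment starts after start_windows consecutive loud windows and
    ends after stop_windows consecutive quiet windows.
    """
    segments = []
    recording = False
    loud_run = quiet_run = 0
    segment_start = 0
    n_windows = len(samples) // window_size
    for w in range(n_windows):
        begin = w * window_size
        window = samples[begin:begin + window_size]
        if window_energy(window) >= energy_threshold:
            loud_run += 1
            quiet_run = 0
        else:
            quiet_run += 1
            loud_run = 0
        if not recording and loud_run >= start_windows:
            recording = True
            # Back-date the start to the first window of the loud run.
            segment_start = begin - (start_windows - 1) * window_size
        elif recording and quiet_run >= stop_windows:
            recording = False
            # End the segment where the quiet run began.
            segments.append((segment_start,
                             begin - (stop_windows - 1) * window_size))
    if recording:
        # Audio ended while still in a conversation.
        segments.append((segment_start, n_windows * window_size))
    return segments


if __name__ == "__main__":
    signal = [0] * 20 + [100] * 50 + [0] * 40
    print(find_segments(signal, window_size=10, energy_threshold=100,
                        start_windows=2, stop_windows=3))
```

Using a longer stop duration than start duration keeps short pauses between speakers from splitting one meeting into many segments.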

As for number of participants, unless you find a library that does this, I don't see an easy solution. I've used speech recognition engines before and also done a reasonable amount of audio processing and I haven't seen any 'easy' ways to do this. If you were to look, search out universities doing speech analysis research. You may find some prototypes you can modify to give your software some clues.

I think you'll have difficulty doing this entirely in Python. You're talking about doing frequency/amplitude analysis of MP3 files. You would have to open up the file and look for a volume threshold, then cut out the portions that go below that threshold. Figuring out how many speakers are present would require very advanced signal processing.

A cursory Google search turned up nothing for me. You might have better luck looking for an off-the-shelf solution.

As an aside, there may be legal complications to having a recorder running 24/7 without letting people know.
