Intelligent Voice Recording: Request for Ideas

Say you have a conference room where meetings take place at arbitrary, impromptu times, and you would like to keep an audio record of all of them. To make it as easy to use as possible, no action would be required on the part of attendees; they just know that when they have a meeting in that specific room, there will be a record of it.

Obviously just recording nonstop would be inefficient as it would be a waste of data storage and a pain to sift through.

I figure there are two basic ways to go about it.

  1. Recording simply starts and stops according to sound level thresholds.
  2. Recording is continuous, but split into X minute blocks. Blocks found to contain no content are discarded.

I like the second way better because I feel there is less risk of losing data to late starts or failed triggers.

I would like to implement in Python, and on Windows if possible.

Implementation suggestions?

Bonus considerations that probably deserve their own questions:

  • best audio format and compression for this purpose
  • any way of determining how many speakers are present, assuming identification is unrealistic


This is one of those projects where the path is going to be defined more by what's on hand for ready reuse.

You'll probably find it easier to record continuously and save the data off in chunks (for example, hour-long pieces).
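The chunking step can be sketched with the standard library's `wave` module. This is only a sketch under assumptions: `write_chunks` and its parameters are hypothetical names, the input is assumed to be an iterable of raw 16-bit PCM byte blocks, and the capture side (getting those blocks from a microphone, which needs a third-party audio library) is not shown.

```python
import wave
from pathlib import Path


def write_chunks(blocks, out_dir, chunk_seconds=3600,
                 rate=16000, channels=1, sample_width=2):
    """Write raw PCM byte blocks to consecutive fixed-length WAV files.

    `blocks` is any iterable of bytes objects (hypothetical capture source).
    Returns the list of paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    bytes_per_chunk = int(chunk_seconds * rate * channels * sample_width)
    paths = []
    wav = None      # currently open WAV writer, if any
    written = 0     # bytes written to the current file
    try:
        for block in blocks:
            data = bytes(block)
            while data:
                if wav is None:
                    # Open files lazily so no empty chunk is ever created.
                    path = out_dir / f"chunk_{len(paths):05d}.wav"
                    wav = wave.open(str(path), "wb")
                    wav.setnchannels(channels)
                    wav.setsampwidth(sample_width)
                    wav.setframerate(rate)
                    paths.append(path)
                    written = 0
                take = min(len(data), bytes_per_chunk - written)
                wav.writeframes(data[:take])
                written += take
                data = data[take:]
                if written == bytes_per_chunk:
                    wav.close()
                    wav = None
    finally:
        if wav is not None:
            wav.close()
    return paths
```

Streaming each block straight to the open file keeps memory use flat regardless of chunk length; the last file is simply shorter if the stream ends mid-chunk.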

Format is going to depend on what you have in the way of recording tools and audio processing libraries. You may even find that you use two formats: one, like PCM-encoded WAV, for recording and processing, and compressed MP3 for storage.

Once you have an audio stream, you'll need to access it in a PCM form (list of amplitude values). A simple averaging approach will probably be good enough to detect when there is a conversation. Typical tuning attributes:

  • Average energy level to trigger
  • Amount of time you need to be at the energy level or below to identify stop and start (I recommend two different values)
  • Size of analysis window for averaging
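A minimal sketch of that averaging approach, using the three tuning attributes above, might look like the following. The function names and default values are illustrative assumptions, not tested recommendations; RMS is used as the "average energy" measure, and the input is assumed to be a sequence of signed integer PCM samples.

```python
import math


def window_rms(samples):
    """Root-mean-square amplitude of a sequence of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_activity(samples, rate, threshold=500.0, window_seconds=0.5,
                    start_seconds=1.0, stop_seconds=10.0):
    """Return a list of (start, end) times in seconds judged to be active.

    A segment starts after `start_seconds` of consecutive loud windows and
    ends after `stop_seconds` of consecutive quiet windows, so the two
    durations can be tuned separately.
    """
    window = max(1, int(rate * window_seconds))
    start_windows = max(1, round(start_seconds / window_seconds))
    stop_windows = max(1, round(stop_seconds / window_seconds))
    segments = []
    active = False
    loud_run = quiet_run = 0
    seg_start = 0.0
    for i in range(0, len(samples), window):
        loud = window_rms(samples[i:i + window]) >= threshold
        loud_run = loud_run + 1 if loud else 0
        quiet_run = 0 if loud else quiet_run + 1
        if not active and loud_run >= start_windows:
            active = True
            # Back-date the start to the first window of the loud run.
            seg_start = (i - (loud_run - 1) * window) / rate
        elif active and quiet_run >= stop_windows:
            active = False
            # End the segment where the quiet run began.
            segments.append((seg_start, (i - (quiet_run - 1) * window) / rate))
    if active:
        segments.append((seg_start, len(samples) / rate))
    return segments
```

A short start time with a much longer stop time keeps the recorder from cutting a meeting in two whenever the room goes quiet for a few seconds.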

As for number of participants, unless you find a library that does this, I don't see an easy solution. I've used speech recognition engines before and also done a reasonable amount of audio processing and I haven't seen any 'easy' ways to do this. If you were to look, search out universities doing speech analysis research. You may find some prototypes you can modify to give your software some clues.


I think you'll have difficulty doing this entirely in Python. You're talking about doing frequency/amplitude analysis of MP3 files. You would have to open up the file and look for a volume threshold, then cut out the portions that go below that threshold. Figuring out how many speakers are present would require very advanced signal processing.
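If the blocks are kept as uncompressed 16-bit PCM WAV rather than MP3 (an assumption, since decoding MP3 would need a third-party library), the volume-threshold check itself can be sketched with only the standard library. The names `loud_seconds` and `should_discard`, and the default threshold values, are hypothetical and would need tuning for a real room.

```python
import array
import math
import sys
import wave


def loud_seconds(path, threshold=500.0):
    """Count the one-second pieces of a 16-bit PCM WAV whose RMS >= threshold."""
    count = 0
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames_per_read = wav.getframerate()  # one second of audio per read
        while True:
            raw = wav.readframes(frames_per_read)
            if not raw:
                break
            samples = array.array("h")
            samples.frombytes(raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            rms = math.sqrt(sum(s * s for s in samples) / len(samples))
            if rms >= threshold:
                count += 1
    return count


def should_discard(path, threshold=500.0, min_loud_seconds=5):
    """True if the file has fewer loud seconds than `min_loud_seconds`."""
    return loud_seconds(path, threshold) < min_loud_seconds
```

Counting loud seconds rather than averaging over the whole file means a short meeting inside a long, mostly silent block is not diluted away, while requiring several loud seconds filters out a single door slam.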

A cursory Google search turned up nothing for me. You might have better luck looking for an off-the-shelf solution.

As an aside, there may be legal complications to having a recorder running 24/7 without letting people know.
