Compensating for channel effects
I am trying to work on a system where the quality of a recorded sentence is rated by a computer. There are three modes under which this system operates:
- When the person records a sentence using a mic and mixer arrangement.
- When the user records over a landline.
- When the user records over a mobile phone.
I notice that the scores I get from recordings made with these three sources fall in the following order: mic_score > landline_score > mobile_score
It is likely that this ordering is due to the effects of the codecs and channel characteristics. My questions are:
- What can be done to compensate for channel- and codec-introduced artifacts to get consistent scores across channels? If the answer is some sort of inverse filtering, please provide some links where I could get started.
- How do I detect what channel the input speech has been recorded on? Use HMMs?
Edit 1: I am not at liberty to go into the details of the scoring criteria. The current scores that I get from the mic, landline and mobile (for the same sentence, spoken similarly over the three media) are something like 80, 66 and 41. This difference may be because of the channel effects. If the content and manner of speaking the sentence are the same, then I am looking for an algorithm that normalizes the scores (they need not be the same, but they should be close).
It may very well be that the sound quality is different. Have you tried listening to some examples?
You can also use any spectrum analyzer to look at the data in detail. I suggest http://www.baudline.com/. Things you should look out for: the distance between the noise floor and the speech.
Also look at the high-frequency noise bursts when the letters t, f and s are spoken. On low-quality lines the difference between these letters disappears.
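If you want a number to go with what you see in the analyzer, the distance between the noise floor and the speech can be roughly estimated from frame levels. The following is only a sketch under assumptions of mine: samples are floats in the range -1 to 1, frames are 400 samples long (50 ms at 8 kHz), the quietest frames stand in for the noise floor and the loudest for speech. The function names and the 10th/95th percentile choices are hypothetical, not taken from any tool.

```python
import math

def frame_levels_db(samples, frame_len=400):
    """RMS level of each non-overlapping frame, in dB relative to full scale."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(v * v for v in frame) / frame_len
        # The small constant avoids log10(0) on digitally silent frames.
        levels.append(10 * math.log10(power + 1e-12))
    return levels

def speech_to_noise_floor_db(samples, frame_len=400):
    """Gap between loud frames (95th percentile) and quiet frames (10th percentile)."""
    levels = sorted(frame_levels_db(samples, frame_len))
    if not levels:
        raise ValueError("need at least one full frame of audio")
    noise = levels[int(0.10 * (len(levels) - 1))]
    speech = levels[int(0.95 * (len(levels) - 1))]
    return speech - noise
```

Running this on the same sentence recorded over each of the three channels should give a first indication of how much of the score difference comes from noise rather than from lost bandwidth.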
Why do you want to skew the quality measures? Giving an objective response of the quality seems to make more sense.
The landline codec will remove all frequencies around and above 4 kHz. The cell phone codec will throw away more information as part of a lossy compression process. Unless you have another side channel of information regarding the original audio content, there is no reliable way to recover the audio that was thrown away.
Your best bet to normalize is to low-pass filter the audio to match the 8 kHz telco codec, and then run the result through some cellular standard compression algorithm (there may be one published for your particular mobile cellular protocol). This should reduce the quality of all three signals to about the same level.
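The band-limiting step alone (not the cellular codec stage) could look something like the sketch below. It is pure Python for clarity and makes several assumptions of mine: the input is a list of float samples whose rate is an integer multiple of 8 kHz, the cutoff is 3400 Hz, and the filter is a 101-tap Hamming-windowed sinc. The function names are hypothetical. For real recordings a library routine such as `scipy.signal.resample_poly` would be much faster.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Design a Hamming-windowed sinc low-pass FIR filter with unity DC gain."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def to_telephone_band(samples, sample_rate, target_rate=8000, cutoff_hz=3400):
    """Low-pass filter, then keep every k-th sample (integer ratios only)."""
    if sample_rate % target_rate != 0:
        raise ValueError("sample_rate must be an integer multiple of target_rate")
    step = sample_rate // target_rate
    taps = lowpass_fir(cutoff_hz, sample_rate)
    half = len(taps) // 2
    out = []
    # Only the output samples that survive decimation are computed.
    for centre in range(0, len(samples), step):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = centre + k - half
            if 0 <= idx < len(samples):  # treat samples outside the signal as zero
                acc += tap * samples[idx]
        out.append(acc)
    return out
```

Applying `to_telephone_band` to the mic recording before scoring gives it roughly the same bandwidth as the landline recording. The filter's transition band extends a little past 4 kHz, so a longer filter or a lower cutoff would be needed if aliasing near the band edge matters.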