Building an OpenEars-compatible language model
I am doing some development on speech-to-text and text-to-speech, and I found the OpenEars API very useful.
The principle of this CMU-SLM-based API is that it uses a language model to map the speech picked up by the iPhone device. So I decided to find a big English language model to feed the API's speech recognizer engine. But I failed to understand the format of the VoxForge English data model and how to use it with OpenEars.
Does anyone have any idea how I can get the .languagemodel and .dic files for English to work with OpenEars?
Regarding LM Formats:
AFAIK most language models use the ARPA standard. Sphinx / CMU language models are compiled into a binary format; you'd need the source (text) format to convert a Sphinx LM into another format. Most other language models are in text format.
I'd recommend using the HTK Speech Recognition Toolkit; detailed documentation here: http://htk.eng.cam.ac.uk/ftp/software/htkbook_html.tar.gz
Here's also a description of CMU's SLM Toolkit: http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html
Here's an example of a language model in ARPA format I found on the net: http://www.arborius.net/~jphekman/sphinx/full/index.html
You probably want to create an ARPA LM first, then convert it into any binary format if needed.
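To give a sense of the format: an ARPA file is plain text, with a header listing the n-gram counts followed by one section per n-gram order. The words and probabilities below are made-up values purely for illustration:

```
\data\
ngram 1=4
ngram 2=2

\1-grams:
-0.60  <s>    -0.30
-0.60  hello  -0.30
-0.60  world  -0.30
-0.60  </s>

\2-grams:
-0.30  <s> hello
-0.30  hello world

\end\
```

Each line holds the log10 probability of the n-gram, the n-gram itself, and (for all but the highest order) a backoff weight.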
In General:
To build a language model, you need lots and lots of training data, so you can determine the probability of each word in your vocabulary given the input observed up to the current point in time.
You can't "make" a language model just by adding the words you want to recognize - you also need a lot of training data (i.e. typical input you would observe when running your speech recognition application).
A language model is not just a word list -- it estimates the probability of the next token (word) in the input. To estimate those probabilities, you need to run a training process, which goes over training data (e.g. historic data) and observes word frequencies there to estimate the probabilities mentioned above.
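As a toy illustration of that counting step (this is only a sketch, not any particular toolkit's training procedure), unigram probabilities can be estimated by relative frequency over a corpus:

```python
# Toy sketch: estimate unigram probabilities by relative frequency.
# Real toolkits (CMU SLM, HTK, MITLM) add smoothing, backoff weights,
# and higher-order n-grams on top of this.
from collections import Counter

corpus = "open the door please open the window please close the door".split()

counts = Counter(corpus)
total = sum(counts.values())

for word, count in counts.most_common():
    print(f"P({word}) = {count}/{total} = {count / total:.3f}")
```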
For your problem, maybe as a quick solution, just assume all words have the same frequency / probability:

- create a dictionary with the words you want to recognize (N words in the dictionary)
- create a language model which assigns each word a probability of 1/N (a uni-gram language model); a minimal sketch of such a file is shown after this list
- you can then interpolate that uni-gram language model (LM) with another LM built from a bigger corpus using the HTK Toolkit
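Here is a minimal sketch of that uni-gram step, assuming a tiny hand-picked word list (the words, file name, and formatting are only examples; a real model for Pocketsphinx would typically also include <s> and </s> sentence markers and matching entries in the .dic file):

```python
# Sketch: write a uniform uni-gram LM in ARPA format.
# Every word gets probability 1/N, stored as log10(1/N) as ARPA expects.
import math

words = ["YES", "NO", "HELLO", "GOODBYE"]   # the N words you want to recognize
log_prob = math.log10(1.0 / len(words))

with open("uniform.arpa", "w") as f:
    f.write("\\data\\\n")
    f.write(f"ngram 1={len(words)}\n\n")
    f.write("\\1-grams:\n")
    for word in words:
        f.write(f"{log_prob:.4f}\t{word}\n")
    f.write("\n\\end\\\n")
```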
Old question, but maybe the answer is still interesting: OpenEars now has built-in language model generation, so one option is to create models dynamically in your app as you need them using the LanguageModelGenerator class, which uses the MITLM library and NSScanner to accomplish the same task as the CMU toolkit mentioned above. Processing a corpus with >5000 words on the iPhone is going to take a very long time, but you could always run it once in the Simulator, take the output out of the documents folder, and keep it.
Another option for large vocabulary recognition is explained here:
Creating ARPA language model file with 50,000 words
Having said that, I need to point out as the OpenEars developer that the CMU tool's limit of 5000 words corresponds pretty closely to the maximum vocabulary size that is likely to have decent accuracy and processing speed on the iPhone when using Pocketsphinx. So, the last suggestion would be to either reconceptualize your task so that it doesn't absolutely require large-vocabulary recognition (for instance, since OpenEars allows you to switch models on the fly, you may find that you don't need one enormous model but can get by with multiple smaller ones that you swap in for different contexts), or to use a network-based API that can do large-vocabulary recognition on a server (or make your own API that uses Sphinx4 on your own server). Good luck!