
Can I digitize a dictionary?

I've found a public domain Latin<->Portuguese dictionary in PDF form, which I'd like to convert to plain text, parse, and use as the database of a program. After some testing, however, I got a little skeptical. Take a look at the original file and at the resulting text from gocr. Is there any hope that I might reach 99%+ accuracy with some method? I thought of reCAPTCHA's database, but I guess it is Google's property, isn't it?

Thanks!


Another route is to use one of the freely available dictionary files, like http://www.brothersoft.com/downloads/dictionary-database.html


Or WordNet.

EDIT: I've just spotted that this is a Latin/Portuguese dictionary, so WordNet clearly is no good.
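
To put a number on the 99% target from the question, one option is to hand-type a small sample page and compare it against the OCR output for the same page. The following is only a minimal sketch using Python's standard `difflib`; the function name `char_accuracy` and the sample strings are made up for illustration and are not taken from the actual dictionary or from gocr.

```python
from difflib import SequenceMatcher


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Return the fraction of reference characters that also appear,
    in order, in the OCR output (1.0 means every character matched)."""
    if not reference:
        return 1.0
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent characters in long inputs, which would skew the score.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: hand-typed text vs. what an OCR tool might produce.
    reference = "amo, amare: amar"
    ocr_output = "amo, arnare: amar"
    print(f"character accuracy: {char_accuracy(reference, ocr_output):.2%}")
```

Running this on a few hand-checked pages gives a rough per-character score for comparing OCR tools or settings. It is an approximation based on matching blocks, not a strict edit-distance error rate.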
