Can I digitize a dictionary?
I've found a public-domain Latin<->Portuguese dictionary in PDF form that I'd like to convert to plain text, parse, and use as the database for a program. After some testing, however, I've become a little skeptical. Take a look at the original file and at the resulting text from gocr. Is there any hope of reaching 99%+ accuracy with some method? I thought of reCaptcha's database, but I guess that is Google's property, isn't it?
Thanks!
Another route is to use one of the freely available dictionary files, like http://www.brothersoft.com/downloads/dictionary-database.html
Or WordNet.
EDIT: I've just spotted that this is a Latin/Portuguese dictionary, so WordNet clearly is no good.
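Whichever route you take, it may help to measure how far the OCR output actually is from the 99% target before investing in a parser. A minimal sketch, assuming you hand-type a small sample of the dictionary as ground truth: it computes character-level accuracy as 1 minus the edit distance divided by the reference length. The function names and the sample entry are hypothetical, made up for illustration, and are not taken from the dictionary or from gocr.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """1 - (edit distance / reference length); 1.0 means a perfect match."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1.0 - edit_distance(reference, ocr_output) / len(reference))


# Hypothetical sample: a hand-typed entry versus what an OCR tool might emit.
reference = "abacus, i, m.: ábaco"
ocr_output = "abacus, i, rn.: abaco"
print(f"character accuracy: {char_accuracy(reference, ocr_output):.3f}")
```

Running this over a page or two of hand-checked text gives a rough number for comparing OCR tools or settings. Note that accented characters and look-alike letter pairs such as "m" and "rn" are counted as errors here, which is what you want for a dictionary.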