text layout recognition with Python

2023-03-19 06:54 Q&A author:

I'm trying to sort several thousand scanned files into folders by type (i.e., if a file is a scanned copy of formA, it should go in the formA folder; if it's a scanned copy of formB, it should go in the formB folder; and so on). I think the best way to match files to types is by their text outlines, but I'm completely new to image processing, so if there's a better solution, I'm all ears.

I'm working in Python. Any ideas on the best way to do this? PIL? OpenCV? ImageMagick?

Thanks in advance...
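One way to try the "match by text outline" idea with only PIL (Pillow) is a coarse layout fingerprint. The following is a minimal sketch that makes several assumptions: the scans are single-page image files (`*.png` by default), every form is scanned at roughly the same orientation and margins, and you have one clean sample scan of each form to act as a template. Each page is shrunk to a fixed size, divided into a grid of cells, and reduced to the fraction of dark ("ink") pixels per cell. The page then goes to the template whose grid is closest. The function names, the 8x8 grid, the 200x260 resize and the threshold of 128 are illustrative choices, not part of any library.

```python
import shutil
from pathlib import Path


def layout_signature(pixels, grid=(8, 8), threshold=128):
    """Reduce a grayscale page to a grid of ink-density values.

    pixels: list of rows, each a list of 0-255 ints (0 = black).
    Returns rows*cols floats, each the fraction of dark pixels in one cell.
    """
    rows, cols = grid
    height, width = len(pixels), len(pixels[0])
    signature = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            dark = sum(1 for value in cell if value < threshold)
            signature.append(dark / len(cell) if cell else 0.0)
    return signature


def classify(signature, templates):
    """Return the template name whose signature is closest (squared distance)."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(signature, templates[name]))
    return min(templates, key=distance)


def load_pixels(path, size=(200, 260)):
    """Open a scan, convert to grayscale and resize so all pages are comparable."""
    from PIL import Image  # assumes Pillow is installed
    with Image.open(path) as img:
        gray = img.convert("L").resize(size)
    width, height = gray.size
    data = gray.tobytes()  # one byte (0-255) per pixel in mode "L"
    return [list(data[y * width:(y + 1) * width]) for y in range(height)]


def sort_scans(scan_dir, out_dir, templates, pattern="*.png"):
    """Move each scan into out_dir/<closest form name>/."""
    # sorted() materializes the listing before any files are moved.
    for path in sorted(Path(scan_dir).glob(pattern)):
        form = classify(layout_signature(load_pixels(path)), templates)
        dest = Path(out_dir) / form
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest / path.name))
```

Usage would look something like `templates = {"formA": layout_signature(load_pixels("samples/formA.png")), "formB": layout_signature(load_pixels("samples/formB.png"))}` followed by `sort_scans("scans", "sorted", templates)`; the paths are hypothetical. A coarse grid tolerates small shifts and handwritten fill-ins better than a pixel-by-pixel comparison would, but badly skewed or rotated scans would probably need deskewing first.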

This library is probably of interest to you:
http://code.google.com/p/ocropus/
It's made by Googlers and lets you do OCR and layout analysis from Python.
I had some trouble installing it, but that was quite a while ago, so those problems may have been fixed since.
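Rather than guess at OCRopus's own Python API, the following sketch swaps in a different OCR engine: Tesseract through the `pytesseract` wrapper (an assumption; both Tesseract and `pytesseract` would need to be installed). Instead of comparing layouts, it classifies pages by their text, on the idea that each form usually has a title or field labels that appear on no other form. The phrases in `FORM_KEYWORDS` are hypothetical placeholders to be replaced with wording from the real forms, and the function names are illustrative.

```python
import re

# Hypothetical phrases: replace with wording that appears on only one form.
FORM_KEYWORDS = {
    "formA": ["application for leave", "employee id"],
    "formB": ["expense report", "receipt total"],
}


def normalize(text):
    """Lowercase and collapse whitespace, since OCR often breaks lines oddly."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def classify_text(text, keywords=FORM_KEYWORDS, unknown="unsorted"):
    """Return the form whose phrases occur most often in the OCR text."""
    text = normalize(text)
    scores = {
        form: sum(normalize(phrase) in text for phrase in phrases)
        for form, phrases in keywords.items()
    }
    best = max(scores, key=scores.get)
    # If no phrase matched at all, don't guess.
    return best if scores[best] > 0 else unknown


def ocr_page(path):
    """OCR one scanned page (assumes Tesseract and pytesseract are installed)."""
    from PIL import Image
    import pytesseract
    with Image.open(path) as img:
        return pytesseract.image_to_string(img)
```

For each scan, `classify_text(ocr_page(path))` gives a folder name, with `"unsorted"` collecting pages that match none of the phrases so they can be checked by hand. Text matching is less sensitive to scan alignment than layout matching, but it depends on OCR quality.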

I don't know what format your scanned documents are in, but pdfminer can do layout analysis on PDFs. I'd guess it fits the bill for your purpose, provided the documents are reasonably decent PDFs with an embedded text layer (if they're just "pure images", it won't do you any good).
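If the files do come as PDFs with a text layer, the pdfminer approach could look roughly like this. The sketch assumes the maintained fork pdfminer.six, whose `pdfminer.high_level.extract_pages` yields page layouts containing elements such as `LTTextContainer`. The idea: record each text box's position (normalized to the page size) on a blank sample of each form, then score an incoming page by how many of those labels appear at roughly the same place. The names `text_boxes`, `layout_score` and `best_form`, the 0.05 position tolerance and the 0.5 cutoff are all illustrative assumptions.

```python
def text_boxes(pdf_path, page_index=0):
    """Return (text, x, y) for each text box on one page.

    x and y are the box's lower-left corner divided by the page width/height,
    so pages of different sizes can be compared. Assumes pdfminer.six.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    # page_numbers takes zero-based indices; only that page is yielded.
    page = next(extract_pages(pdf_path, page_numbers=[page_index]))
    boxes = []
    for element in page:
        if isinstance(element, LTTextContainer):
            text = element.get_text().strip()
            if text:
                boxes.append((text, element.x0 / page.width, element.y0 / page.height))
    return boxes


def layout_score(boxes, template, tol=0.05):
    """Fraction of template labels found (as substrings) at about the same spot."""
    if not template:
        return 0.0
    hits = sum(
        any(label in text and abs(tx - x) <= tol and abs(ty - y) <= tol
            for text, x, y in boxes)
        for label, tx, ty in template
    )
    return hits / len(template)


def best_form(boxes, templates, min_score=0.5):
    """Return the best-matching form name, or None if nothing matches well."""
    scores = {name: layout_score(boxes, tpl) for name, tpl in templates.items()}
    name = max(scores, key=scores.get)
    return name if scores[name] >= min_score else None
```

For example, `templates = {"formA": text_boxes("blank_formA.pdf"), "formB": text_boxes("blank_formB.pdf")}` and then `best_form(text_boxes(path), templates)` for each file, with `None` meaning "no confident match" (the filenames are hypothetical). Labels are matched as substrings rather than exactly because pdfminer may group a printed label and its filled-in value into the same text box.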

Tags: document-layout-analysis image-processing ocr python
