Tesseract-Job: how to parse an image in order to get the information out of it

2023-03-22 11:51 问答作者：

good moring.

first of all. This is the most impressive community i ever saw!

Well several days i mused about the three-folded job of

a. getting b. parsing c. storing a number of pages.

Two days ago i thought that getting the pages would be the major-task. No this isnt the case - i guess that the parser-job would be a heroic task. Each of the pages that are intended to be parsed is a png-image.

So the question is - after getting all them. How to parse them!? This seems to be the issue. Guess that there are some perl-modules out there - that can help in doing this...

Well - i think that this job only can be done with some OCR embedded! Question: is there a perl-module that can be use here to support this task:

BTW: see the result-pages.

Tesseract-Job: how to parse an image in order to get the information out of it

BTW;: and as i thought i can find all 790 resultpages within a certain range between Id= 0 and Id= 100000 i thought, that i can go the way with a loop:

http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html http://www.founda开发者_如何学运维tionfinder.ch/ShowDetails.php?Id=949&InterfaceLanguage=1&Type=Html http://www.foundationfinder.ch/ShowDetails.php?Id=20011&InterfaceLanguage=1&Type=Html http://www.foundationfinder.ch/ShowDetails.php?Id=10579&InterfaceLanguage=1&Type=Html

i thought i can go the Perl-Way but i am not very very sure: I was trying to use LWP::UserAgent on the same URLs [see below] with different query arguments, and i am wondering if LWP::UserAgent provides a way for us to loop through the query arguments? I am not sure that LWP::UserAgent has a method for us to do that. Well - i sometimes heard that it is easier to use Mechanize. But is it really easier!?

But - to be frank; The first task " GETTING all the pages is not very difficult - if we compare this task with the parsing... How can this be done!?

Any ideas - suggestions -

look forward to hear from you...

zero

You do not need a Perl module, you only need the system function.

system qw[ tesseract.exe foo.png foo.txt ];
my $text = read_file('foo.txt');

You may need to preprocess the images to help Tesseract, say using ImageMagick like:

system qw[ convert.exe -resize 200%   image.jpg foo.png ];

继续阅读：ocr parsing perl tesseract

Tesseract-Job: how to parse an image in order to get the information out of it

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？