Error while reading text out of a PDF using the Perl module PDF::API2
This is my code for reading the text of a PDF with Perl:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

my $pdf = PDF::API2->new;
$pdf = PDF::API2->open('01443325.pdf');
my $page = $pdf->page;
my $pagenum = 10;
$pdf->stringify;
$page = $pdf->openpage($pagenum);
print $page;
```
I don't get any output when I run this code. How can I fix the error?
When you run $pdf->stringify above, it returns the contents of the file as a string, but you then don't do anything with it. Even if you printed it, it would not give you the text representation you are after, because it is simply the raw PDF data in a string, not the text on the page.
Likewise, setting $pagenum to 10 only tells openpage which page object to return; it does not extract anything from that page. $page is a PDF::API2::Page object, not a string of extracted text, so print $page will not show you the words on the page.
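To see what stringify actually hands back, here is a small sketch. It builds a one-page document in memory (an assumption made so the example needs no input file) and prints the first few bytes of the result, which are the PDF file header rather than any page text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

# Build a one-page document in memory so no input file is needed.
my $pdf = PDF::API2->new;
$pdf->page;

# stringify serializes the whole document; it does not extract text.
my $raw = $pdf->stringify;

# The string starts with the PDF header (e.g. "%PDF-1.4") and continues
# with PDF objects and streams, not readable page text.
print substr($raw, 0, 8), "\n";
```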
I think the easiest option is not to do this with PDF::API2 at all, but to check whether you can run a tool such as pdftotext (from xpdf or Poppler) first and then read in its output.
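A minimal sketch of the pdftotext route, assuming pdftotext is installed and on your PATH on a Unix-like system (the list form of a piped open may not work on older Windows Perls). The helper names pdftotext_cmd and page_text are hypothetical. It uses pdftotext's -f and -l options to limit extraction to one page and passes - as the output file so the text is written to standard output, where Perl can read it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the pdftotext argument list for a single page (hypothetical helper).
# -f/-l set the first and last page; '-' writes the text to STDOUT.
sub pdftotext_cmd {
    my ($file, $pagenum) = @_;
    return ('pdftotext', '-f', $pagenum, '-l', $pagenum, $file, '-');
}

# Run pdftotext through a list-form pipe and return its output as one string.
sub page_text {
    my ($file, $pagenum) = @_;
    open(my $fh, '-|', pdftotext_cmd($file, $pagenum))
        or die "Cannot run pdftotext: $!\n";
    my $text = do { local $/; <$fh> };    # slurp all output
    close($fh) or die "pdftotext exited with status " . ($? >> 8) . "\n";
    return $text;
}

# Usage: perl page_text.pl 01443325.pdf 10
if (@ARGV) {
    my ($file, $pagenum) = @ARGV;
    print page_text($file, $pagenum // 1);
}
```

Passing the command as a list rather than a single string means the file name never goes through a shell, so spaces or special characters in it need no quoting.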
If not, there are some suggestions on the Perl Monks page http://www.perlmonks.org/?node_id=810721, and many more on Google under "perl extract text from pdf". There is also an earlier Stack Overflow question, "How can I extract text from a PDF file in Perl?".
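If you would rather stay in pure Perl, one module often used for this is CAM::PDF. It is not named in this answer, so treat it as an assumption: the sketch below expects CAM::PDF to be installed from CPAN and relies on its getPageText method, whose results depend on how each PDF was produced. cam_page_text is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the text of one page with CAM::PDF (assumed installed from CPAN).
sub cam_page_text {
    my ($file, $pagenum) = @_;
    require CAM::PDF;    # loaded at run time so the script compiles without it
    my $doc = CAM::PDF->new($file) or die "$CAM::PDF::errstr\n";
    die "Page $pagenum out of range (1.." . $doc->numPages . ")\n"
        if $pagenum < 1 || $pagenum > $doc->numPages;
    return $doc->getPageText($pagenum);
}

# Usage: perl cam_text.pl 01443325.pdf 10
if (@ARGV) {
    my ($file, $pagenum) = @ARGV;
    print cam_page_text($file, $pagenum // 1);
}
```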
Good luck!