Regex issue while parsing a .pdf file using CAM::PDF
Unmatched [ in regex; marked by <-- HERE in m/ <-- HERE / at ./pdf_parse.pl line 37.
I'm parsing a .pdf file word by word (in order to build a dictionary from it). Line 37 is:
if(grep(!/$word/,@line_rd)){
}
The word where the parser script stops working is in a different font (inside the PDF I'm parsing). Is that the culprit here?
Does CAM::PDF recognize words in different fonts? What should I do to prevent this?

You need to quote $word in the regular expression if it can contain special characters (like [ or even .). Try with:
if (grep(!/\Q$word\E/, @line_rd)) {
...
}
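To see why the quoting matters, here is a minimal sketch. The contents of @line_rd and $word are made-up sample data, not taken from the original script. It shows the unquoted pattern failing to compile and the \Q...\E version matching the bracket literally:

```perl
use strict;
use warnings;

# Hypothetical sample data; in the real script these come from the PDF.
my @line_rd = ('an array [index]', 'plain text');
my $word    = '[';

# Unquoted interpolation: the lone '[' opens a character class that is never
# closed, so the pattern fails to compile at run time. eval traps the error.
my $unquoted_ok = eval { my @hits = grep { /$word/ } @line_rd; 1 } ? 1 : 0;

# Quoted interpolation: \Q...\E escapes metacharacters, so '[' matches literally.
my @quoted_hits = grep { /\Q$word\E/ } @line_rd;

# quotemeta() is the function form of \Q...\E.
my $escaped = quotemeta($word);

print "unquoted compiled: $unquoted_ok\n";
print "quoted matches: ", scalar(@quoted_hits), "\n";
print "escaped pattern: $escaped\n";
```

quotemeta() is handy when the escaped string is needed more than once, for example when building a larger pattern from several words.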
If you want to make a dictionary of all the words, use a hash:
my %allwords;
...
# each time you have a new word incoming from the parser:
$allwords{$word}++;
At the end, the %allwords hash will contain the distinct words as keys, and the word counts as values. You could e.g. print it using:
print "Word $_: count: $allwords{$_}\n" for sort keys %allwords;
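Putting the pieces together, a self-contained sketch of the hash approach could look like the following. The @line_rd contents are hypothetical stand-ins for the text extracted from the PDF, and splitting on whitespace is only an assumption about what counts as a word. Because membership is checked with exists rather than a regex, words containing [ or . need no quoting at all:

```perl
use strict;
use warnings;

# Hypothetical stand-in for lines of text extracted from the PDF.
my @line_rd = ('the cat [sat] on the mat.', 'The mat [sat] still.');

my %allwords;
for my $line (@line_rd) {
    # Split on whitespace; tokens keep punctuation such as '[' or '.'.
    for my $word (split ' ', $line) {
        $allwords{$word}++;
    }
}

# exists() checks membership without building a regex from the word.
print "seen '[sat]'\n" if exists $allwords{'[sat]'};

# Print every distinct word with its count, sorted by word.
print "Word $_: count: $allwords{$_}\n" for sort keys %allwords;
```

If punctuation or case should not distinguish words, normalize each token (for instance with lc and a substitution that strips punctuation) before using it as a hash key.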