PDF Text search C# [closed]
Closed 5 years ago.
I have a requirement to read a PDF file and search it for a piece of text. I need to display which pages the text appears on and the number of occurrences. I can already extract the PDF to text, but I need to know the page number.
Thanks
You can use Docotic.Pdf for this (I work for Bit Miracle).
Here is a sample for how to search text in PDF:
using System;
using BitMiracle.Docotic.Pdf;

// Dispose the document when done so the file is released
using (PdfDocument doc = new PdfDocument("file.pdf"))
{
    string textToSearch = "some text";
    for (int i = 0; i < doc.Pages.Count; i++)
    {
        string pageText = doc.Pages[i].GetText();
        int count = 0;
        // Count every case-insensitive match on this page (overlapping matches included)
        int lastStartIndex = pageText.IndexOf(textToSearch, 0, StringComparison.CurrentCultureIgnoreCase);
        while (lastStartIndex != -1)
        {
            count++;
            lastStartIndex = pageText.IndexOf(textToSearch, lastStartIndex + 1, StringComparison.CurrentCultureIgnoreCase);
        }
        if (count != 0)
            // Pages is zero-based, so report i + 1 as the page number
            Console.WriteLine("Page {0}: '{1}' found {2} times", i + 1, textToSearch, count);
    }
}
If you want a case-sensitive search, remove the third argument of the IndexOf method.
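To make the case-sensitivity choice explicit, the counting loop can be pulled into a small helper that takes the comparison as a parameter. This is only a sketch: TextSearch and CountOccurrences are hypothetical names, not part of Docotic.Pdf, and using StringComparison.Ordinal for the case-sensitive variant is my suggestion rather than something the sample above requires.

```csharp
using System;

public static class TextSearch
{
    // Counts occurrences of 'needle' in 'haystack' (overlapping matches included)
    // using the supplied comparison rule.
    public static int CountOccurrences(string haystack, string needle, StringComparison comparison)
    {
        // An empty search term has no meaningful count, so treat it as zero matches
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        int count = 0;
        int index = haystack.IndexOf(needle, 0, comparison);
        while (index != -1)
        {
            count++;
            // Move one character forward so overlapping matches are also found
            index = haystack.IndexOf(needle, index + 1, comparison);
        }
        return count;
    }

    public static void Main()
    {
        string pageText = "PDF text, pdf Text, PDF TEXT";

        // Case-insensitive: all three variants match, prints 3
        Console.WriteLine(CountOccurrences(pageText, "pdf text", StringComparison.OrdinalIgnoreCase));

        // Case-sensitive: only the first variant matches, prints 1
        Console.WriteLine(CountOccurrences(pageText, "PDF text", StringComparison.Ordinal));
    }
}
```

Inside the page loop you would then call CountOccurrences(pageText, textToSearch, ...) once per page and print the page number whenever the result is non-zero.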
Have you checked out iTextSharp? http://itextsharp.sourceforge.net/
EDIT: To elaborate, in the TOC I saw a section on: 15.3.3: Extracting text with PdfReaderContentParser and PdfTextExtractor
And under PdfReaderContentParser: http://api.itextpdf.com/com/itextpdf/text/pdf/parser/PdfReaderContentParser.html there is an option to process the PDF content per page.
So it seems to be a roundabout way, but you can iterate through each page, search its content for the word you want, and then return the page you found it on.
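A rough sketch of that per-page approach, assuming iTextSharp 5.x, where PdfTextExtractor.GetTextFromPage returns the text of a single page. The file name and search term are placeholders, and the counting loop is just one way to tally matches. Note that extracted text can contain line breaks in the middle of a phrase, so a multi-word search may miss matches that wrap across lines.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfPageSearch
{
    public static void Main()
    {
        string path = "file.pdf";          // placeholder input file
        string textToSearch = "some text"; // placeholder search term

        PdfReader reader = new PdfReader(path);
        try
        {
            // iTextSharp page numbers start at 1, not 0
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page);

                // Count case-insensitive matches on this page
                int count = 0;
                int index = pageText.IndexOf(textToSearch, 0, StringComparison.CurrentCultureIgnoreCase);
                while (index != -1)
                {
                    count++;
                    index = pageText.IndexOf(textToSearch, index + 1, StringComparison.CurrentCultureIgnoreCase);
                }

                if (count > 0)
                    Console.WriteLine("Page {0}: '{1}' found {2} times", page, textToSearch, count);
            }
        }
        finally
        {
            // Release the underlying file handle
            reader.Close();
        }
    }
}
```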