How to search text in a PDF document with Quartz
I'm using Quartz to display a PDF. I need to get the indexes of the pages that contain my search text. Can anyone help me? Thanks.
Solution: Here is sample code that extracts the text from a page and checks whether it contains the search string.
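// PDFSearcher.h
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h> // declares CGPDFOperatorTableRef and CGPDFPageRef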
@interface PDFSearcher : NSObject {
CGPDFOperatorTableRef table;
NSMutableString *currentData;
}
@property (nonatomic, retain) NSMutableString * currentData;
-(id)init;
-(BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString;
@end
// PDFSearcher.m
#import "PDFSearcher.h"
@implementation PDFSearcher
@synthesize currentData;
// Called for the TJ operator, whose operand is an array that mixes
// strings with numeric positioning adjustments.
void arrayCallback(CGPDFScannerRef inScanner, void *userInfo)
{
PDFSearcher * searcher = (PDFSearcher *)userInfo;
CGPDFArrayRef array;
if(!CGPDFScannerPopArray(inScanner, &array))
return;
size_t count = CGPDFArrayGetCount(array);
// Visit every element: strings and numbers are not guaranteed to alternate.
for(size_t n = 0; n < count; n++)
{
CGPDFStringRef string;
// CGPDFArrayGetString fails for non-string elements, which are skipped.
if(CGPDFArrayGetString(array, n, &string))
{
NSString *data = (NSString *)CGPDFStringCopyTextString(string);
[searcher.currentData appendFormat:@"%@", data];
[data release];
}
}
}
void stringCallback(CGPDFScannerRef inScanner, void *userInfo)
{
PDFSearcher *searcher = (PDFSearcher *)userInfo;
CGPDFStringRef string;
bool success = CGPDFScannerPopString(inScanner, &string);
if(success)
{
NSString *data = (NSString *)CGPDFStringCopyTextString(string);
[searcher.currentData appendFormat:@"%@", data];
[data release];
}
}
-(id)init
{
if(self = [super init])
{
table = CGPDFOperatorTableCreate();
CGPDFOperatorTableSetCallback(table, "TJ", arrayCallback);
CGPDFOperatorTableSetCallback(table, "Tj", stringCallback);
}
return self;
}
-(void)dealloc
{
// Release what init and the retain property own (manual reference counting).
CGPDFOperatorTableRelease(table);
[currentData release];
[super dealloc];
}
-(BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString
{
[self setCurrentData:[NSMutableString string]];
CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(inPage);
CGPDFScannerRef scanner = CGPDFScannerCreate(contentStream, table, self);
bool ret = CGPDFScannerScan(scanner);
CGPDFScannerRelease(scanner);
CGPDFContentStreamRelease(contentStream);
//NSLog(@"%u, %@", [self.currentData length], self.currentData);
return ([[self.currentData uppercaseString]
rangeOfString:[inSearchString uppercaseString]].location != NSNotFound);
}
@end
Use CGPDFDocument, CGPDFPage, and CGPDFScanner to scan each page and parse its contents into an NSString. Then use NSString methods to look for the text on that page. If it is found, store the page number in an array. Repeat the scanning and parsing in a for loop over all the pages in the PDF.
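The per-page loop described here could look roughly like the sketch below. It assumes the PDFSearcher class from the first answer and manual reference counting, as in that code; the function name PagesContainingString is made up for illustration.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import "PDFSearcher.h" // the class from the answer above (assumed available)

// Hypothetical helper: returns the 1-based page numbers (as NSNumber)
// of every page whose extracted text contains searchString.
static NSArray *PagesContainingString(NSURL *pdfURL, NSString *searchString)
{
    NSMutableArray *matches = [NSMutableArray array];

    CGPDFDocumentRef document = CGPDFDocumentCreateWithURL((CFURLRef)pdfURL);
    if (document == NULL)
        return matches; // file missing or not a readable PDF

    PDFSearcher *searcher = [[PDFSearcher alloc] init];
    size_t pageCount = CGPDFDocumentGetNumberOfPages(document);

    // CGPDFDocumentGetPage uses 1-based page numbers.
    for (size_t pageNumber = 1; pageNumber <= pageCount; pageNumber++)
    {
        // The page is owned by the document, so it is not released here.
        CGPDFPageRef page = CGPDFDocumentGetPage(document, pageNumber);
        if (page != NULL && [searcher page:page containsString:searchString])
            [matches addObject:[NSNumber numberWithUnsignedLong:pageNumber]];
    }

    [searcher release];
    CGPDFDocumentRelease(document);
    return matches;
}
```

Reusing one PDFSearcher for all pages avoids rebuilding the operator table on every iteration; the searcher resets its buffer at the start of each call.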
http://www.random-ideas.net/posts/42%22
Check out the link above; it works.
There's nothing to do this inside of Quartz. Quartz is for graphics display - it doesn't need to know, or care about, searching a PDF for string matches. You will have to use the Core Graphics PDF parsing methods to pull out the data, search for the string yourself, and then get the page it occurs on.
If you use PDFDocument instead of CGPDFDocument, that API has text search operations, such as findString:withOptions:.
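A minimal sketch of that approach, assuming PDFKit is available (on macOS it ships inside the Quartz framework) and manual reference counting; the function name PageIndexesMatching is made up for illustration.

```objective-c
#import <Foundation/Foundation.h>
#import <Quartz/Quartz.h> // PDFKit on macOS; assumption: adjust the import on other platforms

// Hypothetical helper: returns the zero-based indexes (as NSNumber)
// of the pages that contain searchString, ignoring case.
static NSArray *PageIndexesMatching(NSURL *pdfURL, NSString *searchString)
{
    NSMutableArray *indexes = [NSMutableArray array];

    PDFDocument *document = [[PDFDocument alloc] initWithURL:pdfURL];
    if (document == nil)
        return indexes; // could not open the file as a PDF

    // Each match comes back as a PDFSelection.
    NSArray *selections = [document findString:searchString
                                   withOptions:NSCaseInsensitiveSearch];
    for (PDFSelection *selection in selections)
    {
        // A selection can span more than one page.
        for (PDFPage *page in [selection pages])
        {
            NSNumber *index =
                [NSNumber numberWithUnsignedInteger:[document indexForPage:page]];
            if (![indexes containsObject:index])
                [indexes addObject:index];
        }
    }

    [document release];
    return indexes;
}
```

Compared with scanning the content stream by hand, this leaves text decoding to PDFKit, at the cost of depending on that framework.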