iTextSharp Pdf pages import memory issue
I am using this code to import pages from different PDF files into a single document. When I import large files (200 pages or more) I get an OutOfMemoryException. Am I doing something wrong here?
private bool SaveToFile(string fileName)
{
    try
    {
        iTextSharp.text.Document doc;
        iTextSharp.text.pdf.PdfCopy pdfCpy;
        string output = fileName;
        doc = new iTextSharp.text.Document();
        pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(output, System.IO.FileMode.Create));
        doc.Open();
        foreach (DataGridViewRow item in dvSourcePreview.Rows)
        {
            string pdfFileName = item.Cells[COL_FILENAME].Value.ToString();
            int pdfPageIndex = int.Parse(item.Cells[COL_PAGE_NO].Value.ToString());
            pdfPageIndex += 1;
            iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfFileName);
            int pageCount = reader.NumberOfPages;
            // set page size for the documents
            doc.SetPageSize(reader.GetPageSizeWithRotation(1));
            iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader, pdfPageIndex);
            pdfCpy.AddPage(page);
            reader.Close();
        }
        doc.Close();
        return true;
    }
    catch (Exception ex)
    {
        return false;
    }
}
You're creating a new PdfReader for each pass. That's horribly inefficient. And because you've got a PdfImportedPage from each one, all those (probably redundant) PdfReader instances stay referenced by the PdfCopy and are never GC'ed while it is open.
Suggestions:
- Two passes. First build a list of files and pages. Then process each file in turn, so you only ever have one PdfReader "open" at a time. Call PdfCopy.FreeReader() when you're done with a given reader. This will almost certainly change the order in which your pages are added (maybe a Very Bad Thing).
- One pass. Cache your PdfReader instances by file name. Call FreeReader() again when you're done... but you probably won't be able to free any of them until you've dropped out of your loop. The caching alone may be enough to keep you from running out of memory.
- Keep your code as is, but call FreeReader() after you close a given PdfReader instance.
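The one-pass caching suggestion can be sketched as a small cache keyed by file name. This is a minimal sketch, not part of iTextSharp: `ReaderCache<T>` is a hypothetical helper, and `T` stands in for `PdfReader` so the class compiles without iTextSharp. It opens each distinct file once, returns the same instance for every later page taken from that file, and releases all of them in `Dispose`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an iTextSharp type): caches one reader per file name.
// T stands in for iTextSharp's PdfReader; the caller supplies how to open and
// how to release a reader, so this class has no iTextSharp dependency.
public sealed class ReaderCache<T> : IDisposable where T : class
{
    private readonly Dictionary<string, T> readers;
    private readonly Func<string, T> open;
    private readonly Action<T> release;

    public ReaderCache(Func<string, T> open, Action<T> release)
    {
        this.open = open ?? throw new ArgumentNullException(nameof(open));
        this.release = release ?? throw new ArgumentNullException(nameof(release));
        // Compare file names case-insensitively, as Windows paths are.
        readers = new Dictionary<string, T>(StringComparer.OrdinalIgnoreCase);
    }

    // Number of distinct files currently held open.
    public int Count => readers.Count;

    // Returns the cached reader for fileName, opening it only on first use.
    public T Get(string fileName)
    {
        if (!readers.TryGetValue(fileName, out T reader))
        {
            reader = open(fileName);
            readers.Add(fileName, reader);
        }
        return reader;
    }

    // Releases every cached reader once; calling Dispose again is a no-op.
    public void Dispose()
    {
        foreach (T reader in readers.Values)
        {
            release(reader);
        }
        readers.Clear();
    }
}
```

Assuming iTextSharp 5.x, where PdfCopy exposes FreeReader(PdfReader), you would create the cache in the question's loop with `name => new PdfReader(name)` as the open delegate and `r => { pdfCpy.FreeReader(r); r.Close(); }` as the release delegate, replace `new PdfReader(pdfFileName)` with `cache.Get(pdfFileName)`, drop the per-row `reader.Close()`, and dispose the cache before `doc.Close()`. Page order is unchanged, and the number of live readers is bounded by the number of distinct files rather than the number of grid rows.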
I haven't run into any OOM problems with iTextSharp. Were the PDFs created with iTextSharp or something else? Can you isolate the problem to a single PDF or a set of PDFs that might be corrupt? Below is sample code that creates 10 PDFs with 1,000 pages each. Then it creates one more PDF and randomly pulls 1 page from those PDFs 500 times. On my machine it takes a little while to run, but I don't see any memory issues. (iText 5.1.1.0)
using System;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //Folder that we will be working in
            string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Big File PDF Test");
            //Base name of PDFs that we will be creating
            string BigFileBase = Path.Combine(WorkingFolder, "BigFile");
            //Final combined PDF name
            string CombinedFile = Path.Combine(WorkingFolder, "Combined.pdf");
            //Number of "large" files to create
            int NumberOfBigFilesToMakes = 10;
            //Number of pages to put in the files
            int NumberOfPagesInBigFile = 1000;
            //Number of pages to insert into combined file
            int NumberOfPagesToInsertIntoCombinedFile = 500;

            //Create our test directory
            if (!Directory.Exists(WorkingFolder)) Directory.CreateDirectory(WorkingFolder);

            //First step, create a bunch of files with a bunch of pages, hopefully code is self-explanatory
            for (int FileCount = 1; FileCount <= NumberOfBigFilesToMakes; FileCount++)
            {
                using (FileStream FS = new FileStream(BigFileBase + FileCount + ".pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
                {
                    using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
                    {
                        using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
                        {
                            Doc.Open();
                            for (int I = 1; I <= NumberOfPagesInBigFile; I++)
                            {
                                Doc.NewPage();
                                Doc.Add(new Paragraph("This is file " + FileCount));
                                Doc.Add(new Paragraph("This is page " + I));
                            }
                            Doc.Close();
                        }
                    }
                }
            }

            //Second step, loop around pulling random pages from random files
            //Create our output file
            using (FileStream FS = new FileStream(CombinedFile, FileMode.Create, FileAccess.Write, FileShare.Read))
            {
                using (Document Doc = new Document())
                {
                    using (PdfCopy pdfCopy = new PdfCopy(Doc, FS))
                    {
                        Doc.Open();

                        //Setup some variables to use in the loop below
                        PdfReader reader = null;
                        PdfImportedPage page = null;
                        int RanFileNum = 0;
                        int RanPageNum = 0;

                        //Standard random number generator
                        Random R = new Random();

                        for (int I = 1; I <= NumberOfPagesToInsertIntoCombinedFile; I++)
                        {
                            //Just to output our current progress
                            Console.WriteLine(I);
                            //Get a random page and file. Remember iText pages are 1-based.
                            RanFileNum = R.Next(1, NumberOfBigFilesToMakes + 1);
                            RanPageNum = R.Next(1, NumberOfPagesInBigFile + 1);
                            //Open the random file
                            reader = new PdfReader(BigFileBase + RanFileNum + ".pdf");
                            //Set the current page
                            Doc.SetPageSize(reader.GetPageSizeWithRotation(1));
                            //Grab a random page
                            page = pdfCopy.GetImportedPage(reader, RanPageNum);
                            //Add it to the combined file
                            pdfCopy.AddPage(page);
                            //Clean up
                            reader.Close();
                        }

                        //Clean up
                        Doc.Close();
                    }
                }
            }
        }
    }
}