iTextSharp Pdf pages import memory issue

Posted 2023-03-15 13:07

I am using this code to import pages from different PDF files into a single document. When I import large files (200 pages or more), I get an OutOfMemoryException. Am I doing something wrong here?

    private bool SaveToFile(string fileName)
    {
        try
        {
            iTextSharp.text.Document doc;
            iTextSharp.text.pdf.PdfCopy pdfCpy;
            string output = fileName;

            doc = new iTextSharp.text.Document();
            pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(output, System.IO.FileMode.Create));
            doc.Open();

            foreach (DataGridViewRow item in dvSourcePreview.Rows)
            {
                string pdfFileName = item.Cells[COL_FILENAME].Value.ToString();
                int pdfPageIndex = int.Parse(item.Cells[COL_PAGE_NO].Value.ToString());
                pdfPageIndex += 1;

                iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfFileName);
                int pageCount = reader.NumberOfPages;

                // set page size for the documents
                doc.SetPageSize(reader.GetPageSizeWithRotation(1));

                iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader, pdfPageIndex);
                pdfCpy.AddPage(page);

                reader.Close();
            }

            doc.Close();

            return true;
        }
        catch (Exception ex)
        {
            return false;
        }
    }

You're creating a new PdfReader on every pass through the loop, which is horribly inefficient. And because you've imported a page from each one, all those (probably redundant) PdfReader instances are never garbage-collected.

Suggestions:

1. Two passes. First build a list of files and pages. Then work through each file in turn, so you only ever have one PdfReader "open" at a time. Call PdfCopy.FreeReader() when you're done with a given reader. This will almost certainly change the order in which your pages are added (maybe a Very Bad Thing).
2. One pass. Cache your PdfReader instances keyed by file name. Call FreeReader() again when you're done, but you probably won't be able to free any of them until you've exited the loop. The caching alone may be enough to keep you from running out of memory.
3. Keep your code as is, but call FreeReader() for each PdfReader instance when you close it.
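Options 1 and 2 can be sketched without any PDF code at all. The two helpers below are hypothetical (`ImportPlanning.GroupByFile` and `ReaderCache<TReader>` are not iTextSharp types): the first groups the requested pages by file so that, in a two-pass approach, each file is opened exactly once; the second caches one reader per file name for the one-pass approach. Opening and releasing a reader are passed in as delegates, so treat this as a sketch under those assumptions rather than drop-in code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers (not part of iTextSharp) for options 1 and 2.
// They only decide when a file is opened and released; the actual
// PDF calls are supplied by the caller as delegates.
public static class ImportPlanning
{
    // Option 1 (two passes): group the requested (file, page) pairs by file.
    // Enumerable.GroupBy yields groups in the order their keys first appear
    // and keeps each group's pages in request order.
    public static List<(string File, List<int> Pages)> GroupByFile(
        IEnumerable<(string File, int Page)> requests)
    {
        return requests
            .GroupBy(r => r.File)
            .Select(g => (File: g.Key, Pages: g.Select(r => r.Page).ToList()))
            .ToList();
    }
}

// Option 2 (one pass): open each file at most once and reuse its reader.
public sealed class ReaderCache<TReader> : IDisposable
{
    private readonly Func<string, TReader> open;
    private readonly Action<TReader> release;
    private readonly Dictionary<string, TReader> readers = new Dictionary<string, TReader>();

    public ReaderCache(Func<string, TReader> open, Action<TReader> release)
    {
        this.open = open;
        this.release = release;
    }

    // Number of readers currently held open by the cache
    public int CachedCount => readers.Count;

    // Return the cached reader for fileName, opening it on first use
    public TReader Get(string fileName)
    {
        if (!readers.TryGetValue(fileName, out TReader reader))
        {
            reader = open(fileName);
            readers.Add(fileName, reader);
        }
        return reader;
    }

    // Release every cached reader once the copy loop has finished
    public void Dispose()
    {
        foreach (TReader reader in readers.Values)
        {
            release(reader);
        }
        readers.Clear();
    }
}
```

Plugged into the question's loop, `open` would presumably be `name => new PdfReader(name)` and `release` would be `r => { pdfCpy.FreeReader(r); r.Close(); }` (assuming iTextSharp 5's `PdfCopy.FreeReader`), with the cache disposed before `doc.Close()` so every reader is freed while the copy is still open. With option 1, copying group by group reorders the output whenever the grid interleaves files.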

I haven't run into any OOM problems with iTextSharp. Are the PDFs created with iTextSharp or something else? Can you isolate the problem to a single PDF or a set of PDFs that might be corrupt? Below is sample code that creates 10 PDFs with 1,000 pages each. Then it creates one more PDF and pulls one random page from those PDFs 500 times. On my machine it takes a little while to run, but I don't see any memory issues. (iText 5.1.1.0)

    using System;
    using System.Windows.Forms;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;

    namespace WindowsFormsApplication1
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }

            private void Form1_Load(object sender, EventArgs e)
            {
                //Folder that we will be working in
                string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Big File PDF Test");

                //Base name of PDFs that we will be creating
                string BigFileBase = Path.Combine(WorkingFolder, "BigFile");

                //Final combined PDF name
                string CombinedFile = Path.Combine(WorkingFolder, "Combined.pdf");

                //Number of "large" files to create
                int NumberOfBigFilesToMake = 10;

                //Number of pages to put in the files
                int NumberOfPagesInBigFile = 1000;

                //Number of pages to insert into combined file
                int NumberOfPagesToInsertIntoCombinedFile = 500;

                //Create our test directory
                if (!Directory.Exists(WorkingFolder)) Directory.CreateDirectory(WorkingFolder);

                //First step, create a bunch of files with a bunch of pages, hopefully code is self-explanatory
                for (int FileCount = 1; FileCount <= NumberOfBigFilesToMake; FileCount++)
                {
                    using (FileStream FS = new FileStream(BigFileBase + FileCount + ".pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
                    {
                        using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
                        {
                            using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
                            {
                                Doc.Open();
                                for (int I = 1; I <= NumberOfPagesInBigFile; I++)
                                {
                                    Doc.NewPage();
                                    Doc.Add(new Paragraph("This is file " + FileCount));
                                    Doc.Add(new Paragraph("This is page " + I));
                                }
                                Doc.Close();
                            }
                        }
                    }
                }

                //Second step, loop around pulling random pages from random files

                //Create our output file
                using (FileStream FS = new FileStream(CombinedFile, FileMode.Create, FileAccess.Write, FileShare.Read))
                {
                    using (Document Doc = new Document())
                    {
                        using (PdfCopy pdfCopy = new PdfCopy(Doc, FS))
                        {
                            Doc.Open();

                            //Set up some variables to use in the loop below
                            PdfReader reader = null;
                            PdfImportedPage page = null;
                            int RanFileNum = 0;
                            int RanPageNum = 0;

                            //Standard random number generator
                            Random R = new Random();

                            for (int I = 1; I <= NumberOfPagesToInsertIntoCombinedFile; I++)
                            {
                                //Just to output our current progress
                                Console.WriteLine(I);

                                //Get a random page and file. Remember iText pages are 1-based.
                                RanFileNum = R.Next(1, NumberOfBigFilesToMake + 1);
                                RanPageNum = R.Next(1, NumberOfPagesInBigFile + 1);

                                //Open the random file
                                reader = new PdfReader(BigFileBase + RanFileNum + ".pdf");
                                //Set the page size from the source file's first page
                                Doc.SetPageSize(reader.GetPageSizeWithRotation(1));

                                //Grab a random page
                                page = pdfCopy.GetImportedPage(reader, RanPageNum);
                                //Add it to the combined file
                                pdfCopy.AddPage(page);

                                //Clean up
                                reader.Close();
                            }

                            //Clean up
                            Doc.Close();
                        }
                    }
                }
            }
        }
    }
