Java:using apache POI how to convert ms word file to pdf?

2023-03-09 22:41 问答作者：

By using apache POI how to convert ms word file to pdf?

I an using the following code but its not working giving errors I guess I am importing the wrong classes?

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.poi.hslf.record.Document;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;


public class TestCon {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub

        POIFSFileSystem fs = null;  
         Document document = new Document(); 

         try {  
             System.out.println("Starting the test");  
             fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));  

             HWPFDocument doc = new HWPFDocument(fs);  
             WordExtractor we = new WordExtractor(doc);  

             OutputStream file开发者_JS百科 = new FileOutputStream(new File("/document/test.pdf")); 

             PdfWriter writer = PdfWriter.getInstance(document, file);  

             Range range = doc.getRange();
             document.open();  
             writer.setPageEmpty(true);  
             document.newPage();  
             writer.setPageEmpty(true);  

             String[] paragraphs = we.getParagraphText();  
             for (int i = 0; i < paragraphs.length; i++) {  

                 org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
                // CharacterRun run = pr.getCharacterRun(i);
                // run.setBold(true);
                // run.setCapitalized(true);
                // run.setItalic(true);
                 paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");  
             System.out.println("Length:" + paragraphs[i].length());  
             System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());  

             // add the paragraph to the document  
             document.add(new Paragraph(paragraphs[i]));  
             }  

             System.out.println("Document testing completed");  
         } catch (Exception e) {  
             System.out.println("Exception during test");  
             e.printStackTrace();  
         } finally {  
                         // close the document  
            document.close();  
                     }  
         }  
    }

Got It solved

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;


import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;


public class TestCon {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub

        POIFSFileSystem fs = null;  
        Document document = new Document();

         try {  
             System.out.println("Starting the test");  
             fs = new POIFSFileSystem(new FileInputStream("D:/Resume.doc"));  

             HWPFDocument doc = new HWPFDocument(fs);  
             WordExtractor we = new WordExtractor(doc);  

             OutputStream file = new FileOutputStream(new File("D:/test.pdf")); 

             PdfWriter writer = PdfWriter.getInstance(document, file);  

             Range range = doc.getRange();
             document.open();  
             writer.setPageEmpty(true);  
             document.newPage();  
             writer.setPageEmpty(true);  

             String[] paragraphs = we.getParagraphText();  
             for (int i = 0; i < paragraphs.length; i++) {  

                 org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
                // CharacterRun run = pr.getCharacterRun(i);
                // run.setBold(true);
                // run.setCapitalized(true);
                // run.setItalic(true);
                 paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");  
             System.out.println("Length:" + paragraphs[i].length());  
             System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());  

             // add the paragraph to the document  
             document.add(new Paragraph(paragraphs[i]));  
             }  

             System.out.println("Document testing completed");  
         } catch (Exception e) {  
             System.out.println("Exception during test");  
             e.printStackTrace();  
         } finally {  
                         // close the document  
            document.close();  
                     }  
         }  
    }

This worked For Me:-

Source :- http://www.programcreek.com/java-api-examples/index.php?api=org.apache.poi.xwpf.converter.pdf.PdfConverter

package pdf;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class PDF {
    public static void main(String[] args) throws Exception {
          String inputFile="D:/TEST.docx";
          String outputFile="D:/TEST.pdf";
          if (args != null && args.length == 2) {
            inputFile=args[0];
            outputFile=args[1];
          }
          System.out.println("inputFile:" + inputFile + ",outputFile:"+ outputFile);
          FileInputStream in=new FileInputStream(inputFile);
          XWPFDocument document=new XWPFDocument(in);
          File outFile=new File(outputFile);
          OutputStream out=new FileOutputStream(outFile);
          PdfOptions options=null;
          PdfConverter.getInstance().convert(document,out,options);
        }
}

The below code worked for me:

Public class DocToPdfConverter{

public static void main(String[] args) {

        String k=null;
        OutputStream fileForPdf =null;
        try {

            String fileName="/document/test2.doc";
            //Below Code is for .doc file 
            if(fileName.endsWith(".doc"))
            {
            HWPFDocument doc = new HWPFDocument(new FileInputStream(
                    fileName));
            WordExtractor we=new WordExtractor(doc);
            k = we.getText();

             fileForPdf = new FileOutputStream(new File(
                        "/document/DocToPdf.pdf")); 
            we.close();
            }

            //Below Code for 

            else if(fileName.endsWith(".docx"))
            {
                XWPFDocument docx = new XWPFDocument(new FileInputStream(
                        fileName));
                // using XWPFWordExtractor Class
                XWPFWordExtractor we = new XWPFWordExtractor(docx);
                 k = we.getText();

                 fileForPdf = new FileOutputStream(new File(
                            "/document/DocxToPdf.pdf"));    
                 we.close();
            }



            Document document = new Document();
            PdfWriter.getInstance(document, fileForPdf);

            document.open();

            document.add(new Paragraph(k));

            document.close();
            fileForPdf.close();



        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

There are several steps here:

Read Word document using POI into a format-agnostic form
Convert format-agnostic form into PDF
Write PDF

I don't know if POI will do step 2 for you. I'd recommend something else, like iText.

As a side note, it's also possible to read content on-the-fly directly from a Word/Excel content stream instead of reading it from the filesystem and serializing it to disk, for example when retrieving content from CMIS repositories:

e.g.

 //HWPFDocument docx = new HWPFDocument(fs);  
 HWPFDocument docx = new HWPFDocument(doc.getContentStream().getStream());

(doc is of type org.apache.chemistry.opencmis.client.api.Document and in this case I adapted your code to retrieve a word file from an Alfresco repository by means of opencmis and transformed it to PDF)

HTH

In addition to Kushagra's answer, here the updated maven dependencies:

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.converter</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
        <version>2.0.1</version>
    </dependency>

This save my day, i load docx file from an url and convert it to pdf:

pom.xml

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.13</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.13</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
    <version>LATEST</version>
</dependency>

main_class

public String wordToPDFPOI(String url) throws Exception {
    InputStream doc = new URL(url).openStream();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    XWPFDocument document = new XWPFDocument(doc);
    PdfOptions options = PdfOptions.create();
    PdfConverter.getInstance().convert(document, baos, options);
    String base64_encoded = Base64.encodeBytes(baos.toByteArray());

    return base64_encoded;
}

All of the answers above will fail if the document has images. I would not suggest you to use apache poi since its library to convert word to pdf have been discontinued now. As of today I don't think that there is any open source library which do the conversion (they require some dependencies like some need MS word to be installed, etc). The best way I could think of (it will only work if you are deploying project on linux machine) is that install Libre Office (open source) in the linux machine and run this :

  String command = "libreoffice --headless --convert-to pdf " + inputPath + " --outdir " + outputPath;
 
 try {
         Runtime.getRuntime().exec(command);
      } catch (IOException e) {
         e.printStackTrace();
      }

继续阅读：apache-poi itext

Java:using apache POI how to convert ms word file to pdf?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？