HTML to PDF conversion using iText
I'm trying to convert an HTML file to PDF using iText. If the HTML has an image in its body, iText fails to put that image in the PDF and throws the following exception:
ExceptionConverter: java.io.FileNotFoundException: D:\cid:870001313@01022011-2B8B (The system cannot find the file specified).
If the HTML has an image in its body, is it possible to read that image and add it as an attachment to the PDF file? Here is my source code (Truncate.java):
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Element;
import com.lowagie.text.Paragraph;
import com.lowagie.text.html.simpleparser.HTMLWorker;
import com.lowagie.text.pdf.PdfWriter;
public class Truncate {
    public static void main(String[] args) throws DocumentException {
        FileReader fr = null;
        Document document = new Document();
        PdfWriter writer = null;
        try {
            String file_name = "C:\\Documentum\\Viewed\\911.htm";
            fr = new FileReader(file_name);
            // Attach the writers first; the document must be opened afterwards
            PdfWriter.getInstance(document, System.out);
            writer = PdfWriter.getInstance(document, new FileOutputStream(
                    "C:\\Documentum\\Viewed\\RH\\RH.pdf"));
            document.open();
            document.add(new Paragraph("RH Mail"));
            ArrayList htmlContentList = HTMLWorker.parseToList(fr, null);
            // Add the parsed HTML content to the PDF element by element
            for (int htmlDataCntr = 0; htmlDataCntr < htmlContentList.size(); htmlDataCntr++) {
                Element htmlDataElement = (Element) htmlContentList
                        .get(htmlDataCntr);
                document.add(htmlDataElement);
            }
            fr.close();
            document.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
From the name of the attachment, it seems your HTML was exported from an email. You should parse the email differently and extract the images separately from the rest of the content.
EDIT: As I said, I think the problem lies upstream. The cid: notation refers to an image embedded in an email. So if the upstream mail parser doesn't give you the image file as an attachment, you can't do anything about it.
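If the mail parser can save the inline images to disk, one possible workaround is to rewrite the cid: references in the HTML so they point at those files before the HTML is handed to HTMLWorker. The following is only a rough sketch under that assumption: CidRewriter, its rewrite method, and the image path img1.png are hypothetical names made up for illustration. They are not part of iText, and the map from Content-ID to file path would have to come from your own mail parsing step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of iText): swaps cid: image references
// for local file paths that the mail parser is assumed to have produced.
class CidRewriter {
    // Matches src="cid:..." or src='cid:...', case-insensitively.
    private static final Pattern CID_SRC = Pattern.compile(
            "src\\s*=\\s*([\"'])cid:([^\"']+)\\1", Pattern.CASE_INSENSITIVE);

    static String rewrite(String html, Map<String, String> cidToPath) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String path = cidToPath.get(m.group(2));
            // Leave the reference untouched when no extracted file is known.
            String replacement = (path == null)
                    ? m.group()
                    : "src=" + m.group(1) + path + m.group(1);
            // quoteReplacement keeps backslashes in Windows paths literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> images = new HashMap<String, String>();
        // Assumed location of the image extracted from the mail.
        images.put("870001313@01022011-2B8B", "C:\\Documentum\\Viewed\\img1.png");
        String html = "<img src=\"cid:870001313@01022011-2B8B\">";
        System.out.println(rewrite(html, images));
    }
}
```

The rewritten string could then be wrapped in a java.io.StringReader and passed to HTMLWorker.parseToList in place of the FileReader. References whose Content-ID is not in the map are left as they are, so they would still fail in the same way as before.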