开发者

How to extract font styles of text contents using pdfbox?

I am using pdfbox开发者_C百科 library to extract text contents from pdf file.I would able to extract all the text,but couldn't find the method to extract font styles.


This is not the right way to extract font. To read font one has to iterate through pdf pages and extract font as below:

PDDocument  doc = PDDocument.load("C:/mydoc3.pdf");
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
for(PDPage page:pages){
    Map<String,PDFont> pageFonts=page.getResources().getFonts();
}


import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
public class pdf2box {
    public static void main(String args[])
    {
        try
        {
    PDDocument pddDocument=PDDocument.load("table2.pdf");
    PDFTextStripper textStripper=new PDFTextStripper();
    System.out.println(textStripper.getText(pddDocument));
    textStripper.getFonts();



    pddDocument.close();
        }
        catch(Exception ex)
        {
        ex.printStackTrace();
        }
    }


}


File file = new File("sample.pdf");
        PDDocument document = PDDocument.load(file);

        for (int i = 0; i < document.getNumberOfPages(); ++i)
        {
            PDPage page = document.getPage(i);
            PDResources res = page.getResources();
            for (COSName fontName : res.getFontNames())
            {
                PDFont font = res.getFont(fontName);
                System.out.println(font.getName());

            }
        }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜