Extracting a PDF's table of contents and page numbers in Java: a practical example

2025-09-30 10:05  Author: 万能的小帅

Extracting a PDF's table of contents and page numbers in Java

Add the dependency

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.24</version>
</dependency>
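The code in this post loads files with PDDocument.load, which belongs to the PDFBox 2.0.x line; PDFBox 3.x removed it in favour of Loader.loadPDF, so the version in the dependency matters. A minimal sketch to check that the dependency resolves and that this 2.x API is available might build a small PDF in memory, save it, and load it back. The class name PdfBoxSmokeTest and the method roundTripPageCount are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical smoke test, assuming a PDFBox 2.0.x jar on the classpath
public class PdfBoxSmokeTest {

    // Builds an n-page PDF in memory, saves it to a byte array, reloads it with
    // the 2.x PDDocument.load(byte[]) API and returns the page count read back
    static int roundTripPageCount(int pages) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (PDDocument doc = new PDDocument()) {
                for (int i = 0; i < pages; i++) {
                    doc.addPage(new PDPage());
                }
                doc.save(buffer);
            }
            try (PDDocument reloaded = PDDocument.load(buffer.toByteArray())) {
                return reloaded.getNumberOfPages();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pages read back: " + roundTripPageCount(3));
    }
}
```

If this compiles and prints the page count, the 2.x loading API used below is available.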

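Code

The outline shown in a PDF reader's bookmarks panel is stored in the document catalog. The program below reads it with getDocumentOutline(), walks it recursively, and for each entry prints the title, the 1-based page number it points to, and its nesting level. A bookmark can point to its page either through a destination or through a GoTo action, so both are checked.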

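import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionGoTo;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineNode;

// The class name is arbitrary; the original snippet showed only the two methods
public class PdfOutlineDemo {

    public static void main(String[] args) throws IOException {
        // File file = new File("E:\\navicat.pdf");
        File file = new File("C:\\Users\\msi\\Desktop\\max.pdf");

        // try-with-resources closes the PDDocument even if an exception is thrown
        try (PDDocument document = PDDocument.load(file)) {

            // Get the root of the PDF's structure tree (not needed for the outline)
            // PDStructureTreeRoot structureTree = document.getDocumentCatalog().getStructureTreeRoot();

            // Get the document outline, i.e. the bookmark tree that serves as the table of contents
            PDDocumentOutline documentOutline = document.getDocumentCatalog().getDocumentOutline();

            // Print the outline; it is null when the PDF has no bookmarks
            if (documentOutline != null) {
                printOutline(documentOutline, "", 0);
            }

            // Create a PDFTextStripper to extract the text
            // PDFTextStripper pdfStripper = new PDFTextStripper();
            // // Extract the text
            // String text = pdfStripper.getText(document);
            // // Print the text
            // System.out.println(text);
        }
    }

    // Recursively print the outline entries with their page numbers
    private static void printOutline(PDOutlineNode node, String indent, int level) throws IOException {
        PDOutlineItem item = node.getFirstChild();
        level++;
        indent = indent + "    ";
        while (item != null) {
            // Casting item.getDestination() straight to PDPageDestination fails when the
            // bookmark uses a GoTo action (destination is null) or a named destination,
            // so the type is checked first. Named destinations are not resolved here
            // and print as page 0.
            // PDPageDestination destination = (PDPageDestination) item.getDestination();
            // int pageNumber = destination.retrievePageNumber();

            int page = 0; // stays 0 if the page cannot be resolved
            if (item.getDestination() instanceof PDPageDestination) {
                PDPageDestination pd = (PDPageDestination) item.getDestination();
                page = pd.retrievePageNumber() + 1; // retrievePageNumber() is 0-based
            }
            if (item.getAction() instanceof PDActionGoTo) {
                PDActionGoTo gta = (PDActionGoTo) item.getAction();
                if (gta.getDestination() instanceof PDPageDestination) {
                    PDPageDestination pd = (PDPageDestination) gta.getDestination();
                    page = pd.retrievePageNumber() + 1;
                }
            }
            System.out.println("------" + indent + item.getTitle() + "----" + page + "   level:" + level);
            // Recurse into the children of this entry
            printOutline(item, indent, level);
            // Move on to the next entry at the same level
            item = item.getNextSibling();
        }
    }
}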
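The code above handles bookmarks that carry a page destination directly or through a GoTo action, but a bookmark that uses a named destination ends up as page 0. In PDFBox 2.0.x, PDOutlineItem also offers findDestinationPage(PDDocument), which, as far as I know, follows a direct destination, a GoTo action or a named destination and returns the target PDPage (or null). A minimal sketch that uses it and collects the entries into a list instead of printing them could look like this. OutlineCollector, collect, sampleDocument and demo are hypothetical names, and the three-page sample document exists only so the sketch runs without an input file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionGoTo;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageFitDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineNode;

// Hypothetical helper class, assuming PDFBox 2.0.x
public class OutlineCollector {

    // Returns one "level title p.N" string per bookmark; N is the 1-based page, 0 if unresolved
    static List<String> collect(PDDocument doc) throws IOException {
        List<String> out = new ArrayList<>();
        PDDocumentOutline outline = doc.getDocumentCatalog().getDocumentOutline();
        if (outline != null) {
            walk(doc, outline, 1, out);
        }
        return out;
    }

    private static void walk(PDDocument doc, PDOutlineNode node, int level, List<String> out)
            throws IOException {
        for (PDOutlineItem item = node.getFirstChild(); item != null; item = item.getNextSibling()) {
            // Follows a destination, a GoTo action or a named destination; null if none
            PDPage target = item.findDestinationPage(doc);
            // indexOf is 0-based and returns -1 if the page is not in this document
            int page = target == null ? 0 : doc.getPages().indexOf(target) + 1;
            out.add(level + " " + item.getTitle() + " p." + page);
            walk(doc, item, level + 1, out); // children are one level deeper
        }
    }

    // Page destination that shows the whole target page
    private static PDPageFitDestination fit(PDPage page) {
        PDPageFitDestination dest = new PDPageFitDestination();
        dest.setPage(page);
        return dest;
    }

    private static PDOutlineItem bookmark(String title, PDPage page) {
        PDOutlineItem item = new PDOutlineItem();
        item.setTitle(title);
        item.setDestination(fit(page));
        return item;
    }

    // Hypothetical 3-page sample: two bookmarks use destinations, one uses a GoTo action
    static PDDocument sampleDocument() throws IOException {
        PDDocument doc = new PDDocument();
        for (int i = 0; i < 3; i++) {
            doc.addPage(new PDPage());
        }
        PDDocumentOutline outline = new PDDocumentOutline();
        doc.getDocumentCatalog().setDocumentOutline(outline);

        PDOutlineItem chapter1 = bookmark("Chapter 1", doc.getPage(0));
        outline.addLast(chapter1);

        PDOutlineItem section = new PDOutlineItem();
        section.setTitle("Section 1.1");
        PDActionGoTo goTo = new PDActionGoTo();
        goTo.setDestination(fit(doc.getPage(1)));
        section.setAction(goTo); // no direct destination, only an action
        chapter1.addLast(section);

        outline.addLast(bookmark("Chapter 2", doc.getPage(2)));
        return doc;
    }

    static List<String> demo() {
        try (PDDocument doc = sampleDocument()) {
            return collect(doc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

For a real file, open it with PDDocument.load(file) as in main above and pass the document to collect(...). Delegating to findDestinationPage keeps the destination, action and named-destination cases in one library call instead of separate instanceof branches.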

Summary

The above is based on my own experience; I hope it serves as a useful reference.
