
How to extract chapter numbers along with the text from a .doc file?

I use Apache POI HWPF to extract text from a .doc file, and I found that the extracted text has no chapter numbers. Can POI extract the chapter numbers together with the text?

import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public void readDocFile() {
    File docFile = new File("C:\\Documents and Settings\\Administrator\\Desktop\\Topo6.doc");
    // Open the file as a byte stream; try-with-resources closes it afterwards.
    try (FileInputStream fis = new FileInputStream(docFile)) {
        // HWPFDocument parses the binary .doc format from the stream.
        HWPFDocument doc = new HWPFDocument(fis);
        WordExtractor docExtractor = new WordExtractor(doc);

        // getText() returns the whole document as a single String.
        // Reading it inside the try block avoids a NullPointerException if opening the file fails.
        String text = docExtractor.getText();
        System.out.println(text);
    } catch (Exception exep) {
        System.out.println(exep.getMessage());
    }
}


Ok, I got it.

The chapter numbers in a .doc file generated by Microsoft Word are dynamic: Word computes them from the numbering level of each paragraph instead of storing them in the text. So I have to get the level of each paragraph and calculate the chapter number myself.
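
A minimal sketch of the "calculate it myself" step, under some assumptions. It takes the level of each paragraph as a plain int (0 for a top-level chapter, 1 for a section, and so on) and keeps one counter per level. How the levels are obtained is not shown; in HWPF they would presumably come from each Paragraph (for example a call such as getIlvl() or getLvl()), but that is an assumption, as are the class name ChapterNumbering, the method name number, and the 9-level limit. Levels outside the range are treated as body text and get an empty number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI.
class ChapterNumbering {
    // Assumed maximum number of heading levels (0..8).
    static final int MAX_LEVELS = 9;

    // Returns one number string per input level, e.g. "1", "1.1", "1.2", "2".
    // Levels outside 0..MAX_LEVELS-1 are treated as body text and yield "".
    static List<String> number(int[] levels) {
        int[] counters = new int[MAX_LEVELS];
        List<String> result = new ArrayList<>();
        for (int level : levels) {
            if (level < 0 || level >= MAX_LEVELS) {
                result.add("");
                continue;
            }
            // Advance the counter for this level.
            counters[level]++;
            // Restart numbering for every deeper level.
            for (int i = level + 1; i < MAX_LEVELS; i++) {
                counters[i] = 0;
            }
            // Join the counters from the top level down to this level.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i <= level; i++) {
                if (i > 0) {
                    sb.append('.');
                }
                sb.append(counters[i]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Example levels: chapter, section, section, subsection, chapter.
        int[] levels = {0, 1, 1, 2, 0};
        for (String n : number(levels)) {
            System.out.println(n);
        }
    }
}
```

Each computed number can then be prepended to the text of the matching paragraph. Note that this sketch only produces decimal numbering; a document that uses another number format (letters, Roman numerals, custom start values) would need the list format information as well.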
