How can I extract chapter numbers along with the text from a .doc file?
I use Apache POI HWPF to extract text from a .doc file, but the extracted text does not contain the chapter numbers. Can POI extract the chapter numbers together with the text?
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public void readDocFile() {
    String path = "C:\\Documents and Settings\\Administrator\\Desktop\\Topo6.doc";
    // Open the file; try-with-resources closes the stream when done.
    try (FileInputStream fis = new FileInputStream(path)) {
        // HWPFDocument parses the binary .doc format from the input stream.
        HWPFDocument doc = new HWPFDocument(fis);
        WordExtractor docExtractor = new WordExtractor(doc);
        // getText() returns the whole document text as a single string.
        String text = docExtractor.getText();
        System.out.println(text);
    } catch (IOException e) {
        // Reading or parsing failed; report the reason.
        System.out.println(e.getMessage());
    }
}
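OK, I figured it out.
The chapter numbers in a .doc file generated by Microsoft Word are dynamic: Word computes them when it renders the document, so they are not part of the stored text. I therefore have to get the level of each paragraph and calculate the chapter numbers myself.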
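The calculation itself can be sketched as below. This is only a minimal illustration, not a complete solution: the class ChapterNumberer and its method are hypothetical names, and the code assumes you have already collected a 0-based level for each paragraph (for example from HWPF's Paragraph.isInList() and Paragraph.getIlvl(), which is an assumption about how your document stores its headings), with -1 marking paragraphs that carry no number. It also assumes plain decimal numbering such as 1, 1.1, 1.2, 2 and ignores any custom number format defined in the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns a sequence of paragraph levels into chapter labels.
class ChapterNumberer {

    // Word supports nine list levels (0..8).
    private static final int MAX_LEVELS = 9;

    /**
     * levels: one entry per paragraph, 0-based heading level, or -1 for
     * paragraphs without a number. Returns one label per paragraph
     * ("" for unnumbered paragraphs).
     */
    static List<String> number(List<Integer> levels) {
        int[] counters = new int[MAX_LEVELS];
        List<String> labels = new ArrayList<>();
        for (int level : levels) {
            if (level < 0 || level >= MAX_LEVELS) {
                labels.add("");
                continue;
            }
            // Advance the counter for this level.
            counters[level]++;
            // A new heading restarts numbering on all deeper levels.
            for (int i = level + 1; i < MAX_LEVELS; i++) {
                counters[i] = 0;
            }
            // Join the counters from the top level down to this one.
            StringBuilder label = new StringBuilder();
            for (int i = 0; i <= level; i++) {
                if (i > 0) {
                    label.append('.');
                }
                label.append(counters[i]);
            }
            labels.add(label.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        // Example levels: two chapters, the first with two sections and a body paragraph.
        List<Integer> levels = Arrays.asList(0, 1, 1, -1, 0, 1);
        System.out.println(number(levels));
    }
}
```

To combine this with the extracted text, iterate over the paragraphs of doc.getRange(), record a level for each one, and prepend the matching label to the paragraph text. If a document skips a level (for example a level 2 heading directly under a chapter), this sketch prints 0 for the missing level.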