How to extract paragraphs instead of the whole text with XWPFWordExtractor (Apache POI, Java)
I know the following code extracts the whole text of a docx document, but I need to extract it paragraph by paragraph instead. Is there a way to do this?
public static String extractText(InputStream in) throws Exception {
    JOptionPane.showMessageDialog(null, "Start extracting docx");
    XWPFDocument doc = new XWPFDocument(in);
    XWPFWordExtractor ex = new XWPFWordExtractor(doc);
    String text = ex.getText();
    return text;
}
Any help would be much appreciated. I need this urgently.
This is just a guess after a brief look at the API:
doc.getParagraphs()
Link to the API: http://poi.apache.org/apidocs/org/apache/poi/xwpf/usermodel/XWPFDocument.html#getParagraphs()
I wrote a utility method for this, shown below:
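import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public static List<String> getParagraphs(File file)
{
    List<String> paragraphs = new ArrayList<>();
    // try-with-resources closes the stream and the document even if reading fails
    try (FileInputStream fis = new FileInputStream(file);
         XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis)))
    {
        // One XWPFParagraph per paragraph in the document body
        List<XWPFParagraph> paragraphList = xdoc.getParagraphs();
        for (XWPFParagraph paragraph : paragraphList)
        {
            paragraphs.add(paragraph.getText());
        }
    }
    catch (Exception ex)
    {
        // On failure the stack trace is printed and the list collected so far is returned
        ex.printStackTrace();
    }
    return paragraphs;
}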
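A possible follow-up step, in case it helps: Word documents often contain empty paragraphs that are used only for spacing, and those come back as empty strings in a list like the one returned above. The sketch below drops them and trims the rest. The class name ParagraphCleanup and the method dropBlank are hypothetical names chosen for this example, not part of POI, and whether blank paragraphs should be removed at all depends on your use case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI.
final class ParagraphCleanup {

    // Returns a new list without null or whitespace-only entries;
    // the remaining entries are trimmed. The input list is not modified.
    static List<String> dropBlank(List<String> paragraphs) {
        List<String> kept = new ArrayList<>();
        for (String p : paragraphs) {
            if (p != null && !p.trim().isEmpty()) {
                kept.add(p.trim());
            }
        }
        return kept;
    }
}
```

It could be called as ParagraphCleanup.dropBlank(getParagraphs(file)).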
The question is very old, but I am answering in the hope that it helps anyone who ends up here while searching for an answer.
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs) {
    System.out.println("Text in this paragraph: " + paragraph.getText());
}
System.out.println("Total no of paragraph in Docx : " + paragraphs.size());
Hope this helps!
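For anyone curious what a "paragraph" is inside the file itself: a .docx is a ZIP archive, and the body text lives in word/document.xml as w:p elements that contain w:t text nodes. The sketch below is not POI and is not how the answers above work. It is a JDK-only illustration that reads those elements directly, and the class name DocxParagraphSketch and the method readParagraphs are made up for this example. It is deliberately simplified: it collects every w:p in the file, including those inside tables, and it ignores tabs, line breaks, headers and footers. For real work, the POI calls shown above are the better choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical, JDK-only illustration; not part of Apache POI.
final class DocxParagraphSketch {

    // Namespace of WordprocessingML elements such as w:p and w:t
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds word/document.xml in the archive and returns one string per w:p element.
    static List<String> readParagraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // Copy the entry to memory so the XML parser cannot close the zip stream
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buffer.write(chunk, 0, n);
                    }
                    return parse(buffer.toByteArray());
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }

    private static List<String> parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> paragraphs = new ArrayList<>();
        NodeList pNodes = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < pNodes.getLength(); i++) {
            // Concatenate the text of every w:t descendant of this paragraph
            NodeList tNodes = ((Element) pNodes.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < tNodes.getLength(); j++) {
                text.append(tNodes.item(j).getTextContent());
            }
            paragraphs.add(text.toString());
        }
        return paragraphs;
    }
}
```

Because this collects paragraphs from tables as well, its result will not always match what document.getParagraphs() returns for the same file.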