开发者

POI performance

I am using POI in my J2EE web application to generate a workbook. However, i find that POI takes around 3 mins 开发者_如何转开发to create a workbook with 25K rows(with around 15 columns each). Is this a POI performance issue , or is it justified to take that much of time? Are there other APIs known for better performance ?


The performance of writing large files with POI can be heavily reduced if you used the 'streaming' POI API instead of the standard one. Indeed by default POI will keep all your data in memory before writing all in one go at the end. The memory footprint of this can be ridiculously large for big files. Instead using the streaming API you can control how memory is used and data is written to disk progressively.

In order to create a streaming workbook, use something like :

  SXSSFWorkbook book = new SXSSFWorkbook(); 
  book.setCompressTempFiles(true);

  SXSSFSheet sheet = (SXSSFSheet) book.createSheet();
  sheet.setRandomAccessWindowSize(100);// keep 100 rows in memory, exceeding rows will be flushed to disk
  // ...


I would be very surprised to see POI take that much time to generate such a file. I just generated a sheet with 30000 rows x 10 cells in about 18s (no formatting, to be fair). The cause might be one of the following:

  • POI logging might be turned on, as described here
  • you are running from swap memory
  • your VM available heap might be very low


If none of the other answers work out, see if Andy Khan's JExcel will be better. I've found it to be far superior to POI for dealing with Excel in Java.


We also use POI in our web app and do not have any performance issue with it - although our generated documents are far smaller than yours. I would first check if POI is the real issue here. Try to generate those documents without the J2EE-overhead (Unit-Test) and measure the performance. You could also monitor the load and memory usage on your J2EE server to see if the problems come from some suboptimal system settings.


I've compared Apache POI with JExcel library. It seems that JExcel is about up to 4x faster than Apache POI but memory consumption seems to be more or less the same:

@Test
public void createJExcelWorkbook() throws Exception {
        WritableWorkbook workbook = Workbook.createWorkbook(new File("jexcel_workbook.xls"));
        WritableSheet sheet = workbook.createSheet("sheet", 0); 
        for ( int i=0; i < 65535; i++) {
            for ( int j=0; j < 10; j++) {
                Label label = new Label(j, i, "some text " + i + " " + j);
                sheet.addCell(label);
            }
        }
        workbook.write();
        workbook.close();
}

@Test
public void createPoiWorkbook() throws Exception {
    Workbook wb = new HSSFWorkbook();
    Sheet sheet = wb.createSheet("sheet");  
    for ( int i=0; i < 65535; i++) {
        Row row = sheet.createRow(i);
        for ( int j=0; j < 10; j++) {
            Cell cell = row.createCell(j);
             cell.setCellValue("some text " + i + " " + j);
        }
    }   
    FileOutputStream fileOut = new FileOutputStream("poi_workbook.xls");
    wb.write(fileOut);
    fileOut.close();
}

I've tested it with JExcel version 2.6.12 and Apache POI version 3.7. You need to download the latest library versions yourself and run the simple tests above to get more accurate numbers.

<dependency org="org.apache.poi" name="poi" rev="3.7"/>
<dependency org="net.sourceforge.jexcelapi" name="jxl" rev="2.6.12"/>

Note: there is a limit in Apache POI of 65535 rows per sheet.


By default POI stores the document in memory using XmlBeans, which is a lot of stuff to store for a large XLSX document.

It only becomes an XLSX document (a zip file) when you stream it out.

For large documents this is a memory hog and can be slow.

The SXSSF classes avoid some of this overhead by writing the data file to disc as it gets built up, merging that with the rest of the file (from memory) when the XLSX is output. This is a LOT faster for large workbooks, but does have limitations.

There is another approach, which is much less flexible, but a lot more efficient. This approach is to treat most of the files in the XLSX as almost fixed and to stream the whole lot out as you generate. This has its own limitations (you will have to k ow all yiur formatting up front), but is fast and memory efficient. My implementation of this approach is here: https://github.com/Yaytay/vertx-xlsx-writer, but it's pretty straightforward.

My assumption would be that the in-memory model used by Excel has very little in common with the XLSX format and that it builds up each file on at a time by scanning that model for the relevant information. You can do the same, just picking up the bits of the file format that you need.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜