Looking for a library for parsing and extracting objects from ppt, pptx, doc, docx files [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

Closed 9 years ago.

I am looking for a library that can open ppt, pptx, doc, and docx files, parse them, and extract all the objects they contain.

For example, for a ppt file it should extract every object (images, text, tables, autoshapes, etc.) and give me each object's location, size, and formatting (font size, color, bold, etc.). For images, it should be able to save each one to a jpg file. The library should also be able to take a snapshot of a whole slide.

I have tried Aspose for this, but it wasn't accurate: it doesn't extract all the properties, and its export to image isn't accurate either. Are there any ideas for using the OpenOffice library for this?

I am open to using either a Java or a C++ library.


At work we used the OpenOffice Java API to extract the images from ppt/pptx files. I used the docs from here. I am pretty sure you can use the info in that guide to do what you need.

Good luck.


One option is the Apache POI library. There are examples around, and there seems to be more material available for it than for the OpenOffice API.
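For the image-extraction part only, and only for the newer formats, a library may not even be needed: pptx and docx files are ZIP packages, and embedded pictures are normally stored under `ppt/media/` and `word/media/`. The sketch below is not Apache POI or OpenOffice code; it uses just `java.util.zip` from the JDK. The class name `OoxmlMediaExtractor` and the method `extractMedia` are made up for illustration. It will not work for the legacy binary ppt and doc formats, which are not ZIP archives, and it gives no positions, sizes, or formatting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (name chosen for this example, not part of any library).
class OoxmlMediaExtractor {

    /**
     * Copies every entry under ppt/media/ or word/media/ from a .pptx/.docx
     * package into outDir and returns the paths that were written.
     */
    static List<Path> extractMedia(Path officeFile, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Files.createDirectories(outDir);
        try (ZipFile zip = new ZipFile(officeFile.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: media lives in the conventional folders.
                boolean isMedia = name.startsWith("ppt/media/")
                        || name.startsWith("word/media/");
                if (entry.isDirectory() || !isMedia) {
                    continue;
                }
                // Use only the last path segment so an entry cannot escape outDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                Path target = outDir.resolve(fileName);
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                written.add(target);
            }
        }
        return written;
    }
}
```

Calling `OoxmlMediaExtractor.extractMedia(Paths.get("deck.pptx"), Paths.get("out"))` would write the pictures in whatever format they were embedded in (png, jpeg, emf, and so on), so saving everything as jpg would need a separate conversion step. For object geometry, text formatting, slide snapshots, or the old binary formats, one of the libraries mentioned in the answers is still required.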
