View the innards of a .ppt file?
I need to figure out what is going on inside a client's .ppt
files. What is a good way to get started?
My eventual hope is to convert it to HTML. But if I开发者_运维百科 just export the .ppt
to HTML, I get a lot of images (as opposed to text), which is not a Good Thing.
EDIT: software that automatically converts .ppt
to HTML would be terrific, provided that it preserves as much information as possible in text format. If that doesn't exist, the next best thing would be to understand the innards of the .ppt
and write my own code to do a partial conversion.
EDIT: I used OfficeConvert as recommended by Michiel Leenaars. It got me text all right. My 50-page, 8MB test file turned into 40MB of text. The fact that I got text is good. The fact that the amount went way up is moving in the wrong direction. And there is an awful lot of repetition in there. The word "style" appeared 410815 times; the word "draw" appeared 351229 times.
I think a safe way would be to use OfficeConvert to automatically convert to ODF programmatically with Microsoft Office. Run it with /?
to get help. There are some dependencies (see below).
Then use a good ODF library like lpod to look inside it.
You can view some interesting code examples here.
Dependencies:
- Microsoft .NET Framework Version 2.0 Redistributable Package (x86)
- Primary Interop Assemblies for Office 2007 or Office 2010 (whichever you are using).
I like the Aspose products. (I'm not associated with them other than as a customer.) I've used the PPT one specifically to write code that pokes around in the insides of a PPT. Overkill if you just want to convert it to HTML, but invaluable for the sorts of things I use it for.
If you know Java, Apache has the POI project which lets you take a look at the inners of a PPT project. Could get all the info you want about the project (images, text) and then convert it to html however you like.
Its free too.
精彩评论