
Recognizing headings, fonts, etc. in an MS Word document?

I want to read an MS Word document and identify headings, bold words, underlined words, and so on. Is there a way to do this programmatically? I would prefer a suggestion in Java, PHP, or Ruby if possible; otherwise, if there is some metadata available that I could use, please let me know as well.


There is a Java API that can do that. I suggest you look at the Apache POI library.


This is related to the question "What's a good Java API for creating Word documents?"

There is a work-in-progress API for this in Apache POI.

HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java. It also provides limited read-only support for the older Word 6 and Word 95 file formats.

The partner to HWPF for the new Word 2007 .docx format is XWPF. Whilst HWPF and XWPF provide similar features, there is not a common interface across the two of them at this time.
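On the "is there some meta-data available" part of the question: a .docx file is a ZIP archive whose main text lives in word/document.xml, and headings, bold, and underline are stored there as plain XML elements (w:pStyle, w:b, w:u). The following is only a rough sketch of reading that markup with the Java standard library, not the POI API; the class name DocxFormattingScan and the "style|bold|underlined|text" output format are made up for illustration. It only sees formatting set directly on a run, so bold or underline inherited from a paragraph or character style is not detected, and it does not apply to the older binary .doc format.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of Apache POI.
class DocxFormattingScan {
    // Namespace used by WordprocessingML elements and attributes (the "w:" prefix).
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one entry per run: "<paragraph style id or ->|<bold>|<underlined>|<text>".
    static List<String> scan(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals("word/document.xml")) {
                    return scanXml(new ByteArrayInputStream(zip.readAllBytes()));
                }
            }
        }
        throw new IllegalArgumentException("no word/document.xml entry found");
    }

    static List<String> scanXml(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        List<String> rows = new ArrayList<>();
        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            // Heading paragraphs usually carry a style id such as "Heading1".
            String style = attr(first(p, "pStyle"), "val", "-");
            NodeList runs = p.getElementsByTagNameNS(W, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                Element r = (Element) runs.item(j);
                boolean bold = isOn(first(r, "b"));
                Element u = first(r, "u");
                boolean underlined = u != null && !"none".equals(attr(u, "val", "single"));
                StringBuilder text = new StringBuilder();
                NodeList pieces = r.getElementsByTagNameNS(W, "t");
                for (int k = 0; k < pieces.getLength(); k++) {
                    text.append(pieces.item(k).getTextContent());
                }
                rows.add(style + "|" + bold + "|" + underlined + "|" + text);
            }
        }
        return rows;
    }

    // First descendant element with the given local name, or null if there is none.
    static Element first(Element parent, String localName) {
        NodeList n = parent.getElementsByTagNameNS(W, localName);
        return n.getLength() == 0 ? null : (Element) n.item(0);
    }

    // Value of a w:-qualified attribute, or the fallback when the element or attribute is absent.
    static String attr(Element e, String name, String fallback) {
        if (e == null) return fallback;
        String v = e.getAttributeNS(W, name);
        return v.isEmpty() ? fallback : v;
    }

    // A toggle such as <w:b/> is on unless it is explicitly switched off via w:val.
    static boolean isOn(Element e) {
        if (e == null) return false;
        String v = attr(e, "val", "true");
        return !(v.equals("false") || v.equals("0") || v.equals("off"));
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            scan(in).forEach(System.out::println);
        }
    }
}
```

Run it with the path to a .docx file as the only argument. Paragraphs nested inside tables or text boxes may be reported more than once by this simple traversal, which is one reason a library such as XWPF is the more practical choice for real documents.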

http://poi.apache.org/hwpf/quick-guide.html
