
In Microsoft Word documents, the properties of each character are stored in a file structure. Which file structure is used for this purpose?

In Microsoft Word documents, the properties of each and every character are stored in a file structure. Which file structure is used for this purpose?


There are several formats for Microsoft Word documents that are commonly found in the wild.

The first is the old standard binary .doc format, which Word used for many years. It was standardized for versions 97 to 2003, and the file format specification is available on MSDN.
If you're not so interested in the technical details, the Wikipedia article provides a decent overview:

During the late 1990s and early 2000s, the default Word document format (.DOC) became a de facto standard of document file formats for Microsoft Office users. Though usually just referred to as "Word Document Format", this term refers primarily to the range of formats used by default in Word version 97-2003.

Word document files by using the Word 97-2003 Binary File Format implement OLE (Object Linking and Embedding) structured storage to manage the structure of their file format. OLE behaves rather like a conventional hard drive file system and is made up of several key components. Each Word document is composed of so-called "big blocks" which are almost always (but do not have to be) 512-byte chunks; hence a Word document's file size will in most cases be a multiple of 512.

"Storages" are analogues of the directory on a disk drive, and point to other storages or "streams" which are similar to files on a disk. The text in a Word document is always contained in the "WordDocument" stream. The first big block in a Word document, known as the "header" block, provides important information as to the location of the major data structures in the document. "Property storages" provide metadata about the storages and streams in a doc file, such as where it begins and its name and so forth. The "File information block" contains information about where the text in a Word document starts, ends, what version of Word created the document and other attributes.

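As a rough illustration of the "header" block and 512-byte big blocks described in the excerpt above, here is a minimal Python sketch that checks whether a file starts with the OLE compound file signature and reads the block size from the header. The function name `read_cfb_header` is made up for this example, and the field offsets are taken from the published compound file layout as I understand it, so treat it as a sketch rather than a complete parser.

```python
import struct

# Eight-byte signature that opens every OLE compound file (.doc, .xls, ...).
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_cfb_header(data: bytes) -> dict:
    """Read a few fields from the first bytes of an OLE compound file.

    Hypothetical helper for illustration only; it does not walk the
    block chains or locate the "WordDocument" stream.
    """
    if len(data) < 34 or data[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE compound file")
    # Offset 24: minor version, major version, byte order mark,
    # sector shift, mini sector shift (five little-endian uint16 values).
    _minor, major, _byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,      # 512 when the shift is 9
        "mini_sector_size": 1 << mini_shift,   # 64 when the shift is 6
    }
```

You would call it on the start of a file, for example `read_cfb_header(open("report.doc", "rb").read(512))`, where `report.doc` is a placeholder name. A sector shift of 9 gives the usual 512-byte blocks; a newer header version can use larger blocks, which is why the size is "almost always" rather than always 512.
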
Word 2003 changed the game, introducing a new file format based on XML. This format was offered as an alternative save format in that version of Word, which kept the old binary .doc format as its default for backwards compatibility reasons. That format is described in this Wikipedia article.

Finally, Office 2007 introduced the Office Open XML file formats, including the .docx format for Word. There's a Wikipedia article on that, too. Or if you'd prefer the technical nitty-gritty, consult this reference article on MSDN: Walkthrough: Word 2007 XML Format
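
To tie this back to the question about character properties: in a .docx file the text is split into runs, and each run can carry a run-properties element (`w:rPr`) holding formatting such as bold or italic. The sketch below lists those properties using only the Python standard library. It assumes the main document part lives at `word/document.xml` inside the ZIP package, which is the usual location but is strictly speaking determined by the package relationships. The names `run_properties` and `make_sample_package` are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML main document elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def run_properties(docx_file):
    """Return (text, [property element names]) for every run in the document.

    Assumes the main part is stored at word/document.xml (the common case).
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    runs = []
    for run in root.iter(f"{{{W_NS}}}r"):
        text = "".join(t.text or "" for t in run.findall(f"{{{W_NS}}}t"))
        rpr = run.find(f"{{{W_NS}}}rPr")
        # Strip the "{namespace}" prefix so "b", "i", "sz" and so on remain.
        props = [child.tag.split("}")[1] for child in rpr] if rpr is not None else []
        runs.append((text, props))
    return runs


def make_sample_package():
    """Build a tiny in-memory package for demonstration (not a full .docx)."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>Hi</w:t></w:r>"
        "<w:r><w:t>there</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer
```

Calling `run_properties(make_sample_package())` reports the first run as bold and italic and the second as having no direct properties. With a real file you would pass its path instead; note that formatting inherited from styles is stored elsewhere in the package and is not covered by this sketch.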
