How to search for a word in a docx file in C++?
I am writing a search program in C++ that searches for a set of words in a set of files. These files are either text files or docx files. The problem is: how can I search a docx file in C++? I cannot even open it. If I need to convert it to a text file first, what is the procedure, and how would I then search it?
A .docx file is a zip archive with a bunch of XML files in it. It's documented at http://openxmldeveloper.org/articles/GuidedTourOfSpecPart1.aspx
The OOXML file formats are officially documented in ECMA-376. There's an equivalent ISO standard (ISO/IEC 29500, if memory serves), but I believe you have to pay to get it, and the two are identical [1]. As a warning, however, these are huge documents, and the file formats themselves are definitely non-trivial to deal with. Just getting at the raw text is relatively easy compared with handling the full format, but still not exactly trivial.
[1] The ECMA standard was accepted by the ISO under its "fast track" program, where they accept an existing standard intact, even in some cases where it doesn't completely follow the normal ISO guidelines.
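To give a rough idea of the "raw text" part, here is a minimal sketch. It assumes you have already pulled the main document part out of the zip into a string (it is normally stored as word/document.xml), and it relies on the fact that visible text lives in <w:t> elements, with paragraphs ending at </w:p>. The function names extract_docx_text and contains_ignore_case are made up for this example. It is not a real XML parser: it does not decode entities such as &amp;amp;, and the comparison is ASCII-only substring matching.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Collect the text inside every <w:t ...>...</w:t> element of a
// WordprocessingML string, adding a newline at each paragraph end (</w:p>).
// Hypothetical helper: a naive tag scan, not a conforming XML parser.
std::string extract_docx_text(const std::string& xml) {
    std::string out;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t close = xml.find('>', pos);
        if (close == std::string::npos) break;
        // Tag name plus attributes, without the angle brackets.
        std::string tag = xml.substr(pos + 1, close - pos - 1);
        pos = close + 1;
        if (tag == "/w:p") {
            out += '\n';
        } else if (tag == "w:t" || tag.compare(0, 4, "w:t ") == 0) {
            // Copy everything up to the matching end tag.
            std::size_t end = xml.find("</w:t>", pos);
            if (end == std::string::npos) break;
            out.append(xml, pos, end - pos);
            pos = end + 6;  // skip past "</w:t>"
        }
    }
    return out;
}

// Case-insensitive substring search (ASCII only).
bool contains_ignore_case(const std::string& text, const std::string& word) {
    auto it = std::search(text.begin(), text.end(), word.begin(), word.end(),
                          [](unsigned char a, unsigned char b) {
                              return std::tolower(a) == std::tolower(b);
                          });
    return !word.empty() && it != text.end();
}
```

Extracting the text first matters because Word often splits a single word across several runs, so searching the raw XML can miss matches. To get the XML in the first place you still need something that reads zip archives, for example a zip library, or `unzip -p file.docx word/document.xml` from the command line.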
If writing your own OOXML parser is not an option, you could convert your docx files to plain text with docx2txt and then search the result like any other text file.