Word 2003/2007 - Multiline Regular Expression

I have a Word document containing Questions and Answers in the following format:

1. What is the name of our planet?
a) Earth
b) Mars
c) Venus
d) Jupiter

ANSWER:
a
TYPE: MC  DIFFICULTY: Easy
KEYWORDS: planet solar system

What I need to do is "split" the document into two parts, the first containing only the questions and the second containing only the answers. The result should look as follows.

Document 1 - Questions

1. What is the name of our planet?
a) Earth
b) Mars
c) Venus
d) Jupiter

Document 2 - Answers

1. ANSWER:
a
TYPE: MC  DIFFICULTY: Easy
KEYWORDS: planet solar system

The documents have a fairly regular structure, i.e. list number, question text, a line containing "ANSWER:", answer text, two CRLFs.

I tried using Regular Expressions to match the text and extract it from the document, but I have difficulties using Word's proprietary RegEx syntax and I couldn't find out how to make a RegEx that spans multiple lines and multiple document blocks. I also tried PowerGREP; the RegEx works, but it can only read the plain text from the document and it loses all the lists (e.g. the numbers of both questions and answers) and all the objects (some questions and answers have graphs and tables that I must keep).

To summarize, I have to follow this logic.

  1. Select everything (text and objects) from the question number until the word "ANSWER" (excluded). Do this for each question (i.e. process one question/block at a time).

  2. Select everything (text and objects) from the word "ANSWER" until the next question (excluded).

The document is in .DOC format, but I can also save it in .DOCX. Note: I tried parsing the XML of the .DOCX, but it contains thousands of superfluous tags, making everything impossibly complicated.


IMHO, you shouldn't use regular expressions here. They would be nicer, but I'm afraid (and your first attempts suggest as much) that it would be quite hard.

What you can try is to loop through the document with the VBA Word Find function, searching for the expression ANSWER:.
Then you can handle each block in turn:

  • Cut and paste the selection from the beginning of the block up to the beginning of ANSWER: into the Question document.
  • Cut and paste the next part (extend the selection over the next 3 or 4 lines) into the Answer document.
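
To make the control flow of that loop concrete, here is a minimal sketch in Python rather than VBA. It only models the logic: each paragraph is a plain string, whereas in Word each one would be a paragraph range that is cut and pasted with its formatting and objects. The names split_blocks, QUESTION_START and ANSWER_MARKER are hypothetical, and the patterns are assumptions taken from the sample in the question (in a real document the list number may be automatic numbering rather than literal text). Instead of a fixed 3 or 4 lines, the sketch keeps sending paragraphs to the answers until the next question number appears.

```python
import re

# Assumed markers, based on the sample block in the question.
QUESTION_START = re.compile(r"^\d+\.\s")   # e.g. "1. What is ..."
ANSWER_MARKER = re.compile(r"^ANSWER:")    # line that opens an answer block


def split_blocks(paragraphs):
    """Route each paragraph to the questions list or the answers list.

    A paragraph starting with ANSWER: switches the target to answers;
    a paragraph starting with a question number switches it back.
    Every other paragraph follows whichever target is currently active.
    """
    questions, answers = [], []
    target = questions
    for para in paragraphs:
        if ANSWER_MARKER.match(para):
            target = answers
        elif QUESTION_START.match(para):
            target = questions
        target.append(para)
    return questions, answers


# Usage with the sample from the question (one string per paragraph).
sample = [
    "1. What is the name of our planet?",
    "a) Earth",
    "b) Mars",
    "c) Venus",
    "d) Jupiter",
    "",
    "ANSWER:",
    "a",
    "TYPE: MC  DIFFICULTY: Easy",
    "KEYWORDS: planet solar system",
    "",
]
questions, answers = split_blocks(sample)
```

The same two-state idea should carry over to a VBA loop over the document's paragraphs: keep track of whether you are currently inside a question or an answer, and move each paragraph to the matching document.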

See here, in the chapter "Find, Replace Hard Returns With Manual Line Breaks", for some tips, and this link for more information.
