Parse a Word document into an Excel file
I have a Word document containing data that I would like to parse into an Excel file. The source files are hundreds of pages long. I have been working with VBA, but I have only just started learning the language and have run into a lot of difficulty reading a .doc file. I can use the Open and Line Input statements to read from a .txt file, but I get only gibberish when I try the same with a .doc file.
I have included links to two screenshots.
The first is a screenshot of a sample of my input data.
http://img717.imageshack.us/i/input.jpg/
The second is a screenshot of my desired output.
http://img3.imageshack.us/i/outputg.jpg/
I have worked out an algorithm for what I want to accomplish; I am just having difficulty coding it. Below is the pseudocode that I have developed.
Variables:
    string line = blank
    string series_title = blank
    string folder_title = blank
    int series_number = 0
    int box_number = 0
    int folder_number = 0
    int year = 0
do while the <end_of_document> has not been reached
    input line
    if the first word in the line is "series"
        store <series_number>
        store the string after ":" into <series_title>
    end if
    call parse_box(rest of line)
    output <series_number> <series_title> <box_number> <folder_number> <folder_title> <year>
end do while
function parse_box(current line)
    if the first word in the line is "box"
        store <box_number>
    end if
    call parse_folder(rest of line)
end function
function parse_folder(current line)
    if the first word in the line is "folder"
        store <folder_number>
    end if
    call parse_folder_title_and_year(rest of line)
end function
function parse_folder_title_and_year(current line)
    string temp_folder_title
    store everything as <temp_folder_title> until end of line
    if last word in <temp_folder_title> is a year
        store <year>
    end if
    if <temp_folder_title> is empty/blank
        // use <folder_title> from before
    else
        <folder_title> is <temp_folder_title> minus <year>
    end if
end function
Thanks in advance for all your help and suggestions.
The Open and Line Input statements generally work only on plain text files (things you can read in Notepad). If you want to read Microsoft Word documents programmatically, you'll have to add the Microsoft Word 12.0 Object Library (or the most recent version on your system) to your VBAProject references, and use the Word API to open and read the document.
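' DocumentPath is assumed to hold the full path to the .doc file
Dim oWrd As New Word.Application
Dim odoc As Word.Document
Set odoc = oWrd.Documents.Open(Filename:=DocumentPath, Visible:=False)
Dim singleLine As Word.Paragraph
Dim lineText As String
' Iterate over the document that was just opened, not ActiveDocument
For Each singleLine In odoc.Paragraphs
    lineText = singleLine.Range.Text
    'Do what you've gotta do
Next singleLine
' Close without saving and shut down the hidden Word instance
odoc.Close SaveChanges:=False
oWrd.Quit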
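If adding the object library reference is not practical, late binding is a commonly used alternative. The following is only a sketch under that assumption: the procedure name and the path in docPath are hypothetical placeholders, and the variables are declared As Object because the Word types are not available without the reference.

```vba
Sub ReadWordParagraphsLateBound()
    Dim wordApp As Object
    Dim wordDoc As Object
    Dim para As Object
    Dim docPath As String

    ' Hypothetical path; replace with the real location of the document
    docPath = "C:\path\to\input.doc"

    ' CreateObject starts Word without needing a project reference
    Set wordApp = CreateObject("Word.Application")
    Set wordDoc = wordApp.Documents.Open(FileName:=docPath, ReadOnly:=True)

    For Each para In wordDoc.Paragraphs
        ' Print each paragraph's text to the Immediate window
        Debug.Print para.Range.Text
    Next para

    ' Close without saving and shut down the Word instance
    wordDoc.Close SaveChanges:=False
    wordApp.Quit
End Sub
```

The trade-off is that late-bound code loses IntelliSense and compile-time type checking, so the early-bound version is usually easier to write while learning.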
Word doesn't have a concept of "Lines". You can read text ranges, and paragraphs, and sentences. Experiment and find what works best for getting your input text in manageable blocks.
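One practical detail when reading paragraphs: the text of a paragraph's range ends with the paragraph mark (a carriage return), and text taken from table cells also carries an end-of-cell character. A small helper along these lines can tidy each block before parsing. The function name CleanParagraphText is made up for this sketch, and replacing manual line breaks with spaces is an assumption about what is wanted here.

```vba
' Strips the control characters Word includes in paragraph text.
Function CleanParagraphText(ByVal rawText As String) As String
    Dim result As String
    result = rawText
    result = Replace(result, Chr(7), "")    ' end-of-cell marker in tables
    result = Replace(result, Chr(11), " ")  ' manual line break (Shift+Enter)
    result = Replace(result, vbCr, "")      ' paragraph mark
    CleanParagraphText = Trim$(result)
End Function
```

Inside the loop it could be used as lineText = CleanParagraphText(singleLine.Range.Text).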
Here is code that actually works.
'Create a new Microsoft Word Application object
Dim objWord As New Word.Application
'Declare the document variable; Documents.Open supplies the object, so New is not needed
Dim objDoc As Word.Document
'Open the Word document (DocFilename holds its full path) without showing a window
Set objDoc = objWord.Documents.Open(Filename:=DocFilename, Visible:=False)
Dim objParagraph As Word.Paragraph
Dim strLineText As String
For Each objParagraph In objDoc.Paragraphs
    strLineText = objParagraph.Range.Text
    'Do what you've gotta do
Next objParagraph
'Close the document without saving and shut down the hidden Word instance
objDoc.Close SaveChanges:=False
objWord.Quit
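For the "do what you've gotta do" step, the folder title and year handling from the question's pseudocode could be sketched roughly as follows. This is not a definitive implementation. The names IsYear, SplitTitleAndYear, WriteRecord and DemoWriteRecord are invented for illustration, the year test assumes a four-digit number starting with 1 or 2, the column order simply follows the pseudocode's output line, and the code is assumed to run from Excel VBA so that the Worksheet type is available.

```vba
' Returns True when token looks like a four-digit year (assumed range 1000-2999).
Function IsYear(ByVal token As String) As Boolean
    IsYear = (token Like "[12]###")
End Function

' Splits text such as "Correspondence 1954" into a title and a year.
' If the remaining title is blank, folderTitle keeps its previous value,
' mirroring the "use <folder_title> from before" rule in the pseudocode.
Sub SplitTitleAndYear(ByVal rawText As String, ByRef folderTitle As String, ByRef yr As Long)
    Dim trimmed As String
    Dim lastSpace As Long
    Dim lastToken As String

    trimmed = Trim$(rawText)
    lastSpace = InStrRev(trimmed, " ")          ' 0 when there is no space
    lastToken = Mid$(trimmed, lastSpace + 1)    ' last word, or the whole text

    If IsYear(lastToken) Then
        yr = CLng(lastToken)
        trimmed = Trim$(Left$(trimmed, lastSpace))
    End If

    If Len(trimmed) > 0 Then folderTitle = trimmed
End Sub

' Writes one output record to a worksheet row (column layout is an assumption).
Sub WriteRecord(ByVal ws As Worksheet, ByVal rowIndex As Long, _
                ByVal seriesNumber As Long, ByVal seriesTitle As String, _
                ByVal boxNumber As Long, ByVal folderNumber As Long, _
                ByVal folderTitle As String, ByVal yr As Long)
    ws.Cells(rowIndex, 1).Value = seriesNumber
    ws.Cells(rowIndex, 2).Value = seriesTitle
    ws.Cells(rowIndex, 3).Value = boxNumber
    ws.Cells(rowIndex, 4).Value = folderNumber
    ws.Cells(rowIndex, 5).Value = folderTitle
    ws.Cells(rowIndex, 6).Value = yr
End Sub

' Example with made-up values; assumes the active sheet is a worksheet.
Sub DemoWriteRecord()
    Dim folderTitle As String
    Dim yr As Long

    SplitTitleAndYear "Correspondence 1954", folderTitle, yr
    ' folderTitle is now "Correspondence" and yr is 1954
    WriteRecord ActiveSheet, 1, 1, "Example series", 2, 3, folderTitle, yr
End Sub
```

Passing folderTitle and yr ByRef lets the caller carry the last seen values forward from one paragraph to the next, which is what the pseudocode relies on when a line has no title of its own.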