Returning things to PHP from a Word Macro
The objective is to get an accurate word count for a Microsoft Word file. We have a Windows server that runs Apache and PHP. A web service on that machine extracts all the text content of the document and runs it through:
preg_match_all("/\S+/", $string, $matches); return count($matches[0]);
It works well enough as a rough count, but it's not at all accurate. So we wrote the following macro:
Sub GetWordCountBreakdown()
    Dim x As Integer
    Dim TotalWords As Long
    Dim FieldWords As Long
    Dim ResultWords As Long

    ' Word's own count for the whole document
    TotalWords = ActiveDocument.ComputeStatistics(wdStatisticWords)

    ' Add up the words in every field result longer than 25 words
    For x = 1 To ActiveDocument.Fields.Count
        ResultWords = ActiveDocument.Fields.Item(x).Result.ComputeStatistics(wdStatisticWords)
        If ResultWords > 25 Then
            FieldWords = FieldWords + ResultWords
        End If
    Next x

    MsgBox TotalWords & " - " & FieldWords & " = " & (TotalWords - FieldWords)
End Sub
When I run this macro in Word, it gives me a neat little alert box with the total word count, the words in long field results (references), and the difference. I'm not sure how to return those values to PHP so my web service can pass them back to me.
Update: I was able to just rewrite this macro in PHP and get the correct word count. Basically:
$word = new COM("Word.Application");
$word->Documents->Open($file); // $file: full path to the document
$wdStatisticWords = 0; // value of Word's wdStatisticWords constant
$wordcount = $word->ActiveDocument->ComputeStatistics($wdStatisticWords);
etc.
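The "etc." presumably covers the field loop from the macro. A minimal sketch of the whole port might look like the following. The function name wordCountBreakdown, the constant name WD_STATISTIC_WORDS, and the returned array keys are made up for illustration; the object model calls (ComputeStatistics, Fields, Item, Result) are the same ones the macro uses. The function takes the document object as a parameter, so it can be tried against a stub without Word installed.

```php
<?php
// Value of wdStatisticWords in the Word object model.
const WD_STATISTIC_WORDS = 0;

/**
 * Mirrors GetWordCountBreakdown(): total words minus the words in
 * field results longer than $threshold words.
 * $doc is expected to behave like a Word Document COM object.
 */
function wordCountBreakdown($doc, int $threshold = 25): array
{
    $total = (int) $doc->ComputeStatistics(WD_STATISTIC_WORDS);
    $fieldWords = 0;
    $fields = $doc->Fields;
    $count = (int) $fields->Count;

    // Word collections are 1-based, as in the VBA loop.
    for ($i = 1; $i <= $count; $i++) {
        $n = (int) $fields->Item($i)->Result->ComputeStatistics(WD_STATISTIC_WORDS);
        if ($n > $threshold) {
            $fieldWords += $n;
        }
    }

    return [
        'total'  => $total,
        'fields' => $fieldWords,
        'net'    => $total - $fieldWords,
    ];
}

// Usage on the Windows server (needs PHP's COM extension and Word installed):
// $word = new COM("Word.Application");
// $doc = $word->Documents->Open($file);   // $file: hypothetical full path
// $counts = wordCountBreakdown($doc);
// $doc->Close(false);                     // close without saving
// $word->Quit();
// echo json_encode($counts);
```

Returning an array instead of showing a message box is what lets the web service send the three numbers back to the caller.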
If you can read the OLE streams for the .doc file, an accurate word count for the document should be stored in either the SummaryInformation or the DocumentSummaryInformation stream. I don't have a script that reads the properties from .doc files, but I do have code for reading the metaproperties of Excel .xls files that could be adapted fairly easily.
EDIT
I've just checked, and it's property id 0x0F in the SummaryInformation stream.
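As a rough sketch of what reading that property could look like in PHP: the function below assumes you have already pulled the raw bytes of the SummaryInformation stream out of the compound file by some other means (that part is not shown), and that the stream follows the usual OLE property set layout as I understand it: a 28-byte header, a 16-byte FMTID, a 4-byte offset to the property set, then a table of (id, offset) pairs. The name summaryWordCount is made up for illustration.

```php
<?php
const PIDSI_WORDCOUNT = 0x0F; // property id mentioned above
const VT_I4 = 3;              // variant type: 32-bit integer

/**
 * Returns the word count stored in a SummaryInformation property set
 * stream, or null if the property is missing or the data is too short.
 * $stream holds the raw bytes of the stream (assumed already extracted).
 */
function summaryWordCount(string $stream): ?int
{
    $len = strlen($stream);
    if ($len < 48) {
        return null;
    }

    // Offset of the first property set: after the 28-byte header and 16-byte FMTID.
    $setOffset = unpack('V', $stream, 44)[1];
    if ($setOffset + 8 > $len) {
        return null;
    }

    // The property set starts with its size (4 bytes) and property count (4 bytes).
    $numProps = unpack('V', $stream, $setOffset + 4)[1];

    for ($i = 0; $i < $numProps; $i++) {
        $entry = $setOffset + 8 + 8 * $i;
        if ($entry + 8 > $len) {
            return null;
        }
        $pair = unpack('Vid/Voffset', $stream, $entry);
        if ($pair['id'] !== PIDSI_WORDCOUNT) {
            continue;
        }

        // Value offsets are relative to the start of the property set.
        $valueAt = $setOffset + $pair['offset'];
        if ($valueAt + 8 > $len) {
            return null;
        }
        $typed = unpack('vtype/vpad/Vvalue', $stream, $valueAt);
        return $typed['type'] === VT_I4 ? $typed['value'] : null;
    }

    return null;
}
```

Keep in mind that this reads a stored value, so it presumably reflects whatever Word wrote when the file was last saved, and it does not subtract field results the way the macro does.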
Why not simply count the number of spaces in the doc string? Or am I missing something?
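For comparison, here is a small sketch that puts space counting next to the \S+ approach from the question. The function names and the sample string are made up for illustration. Both are plain-text heuristics: they can disagree with each other on double spaces, line breaks, and stand-alone punctuation, and neither knows anything about fields or how Word itself counts.

```php
<?php
// Counts runs of non-whitespace characters, as in the question.
function countTokens(string $text): int
{
    return (int) preg_match_all('/\S+/', $text);
}

// Counts literal space characters, as suggested in this answer.
function countSpaces(string $text): int
{
    return substr_count($text, ' ');
}

// Double space, a line break, and a lone dash.
$sample = "One  two\nthree - four";

echo countTokens($sample), "\n"; // 5: One, two, three, -, four
echo countSpaces($sample), "\n"; // 4: the line break is not a space
```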