List fonts used by a Word Document (faster method)
I am working on a process for validating documents to make sure that they meet corporate standards. One o开发者_JAVA百科f the steps is to make sure that the Word document does not use non-approved fonts.
I have the following stub of code, which works:
Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
Dim wordDocument As Word.Document = Nothing
Dim fontList As New List(Of String)()
Try
wordDocument = wordApplication.Documents.Open(FileName:="document Path")
'I've also tried using a for loop with an integer counter, no change in speed'
For Each c As Word.Range In wordDocument.Characters
If Not fontList.Contains(c.Font.Name) Then
fontList.Add(c.Font.Name)
End If
Next
But this is incredibly slow! Incredibly slow = 2500 characters/minute (I timed it with StopWatch). Most of my files are around 6,000 words/30,000 characters (about 25 pages). But there are some documents that are in the 100's of pages...
Is there a faster way of doing this? I have to support Office 2003 format files, so the Open XML SDK isn't an option.
--UPDATE--
I tried running this as a Word macro (using the code found @ http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html) and it runs much faster (under a minute). Unfortunately for my purposes I don't believe a Macro will work.
--UPDATE #2--
I took Chris's advice and converted the document to Open XML format on the fly. I then used the following code to find all RunFonts objects and read the font name:
Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()
fontList.AddRange(runFonts)
End Using
You might have to support Office 2003 but that doesn't mean you have to parse it in that format. Take the Office 2003 documents, temporarily convert them to DOCX files, open that as a ZIP file, parse the /word/fontTable.xml
file and then delete the DOCX.
Another way I found without coding is :
- export document as PDF
- open it in adobe reader
- in adobe reader go to : file menu\properties and then fonts tab, which lists the family fonts and sub-fonts those used in document.
Even maybe Developers and Programmers could use this procedure to code it and take out PDF Font list to what could be useful for more people.
You can speed things up a lot by iterating over paragraphs. Only if a paragraph contains mixed fonts would you need to check character by character. The Name, Bold, Italic, etc. properties have a special "indeterminate" value, which is an empty string for the Name or 9999999 for the style attributes.
So, for example, if Bold = 9999999 it means the paragraph contains some bold and some non-bold characters.
I include the following fragment to show the general idea:
For Each P as Paragraph in doc.Paragraphs
Dim R as Range = P.Range
If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
' This paragraph uses mixed fonts, so we need to analyse character by character
AnalyseCharacterByCharacter(R)
Else
' R.Font is used throughout this paragraph
FontHasBeenUsed(R.Font)
End If
Next
That's the wrong way round I think. We are looking for the fact of a font's inclusion not the location of that font. It's an existential rather than a positional problem.
Much, much, much quicker is to iterate the fonts. Only trick is that Word is sometimes fussy about spaces and so forth. This works well for me
Sub FindAllFonts()
Dim lWhichFont As Long, sTempName As String, sBuffer As String
For lWhichFont = 1 To FontNames.Count
sTempName = FontNames(lWhichFont)
If FindThisFont(sTempName) Then
sBuffer = sBuffer & "Found " & sTempName & vbCrLf
Else
If FindThisFont(Replace(sTempName, " ", "")) Then
sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
End If
End If
Next
Documents.Add
Selection.TypeText Text:=sBuffer
End Sub
Function FindThisFont(sName As String) As Boolean
Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
With Selection.Find
.Font.Name = sName
.Forward = True
.Format = True
.Execute
If .Found() Then
FindThisFont = True
Else
FindThisFont = False
End If
End With
End Function
It works very fast (the only slow component is the font iteration)
(It won't find fonts not on your system, obviously, but if you are trying to prepare for transport something you wrote, and some assistant program has put Helvetica or MS Minchin in, you can find it)
OK, people told me that this was not what everyone wants, people want to find fonts that aren't on their machines. But the other way is still too slow and involves looking for a lot of stuff not there. So here is an alternative that saves out as rtf, and processes the rtf header.
Sub FIndAllFonts2()
Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
Dim objPic As InlineShape, objShp As Shape
' rememer old name for reloading
sOldName = ActiveDocument.Path
'delete image to make file out small
For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
sTempFile = ActiveDocument.Path
ActiveDocument.Close
lPos2 = 1
' we only want the header, but we don't know how long the file is
'I am using a Mac, so filesystemobject not available
' if you end up having a huge header, make 2500 bigger
lStopAt = 2500
Open sTempFile For Input As #1
Do While Not EOF(1) And lPos2 < lStopAt
sBit = Input(1, #1)
sBuffer = sBuffer & sBit
lPos2 = lPos2 + 1
Loop
Close #1
'delete temp file
Kill sTempFile
' loop through header, fonts identified in the table as {\f1\
' if you have more than 100 fonts, make this bigger
' not all numbers are used
lStopAt = 100
For lCounter = 1 To lStopAt
lPos = InStr(sBuffer, "{\f" & lCounter & "\")
If lPos > 0 Then
sBuffer = Mid(sBuffer, lPos)
lPos = InStr(sBuffer, ";")
sBuffer2 = Left(sBuffer, lPos - 1)
'this is where you would look for the alternate name if you want it
lPos = InStr(sBuffer2, "{\*\falt")
If lPos > 0 Then
sBuffer2 = Left(sBuffer2, lPos - 1)
sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
Else
sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
End If
sOut = sOut & sBuffer2 & vbCrLf
End If
Next
'reopen old file
Documents.Open sOldName
Set newdoc = Documents.Add
sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
Selection.TypeText Text:=sOut
End Sub
This goes through my 350 page thesis draft in under 20 seconds on a MacBook Pro. So it is quick enough to be useful.
If you want to get all fonts used within your doc. you could simply get all of them through one line using OPEN XML:
using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
{
var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
}
Each Font element has its "Name" property which is referenced in element in the properties of a text run.
Hint: you have to consider that each word doc. does not have more than 2 Font table parts, one in main part and the other in glossary part so don't forget to consider also glossary one if needed.
You could download OPEN XML SDK from here
This might be quicker than converting documents to .docx before processing them with OpenXml (for the record, you could also work with the property document.Content.WordOpenXML instead of document.Content.XML):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Word = NetOffice.WordApi;
namespace _5261108
{
class Program
{
static void Main(string[] args)
{
using (var app = new Word.Application())
{
var doc = app.Documents.Open(Path.GetFullPath("test.docx"));
foreach (var font in GetFontNames(doc))
{
Console.WriteLine(font);
}
app.Quit(false);
}
Console.ReadLine();
}
private static IEnumerable<string> GetFontNames(Word.Document document)
{
var xml = document.Content.XML;
var doc = XDocument.Parse(xml);
var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font");
var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value);
return fontNames.Distinct();
}
}
}
Converted for your convenience:
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Xml.Linq
Imports Word = NetOffice.WordApi
Namespace _5261108
Class Program
Private Shared Sub Main(args As String())
Using app = New Word.Application()
Dim doc = app.Documents.Open(Path.GetFullPath("test.docx"))
For Each font As var In GetFontNames(doc)
Console.WriteLine(font)
Next
app.Quit(False)
End Using
Console.ReadLine()
End Sub
Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String)
Dim xml = document.Content.XML
Dim doc = XDocument.Parse(xml)
Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font")
Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value)
Return fontNames.Distinct()
End Function
End Class
End Namespace
'=======================================================
'Service provided by Telerik (www.telerik.com)
'Conversion powered by NRefactory.
'Twitter: @telerik
'Facebook: facebook.com/telerik
'=======================================================
Try this:
Sub Word_Get_Document_Fonts()
Dim report As String
Dim J As Integer
Dim font_name As String
report = "Fonts in use in this document:" & vbCr & vbCr
For J = 1 To FontNames.Count
font_name = FontNames(J)
Set myrange = ActiveDocument.Range
myrange.Find.ClearFormatting
myrange.Find.Font.Name = font_name
With myrange.Find
.text = "^?"
.Replacement.text = ""
.Forward = True
.Wrap = wdFindStop
.Format = True
End With
myrange.Find.Execute
If myrange.Find.Found Then
report = report & font_name & vbCr
End If
Next J
MsgBox (report)
End Sub
精彩评论