PDF page count not correct
I was wondering why the VBS code at the link below does not count PDF pages correctly. It seems to undercount the pages that actually exist in each PDF by half or more.
http://docs.ongetc.com/index.php?q=content/pdf-pages-counting-using-vb-script
Here is the code in case you cannot access the link above:
' By Chanh Ong
'File: pdfpagecount.vbs
' Purpose: count pages in pdf file in folder
Const OPEN_FILE_FOR_READING = 1
Set gFso = WScript.CreateObject("Scripting.FileSystemObject")
Set gShell = WScript.CreateObject ("WSCript.shell")
Set gNetwork = Wscript.CreateObject("WScript.Network")
directory="."
set base=gFso.getFolder(directory)
call listPDFFile(base)
Function ReadAllTextFile(filespec)
Const ForReading = 1, ForWriting = 2
Dim f
Set f = gFso.OpenTextFile(filespec, ForReading)
ReadAllTextFile = f.ReadAll
End Function
function countPage(sString)
Dim regEx, Match, Matches, counter, sPattern
sPattern = "/Type\s*/Page[^s]" ' capture PDF page count
counter = 0
Set regEx = New RegExp ' Create a regular expression.
regEx.Pattern = sPattern ' Set the page-matching pattern.
regEx.IgnoreCase = True ' Set case insensitivity.
regEx.Global = True ' Set global applicability.
set Matches = regEx.Execute(sString) ' Execute search.
For Each Match in Matches ' Iterate Matches collection.
counter = counter + 1
Next
if counter = 0 then
counter = 1
end if
countPage = counter
End Function
sub listPDFFile(grp)
Set pf = gFso.CreateTextFile("pagecount.txt", True)
for each file in grp.files
if (".pdf" = lcase(right(file,4))) then
larray = ReadAllTextFile(file)
pages = countPage(larray)
wscript.echo "The " & file.name & " PDF file has " & pages & " pages"
pf.WriteLine(file.name&","&pages)
end if
next
pf.Close
end sub
Thanks
The solution offered (and accepted) will only work for a limited number of PDF documents. Since PDF documents frequently compress large chunks of data including page metadata, crude regular expression searches for "type\s*/page[^s]" will often miss pages.
The only really reliable solution is to very laboriously decompose the PDF document. I'm afraid I don't have a working VBS solution but I have written a Delphi function which demonstrates how to do this (see http://www.angusj.com/delphitips/pdfpagecount.php).
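To illustrate why the plain regex search undercounts, here is a minimal sketch in Python (chosen only because VBScript has no built-in inflate routine). It is not the Delphi function linked above, just one way to approximate the idea: count page objects in the plain parts of the file, then inflate every Flate-compressed stream and count the page objects found inside. The names count_pdf_pages, PAGE_RE and STREAM_RE are made up for this example, and the sketch assumes an unencrypted PDF whose compressed streams use only FlateDecode.

```python
import re
import zlib

# Matches "/Type /Page" but not "/Type /Pages" (the page-tree node).
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")
# Captures the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def count_pdf_pages(data: bytes) -> int:
    """Count page objects in plain text and inside Flate-compressed streams."""
    # Page objects stored uncompressed, outside any stream.
    count = len(PAGE_RE.findall(STREAM_RE.sub(b"", data)))
    # Page objects hidden inside compressed streams (e.g. object streams).
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data (an image, a font, ...); skip it
        count += len(PAGE_RE.findall(inflated))
    return count
```

Usage would look something like count_pdf_pages(open(r"C:\1.pdf", "rb").read()). Encrypted files, or streams that use other filters, would still need a real PDF parser.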
Try this
Function getPdfPgCnt(ByVal sPath)
Dim strTStr
With CreateObject("Adodb.Stream")
.Open
.Charset = "x-ansi"
.LoadFromFile sPath
strTStr = .ReadText(-1)
End With
With (New RegExp)
.Pattern = "Type\s+/Page[^s]"
.IgnoreCase = True
.Global = True
getPdfPgCnt = .Execute(strTStr).Count
End With
If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function
'Usage : getPdfPgCnt("C:\1.pdf")
Update #1~#2:
Option Explicit
Private Function getPdfPgCnt(ByVal sPath) 'Returns page count of file on passed path
Dim strTStr
With CreateObject("Adodb.Stream")
.Open
.Charset = "x-ansi"
.LoadFromFile sPath
strTStr = .ReadText(-1)
End With
With (New RegExp)
.Pattern = "Type\s*/Page[^s]"
.IgnoreCase = True
.Global = True
getPdfPgCnt = .Execute(strTStr).Count
End With
If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function
'--------------------------------
Dim oFso, iFile
Set oFso = CreateObject("Scripting.FileSystemObject")
'enumerating pdf files in vbs's base directory
For Each iFile In oFso.getFolder(oFso.GetParentFolderName(WScript.ScriptFullName)).Files
If LCase(oFso.GetExtensionName(iFile)) = "pdf" Then WScript.Echo iFile & " has "& getPdfPgCnt(iFile)&" pages."
Next
Set oFso = Nothing
'--------------------------------