
PDF page count not correct

I was wondering why the VBS code in the link below doesn't count PDF pages correctly. It seems to undercount each PDF by half or more of the pages it actually contains.

http://docs.ongetc.com/index.php?q=content/pdf-pages-counting-using-vb-script

Here is the code in case you cannot access the link above:

' By Chanh Ong
'File: pdfpagecount.vbs
' Purpose: count pages in pdf file in folder
Const OPEN_FILE_FOR_READING = 1

Set gFso = WScript.CreateObject("Scripting.FileSystemObject")
Set gShell = WScript.CreateObject ("WSCript.shell")
Set gNetwork = WScript.CreateObject("WScript.Network")

  directory="." 
  set base=gFso.getFolder(directory) 
  call listPDFFile(base) 

Function ReadAllTextFile(filespec)
   Const ForReading = 1
   Dim f
   Set f = gFso.OpenTextFile(filespec, ForReading)
   ReadAllTextFile = f.ReadAll
   f.Close                        ' Release the file handle.
End Function

function countPage(sString)
  Dim regEx, Match, Matches, counter, sPattern
  sPattern = "/Type\s*/Page[^s]"  ' capture PDF page count
  counter = 0

  Set regEx = New RegExp         ' Create a regular expression.
  regEx.Pattern = sPattern    ' Set the page-object pattern.
  regEx.IgnoreCase = True         ' Set case insensitivity.
  regEx.Global = True         ' Set global applicability.
  set Matches = regEx.Execute(sString)   ' Execute search.
  For Each Match in Matches      ' Iterate Matches collection.
    counter = counter + 1
  Next
  if counter = 0 then
    counter = 1
  end if
  countPage = counter
End Function

sub listPDFFile(grp) 
  Set pf = gFso.CreateTextFile("pagecount.txt", True)
for each file in grp.files 
    if (".pdf" = lcase(right(file,4))) then 
      larray = ReadAllTextFile(file)
      pages = countPage(larray)
      wscript.echo "The " & file.name & " PDF file has " & pages & " pages"
      pf.WriteLine(file.name&","&pages) 
    end if
next 
  pf.Close
end sub

Thanks


The solution offered (and accepted) will only work for a limited number of PDF documents. PDF files frequently compress large chunks of data, including page dictionaries (for example, inside the object streams introduced in PDF 1.5), so a crude regular-expression search for "type\s*/page[^s]" will often miss pages.
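A minimal VBScript sketch (an addition, not part of the original answer) of how a script could at least detect this case: PDF 1.5 and later can store objects, including page dictionaries, inside compressed object streams marked `/Type /ObjStm`. The function name `UsesObjectStreams` is hypothetical, and the check is only a heuristic: finding the marker means a regex page count is probably unreliable, while not finding it does not prove the count is right.

```vbscript
Option Explicit

' Returns True if the PDF text contains at least one object stream
' (/Type /ObjStm, introduced in PDF 1.5). Objects stored inside such a
' stream, which may include page dictionaries, are typically
' Flate-compressed and therefore invisible to a plain-text regex.
Function UsesObjectStreams(ByVal sText)
    Dim re
    Set re = New RegExp
    re.Pattern = "/Type\s*/ObjStm\b"   ' PDF names are case-sensitive
    UsesObjectStreams = re.Test(sText)
End Function
```

A caller could print a warning next to the regex-based count whenever `UsesObjectStreams` returns True, rather than silently reporting a number that may be too low.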

The only truly reliable solution is to parse the PDF document properly, which is laborious. I'm afraid I don't have a working VBS solution, but I have written a Delphi function that demonstrates how to do this (see http://www.angusj.com/delphitips/pdfpagecount.php).
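As a rough step from pattern-matching toward parsing, here is a VBScript sketch (an assumption-laden addition, not a port of the Delphi function linked above) that reads the page tree instead of counting page objects. Every `/Type /Pages` node carries a `/Count` of the pages beneath it, so in an uncompressed file the largest `/Count` is normally the root node's total. `ReadPdfAsText` and `CountPagesFromPageTree` are hypothetical names; the reader reuses the ADODB.Stream approach from the answer below. Known gaps: it returns 0 when the page tree sits in a compressed object stream, an incremental update that deleted pages can leave a stale, larger `/Count` behind, and an indirect value such as `/Count 12 0 R` would be misread.

```vbscript
Option Explicit

' Reads the whole file as text, using the same ADODB.Stream
' technique as the getPdfPgCnt answer.
Function ReadPdfAsText(ByVal sPath)
    With CreateObject("ADODB.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        ReadPdfAsText = .ReadText(-1)
        .Close
    End With
End Function

' Heuristic: returns the largest /Count found in any /Type /Pages object.
' Returns 0 if no page-tree node is visible (e.g. it is compressed),
' so the caller can fall back to another method.
Function CountPagesFromPageTree(ByVal sText)
    Dim reObj, reType, reCount, oObj, oCounts, nCount, nMax

    Set reObj = New RegExp               ' "N G obj ... endobj" bodies
    reObj.Pattern = "\d+\s+\d+\s+obj([\s\S]*?)endobj"
    reObj.Global = True

    Set reType = New RegExp              ' /Pages, but not /Page
    reType.Pattern = "/Type\s*/Pages\b"

    Set reCount = New RegExp             ' direct integer /Count value
    reCount.Pattern = "/Count\s+(\d+)"

    nMax = 0
    For Each oObj In reObj.Execute(sText)
        If reType.Test(oObj.SubMatches(0)) Then
            Set oCounts = reCount.Execute(oObj.SubMatches(0))
            If oCounts.Count > 0 Then
                nCount = CLng(oCounts(0).SubMatches(0))
                If nCount > nMax Then nMax = nCount
            End If
        End If
    Next
    CountPagesFromPageTree = nMax
End Function
```

Usage, falling back to the regex count when the page tree is not visible: `n = CountPagesFromPageTree(ReadPdfAsText("C:\1.pdf"))`, then `If n = 0 Then n = getPdfPgCnt("C:\1.pdf")`.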


Try this

Function getPdfPgCnt(ByVal sPath)
    Dim strTStr

    With CreateObject("Adodb.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        strTStr = .ReadText(-1)
        .Close
    End With

    With (New RegExp)
        .Pattern = "Type\s+/Page[^s]"
        .IgnoreCase = True
        .Global = True
        getPdfPgCnt = .Execute(strTStr).Count
    End With

    If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function

' Usage: getPdfPgCnt("C:\1.pdf")

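Update (revisions 1 and 2): the pattern now uses \s* instead of \s+, so "/Type/Page" written without a space is also matched, and the function is wrapped in a script that lists every PDF in the script's folder: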

Option Explicit

Private Function getPdfPgCnt(ByVal sPath) 'Returns page count of file on passed path
    Dim strTStr

    With CreateObject("Adodb.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        strTStr = .ReadText(-1)
        .Close
    End With

    With (New RegExp)
        .Pattern = "Type\s*/Page[^s]"
        .IgnoreCase = True
        .Global = True
        getPdfPgCnt = .Execute(strTStr).Count
    End With

    If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function

'--------------------------------
Dim oFso, iFile
Set oFso = CreateObject("Scripting.FileSystemObject")

'enumerating pdf files in vbs's base directory
For Each iFile In oFso.getFolder(oFso.GetParentFolderName(WScript.ScriptFullName)).Files
    If LCase(oFso.GetExtensionName(iFile.Name)) = "pdf" Then WScript.Echo iFile.Path & " has " & getPdfPgCnt(iFile.Path) & " pages."
Next
Set oFso = Nothing
'--------------------------------