Convert PDF to text in Python [closed]
I want to convert a PDF to text. I tried this code at the Python command prompt, but it does not show any output. Can you please tell me where I'm going wrong? Thanks in advance.
import pyPdf
def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content
print getPDFContent("test.pdf").encode("ascii", "ignore")
If your PDF contains only images (e.g. scanned pages), it has no text layer, so there is no text to extract and the output will be empty.
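To tell an image-only PDF apart from a bug in the script, it can help to report explicitly when nothing was extracted. The following is only a sketch under some assumptions: it targets Python 3 and uses `pypdf` (the maintained successor of `pyPdf`) rather than the library in the question, and the helper names `collapse_whitespace`, `get_pdf_content` and `report` are made up for illustration.

```python
def collapse_whitespace(text):
    """Replace non-breaking spaces and squeeze whitespace runs to one space."""
    return " ".join(text.replace("\xa0", " ").split())


def get_pdf_content(path):
    """Return the text of every page, or "" if no text could be extracted."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so the other helpers can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # Image-only pages usually yield an empty string from extract_text().
    pages = [page.extract_text() or "" for page in reader.pages]
    return collapse_whitespace("\n".join(pages))


def report(content):
    """Return the text to print, or a diagnostic when nothing was extracted."""
    if not content:
        return "No extractable text found; the PDF may contain only scanned images."
    return content


# Usage (assumes a file named test.pdf next to the script):
# print(report(get_pdf_content("test.pdf")))
```

If the diagnostic message appears, the file probably needs OCR instead of text extraction; if text appears here but not with the original script, the problem is more likely in the environment or library version.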
 