开发者

python pdf to text convert [closed]

开发者_开发百科 It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I want to convert pdf into text. I tried this code in python command prompt but it is not showing any output. Maybe I'm wrong. Can you please tell me where im wrong. Thanks in advance.

import pyPdf

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

print getPDFContent("test.pdf").encode("ascii", "ignore")


If your PDF contains only images (e.g. from a scanned page) then you won't be able to extract any text.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜