
Unable to convert PDF to text using a Python script

I want to convert all the .pdf files in a specific directory to .txt format using the pdftotext command, but I want to do this from a Python script. My script contains:

import glob 
import os

fullPath = os.path.abspath("/home/eth1/Downloads")

for fileName in glob.glob(os.path.join(fullPath,'*.pdf')):
   fullFileName = os.path.join(fullPath, fileName)
   os.popen('pdftotext fullFileName')

However, I get the following error:

Error: Couldn't open file 'fullFileName': No such file or directory.


You are passing the literal string fullFileName to os.popen, not the value of the variable. Do something like this instead (assuming fullFileName does not need to be escaped for the shell):

os.popen('pdftotext %s' % fullFileName)
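If fullFileName might contain spaces or other characters the shell treats specially, one option is to quote it before building the command string. This is a minimal sketch assuming Python 3.3+, where shlex.quote exists (on Python 2 the equivalent is pipes.quote). The helper name build_command is hypothetical:

```python
import shlex


def build_command(pdf_path):
    # Hypothetical helper: shlex.quote wraps the path in single quotes
    # only if it contains characters the shell would otherwise interpret
    # (spaces, $, ;, etc.); plain paths are returned unchanged.
    return "pdftotext {0}".format(shlex.quote(pdf_path))


# Made-up file names for illustration
print(build_command("/home/eth1/Downloads/report.pdf"))
print(build_command("/home/eth1/Downloads/my report.pdf"))
```

Inside the loop this would be used as os.popen(build_command(fullFileName)).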

Also note that os.popen is considered deprecated; it is better to use the subprocess module instead:

import subprocess
retcode = subprocess.call(["/usr/bin/pdftotext", fullFileName])

It is also much safer: because the file name is passed as a separate list element rather than through a shell, spaces and special characters in fullFileName are handled properly.
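Putting this together, here is a minimal sketch of the whole script using subprocess, assuming Python 3.5+ (for subprocess.run) and that pdftotext is on the PATH rather than necessarily at /usr/bin. The function names pdf_to_txt_commands and convert_all are hypothetical. Note that glob.glob already returns paths that include the directory, so the extra os.path.join inside the original loop is not needed:

```python
import glob
import os
import shutil
import subprocess


def pdf_to_txt_commands(directory, executable="pdftotext"):
    """Return one argument list per .pdf file in `directory` (hypothetical helper)."""
    # glob.glob returns paths that already start with `directory`,
    # so no per-file os.path.join is required.
    pattern = os.path.join(os.path.abspath(directory), "*.pdf")
    return [[executable, pdf_path] for pdf_path in sorted(glob.glob(pattern))]


def convert_all(directory):
    """Run pdftotext on every PDF in `directory` (hypothetical helper)."""
    # Look up pdftotext on PATH instead of hard-coding /usr/bin/pdftotext.
    executable = shutil.which("pdftotext")
    if executable is None:
        raise RuntimeError("pdftotext not found on PATH")
    for argv in pdf_to_txt_commands(directory, executable):
        # Passing a list (no shell) sends the file name as a single
        # argument, so spaces and special characters need no escaping.
        result = subprocess.run(argv)
        if result.returncode != 0:
            print("pdftotext failed for {0} (exit code {1})".format(
                argv[1], result.returncode))
```

Calling convert_all("/home/eth1/Downloads") would then run pdftotext once per PDF. When no output file is given, pdftotext by default writes file.txt next to file.pdf.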


Change the last line to:

os.popen('pdftotext {0}'.format(fullFileName))

This way the value of fullFileName is passed, instead of its name.
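To see the difference between passing the name and passing the value, compare the strings that end up being sent to the shell. The path below is a made-up example:

```python
fullFileName = "/home/eth1/Downloads/report.pdf"  # hypothetical example path

# A variable name inside quotes is just text; Python does not substitute it.
literal = 'pdftotext fullFileName'

# str.format and the % operator both insert the variable's current value.
formatted = 'pdftotext {0}'.format(fullFileName)
percent = 'pdftotext %s' % fullFileName

print(literal)               # pdftotext fullFileName
print(formatted)             # pdftotext /home/eth1/Downloads/report.pdf
print(formatted == percent)  # True
```

The first string is exactly what produced the "Couldn't open file 'fullFileName'" error in the question.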

