Unable to convert PDF to text using a Python script
I want to convert all the .pdf files in a specific directory to .txt format using the pdftotext command, but I want to do this from a Python script. My script contains:
import glob
import os
fullPath = os.path.abspath("/home/eth1/Downloads")
for fileName in glob.glob(os.path.join(fullPath,'*.pdf')):
    fullFileName = os.path.join(fullPath, fileName)
    os.popen('pdftotext fullFileName')
but I am getting the following error:
Error: Couldn't open file 'fullFileName': No such file or directory.
You are passing the string fullFileName literally to os.popen, not the value of the variable. You should do something like this instead (assuming that fullFileName does not have to be escaped):
os.popen('pdftotext %s' % fullFileName)
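If a file name might contain spaces or shell metacharacters, the escaping mentioned above can be handled with shlex.quote from the standard library. This is only a sketch, assuming Python 3.3 or later; build_command is a hypothetical helper name and the example path is made up for illustration.

```python
import shlex

def build_command(pdf_path):
    # Hypothetical helper: shlex.quote wraps the path in single quotes
    # when it contains characters the shell would otherwise interpret.
    return "pdftotext %s" % shlex.quote(pdf_path)

# Hypothetical path with a space in the file name.
print(build_command("/home/eth1/Downloads/my report.pdf"))
# prints: pdftotext '/home/eth1/Downloads/my report.pdf'
```

The resulting string can then be handed to os.popen in place of the unescaped one.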
Also note that os.popen is considered deprecated; it's better to use the subprocess module instead:
import subprocess
retcode = subprocess.call(["/usr/bin/pdftotext", fullFileName])
It is also much safer, because it handles spaces and special characters in fullFileName properly.
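Putting the subprocess suggestion together with the loop from the question might look like the sketch below. The function name convert_pdfs and its run parameter are assumptions added for illustration (run makes it possible to swap in a stand-in for subprocess.call when testing), and pdftotext is assumed to be on the PATH. Note that glob.glob already returns paths that include the directory, so the extra os.path.join from the question is not needed.

```python
import glob
import os
import subprocess

def convert_pdfs(directory, run=subprocess.call):
    """Run pdftotext on every .pdf in directory; return {path: exit code}."""
    results = {}
    # glob.glob returns paths that already include the directory part.
    for pdf_path in sorted(glob.glob(os.path.join(directory, "*.pdf"))):
        # Passing the command as a list avoids the shell entirely,
        # so spaces and special characters in the name are safe.
        results[pdf_path] = run(["pdftotext", pdf_path])
    return results

# Example call, using the directory from the question:
# convert_pdfs("/home/eth1/Downloads")
```

When called with only the input file, pdftotext writes a .txt file with the same base name next to each PDF. A non-zero exit code in the returned dictionary indicates that the conversion of that file failed.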
Change the last line to
os.popen('pdftotext {0}'.format(fullFileName))
This way the value of fullFileName will be passed, instead of its name.