Python subprocess: can't read stdout
I have about 500,000+ txt files, roughly 7+ GB of data in total. I am using Python to load them into a SQLite database. I am creating two tables: the first holds the primary key and a hyperlink to the file; the second is populated by an entity extractor that a coworker developed in Perl.
To accomplish this I am using subprocess.Popen(). Prior to this approach I was launching the Perl script at every iteration of my loop, but that was simply too expensive to be useful.
I need the Perl process to stay dynamic: I need to be able to send data back and forth, and the process must not terminate until I tell it to. The Perl script was modified so that it accepts the full contents of a file on stdin and writes to stdout when it sees a \n. But I am having trouble reading the data...
If I use communicate(), the subprocess is terminated and I get an I/O error at the next iteration of my loop. If I try readline() or read(), it locks up. Here are some examples of the different behaviors I am experiencing.
This deadlocks my system and I need to force-close Python to continue.
numberExtractor = subprocess.Popen(["C:\\Perl\\bin\\perl5.10.0.exe", "D:\\MyDataExtractor\\extractSerialNumbers.pl"], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
for infile in glob.glob(self.dirfilename + '\\*\\*.txt'):
    f = open(infile)
    reportString = f.read()
    f.close()
    reportString = reportString.replace('\n', ' ')
    reportString = reportString.replace('\r', ' ')
    reportString = reportString + '\n'
    numberExtractor.stdin.write(reportString)
    x = numberExtractor.stdout.read()  # I cannot see the STDOUT; Python freezes and never runs past here.
    print x
This kills the subprocess, and I get an I/O error at the next iteration of my loop.
numberExtractor = subprocess.Popen(["C:\\Perl\\bin\\perl5.10.0.exe", "D:\\MyDataExtractor\\extractSerialNumbers.pl"], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
for infile in glob.glob(self.dirfilename + '\\*\\*.txt'):
    f = open(infile)
    reportString = f.read()
    f.close()
    reportString = reportString.replace('\n', ' ')
    reportString = reportString.replace('\r', ' ')
    reportString = reportString + '\n'
    numberExtractor.stdin.write(reportString)
    x = numberExtractor.communicate()  # Works well; I can see my STDOUT from Perl, but the process terminates and will not run on the next iteration.
    print x
If I just run it like this, it runs through all the code fine. The print statement shows ', mode 'rb' at 0x015dbf08> (the repr of the pipe's file object) for each item in my folder.
numberExtractor = subprocess.Popen(["C:\\Perl\\bin\\perl5.10.0.exe", "D:\\MyDataExtractor\\extractSerialNumbers.pl"], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
for infile in glob.glob(self.dirfilename + '\\*\\*.txt'):
    f = open(infile)
    reportString = f.read()
    f.close()
    reportString = reportString.replace('\n', ' ')
    reportString = reportString.replace('\r', ' ')
    reportString = reportString + '\n'
    numberExtractor.stdin.write(reportString)
    x = numberExtractor.stdout  # I cannot get the value out of the object, but it runs through all my files fine.
    print x
Hopefully I am making a simple mistake, but is there some way I can just send a file to my Perl process (stdin), get the stdout, and then repeat, without having to reopen my subprocess for every file in my loop?
Consider using the shell. Life is simpler.
perl extractSerialNumbers.pl *.txt | python load_database.py
Don't mess around with having Python start perl and all that. Just read the results from perl and process those results in Python.
Since both processes run concurrently, this tends to be pretty fast and to make good use of CPU resources, without much programming on your part.
In the Python program (load_database.py) you can simply use the fileinput module to read the entire input provided on stdin.
import fileinput

for line in fileinput.input():
    # load the row into the database
    ...
That's about all you need in the Python program if you make the shell do the dirty work of setting up the pipeline.
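To make the sketch above concrete, here is a minimal version of load_database.py using the standard-library sqlite3 module. It assumes each line the Perl extractor prints is one serial number; the table name `serials`, column `number`, and database filename are illustrative, not anything from the question.

```python
import fileinput
import sqlite3

def load(lines, conn):
    """Insert one row per non-empty input line; return rows inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS serials (number TEXT)")
    rows = [(line.strip(),) for line in lines if line.strip()]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO serials (number) VALUES (?)", rows)
    return len(rows)

if __name__ == "__main__":
    # With no filename arguments, fileinput.input() reads from stdin,
    # i.e. whatever the perl side of the pipeline writes.
    load(fileinput.input(), sqlite3.connect("serials.db"))
```

Because sqlite3 ships with Python, the whole pipeline needs nothing beyond the two scripts and the shell.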