python multiprocessing db access is very slow
I have GUI that will interact with a postgres database, using psycopg2. I have db connection in a multiprocessing process, and send SQL via a multiprocessing queue, and receive via another queue.
The problem is that the speed is very very slow. A simple select * from a small table (30 rows) can be 1/10th of a second, or can take over one second.
Does any one have any clues as to why it's so slow?
New Information: It works fine on winxp, exact same code, so the intermittent delay is only happening on my linux box (ubuntu 9.10)
More info: Having stubbed out the select it appears it's not the problem.
Here is the main part of the db class.
class DataBase(multiprocessing.Process):
def __init__(self, conn_data, in_queue, out_queue):
multiprocessing.Process.__init__(self)
self.in_queue = in_queue
self.out_queue = out_queue
self.conn_data = conn_data
self.all_ok = True
def run(self):
proc_name = self.name
self.conn = self.get_connection(self.conn_data)
print("Running ", self.name)
while True:
next_job = self.in_queue.get()
print("Next Job: ",next_job)
if next_job is None:
# Stop Process
break
SQL = next_job[0]
callback = next_job[1]
result = self.execute(SQL)
self.out_queue.put((result, ca开发者_StackOverflow中文版llback))
print("Closing connection ", self.name)
self.conn.close()
return
And in the GUI I have this:
def recieve_data(self):
"Revived data on the queue. Data is a tuple of the actual data and a calback name."
if self.recieve_queue.empty() == False:
data = self.recieve_queue.get()
callback_name = data[1]
try:
callback = getattr(self, callback_name)
callback(data[0])
except AttributeError as e:
util.error_ui(err = e)
self.check_data_timeout = None
return False # Stop checking.
return True # Have the main loop keep checking for data.
def request_data(self, SQL, callback):
self.send_queue.put((SQL, callback))
self.check_data_timeout = gobject.timeout_add(50, self.recieve_data) # Poll the database recieved_queue
Try to isolate what is taking the time - is it the multiprocessing or the database? For example try calling the database directly from the python interactive shell - the ipython shell has 'time' and 'timeit' commands for measuring things like this. Alternatively stub out DataBase.execute to return canned values, and see what difference it makes.
What about gobject.timeout_add? What is that doing? It is possible that the delay is in there rather than the database or the multiprocessing code.
Have you tried opening new database connection for each of your processes? It seems to me that you are simply adding overhead trying to reuse them in different processes.
Also, I'm not sure (your sample is to small to deduce) but it looks like you're opening new DB connection for each query... Are you closing connection with self.conn.close()
after each query? You're supposed to have one long-lasting connection.
It seems to be problems or a bug specific to Ubuntu 9.10
All works fine on Ubuntu 9.04 and win32, even on win32 on a virtual machine hosted on Ubuntu 9.10.
Thanks for all the advice.
精彩评论