
python: how to create persistent in-memory structure for debugging

[Python 3.1]

My program takes a long time to run just because of the pickle.load call on a huge data structure. This makes debugging very annoying and time-consuming: every time I make a small change, I need to wait a few minutes to see whether the regression tests pass.

I would like to replace the pickle file with a persistent in-memory data structure.

I thought of starting a Python program in one process and connecting to it from another, but I am afraid the inter-process communication overhead would be huge.
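For what it's worth, here is a hedged sketch of that server-process idea using the standard library's multiprocessing.managers.BaseManager (DataManager, get_data, and huge.pickle are made-up names, not from this question). It also illustrates the worry: every method call on the returned proxy is an IPC round trip.

# server.py -- sketch of a process that holds the unpickled structure
from multiprocessing.managers import BaseManager
import pickle

class DataManager(BaseManager):
    pass

if __name__ == '__main__':
    with open('huge.pickle', 'rb') as f:   # hypothetical file name
        data = pickle.load(f)
    DataManager.register('get_data', callable=lambda: data)
    manager = DataManager(address=('localhost', 50000), authkey=b'debug')
    manager.get_server().serve_forever()

# Client side (a separate process):
#   DataManager.register('get_data')
#   m = DataManager(address=('localhost', 50000), authkey=b'debug')
#   m.connect()
#   proxy = m.get_data()   # each call on the proxy crosses process boundaries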

Perhaps I could run a Python function from the interpreter to load the structure into memory. Then, as I modify the rest of the program, I could run it many times without exiting the interpreter in between. This seems like it would work, but I'm not sure whether I would suffer any overhead or other problems.
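A minimal sketch of that interpreter workflow, assuming the code under test lives in a module called mymodule and the data in huge.pickle (both names are hypothetical):

# Interactive session sketch
import pickle
import importlib    # Python 3.1 exposed reload() via the imp module instead

import mymodule     # hypothetical module under development

with open('huge.pickle', 'rb') as f:
    data = pickle.load(f)      # pay the loading cost once

mymodule.run_tests(data)       # first run

# ... edit mymodule.py in your editor ...

importlib.reload(mymodule)     # pick up the edits without restarting
mymodule.run_tests(data)       # 'data' is still in memory

One caveat: reload does not update instances created from the old module's classes, so anything whose class you changed has to be recreated.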


You can use mmap to open a view of the same file in multiple processes; once the file's pages are in the operating system's cache, access runs at almost the speed of memory.
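As a minimal sketch of that idea (assuming the lotsa_objects.pickled file produced by the generator script below), note that a mapping created with mmap is file-like, so pickle can read straight out of it, and every process mapping the same file shares the OS page cache:

# mmap_sketch.py -- a sketch, not part of the original answer
import mmap
import pickle

with open('lotsa_objects.pickled', 'rb') as f:
    # Length 0 maps the whole file; ACCESS_READ gives a read-only shared view.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_object = pickle.load(mm)   # mmap objects are file-like
    mm.close()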


First you can pickle the different parts of the whole object using a method like this:

# gen_objects.py

import random
import pickle

class BigBadObject(object):
   def __init__(self):
      # A dictionary of up to 1000 random int -> float entries.
      self.a_dictionary = {}
      for x in range(random.randint(1, 1000)):
         self.a_dictionary[random.randint(1, 98675676)] = random.random()
      # A list of 1000 to 10000 random floats.
      self.a_list = []
      for x in range(random.randint(1000, 10000)):
         self.a_list.append(random.random())
      # A string of 100 to 10000 random uppercase letters.
      self.a_string = ''.join([chr(random.randint(65, 90))
                        for x in range(random.randint(100, 10000))])

if __name__ == "__main__":
   with open('lotsa_objects.pickled', 'wb') as output:
      for i in range(10000):
         pickle.dump(BigBadObject(), output, pickle.HIGHEST_PROTOCOL)

Once you have generated the big file, you can read it with a Python program that unpickles objects in a background thread while the main thread consumes them. If you split the data across several part files, several readers can run at once, each on a different part (see the sketch after the code below).

# reader.py

from threading import Thread
from queue import Queue, Empty
import pickle

# Needed so pickle can reconstruct BigBadObject instances.
from gen_objects import BigBadObject

class Reader(Thread):
   """Unpickle objects from a file in a background thread, feeding a queue."""
   def __init__(self, filename, q):
      Thread.__init__(self)
      self._file = open(filename, 'rb')
      self._queue = q
   def run(self):
      while True:
         try:
            one_object = pickle.load(self._file)
         except EOFError:
            break          # end of the pickle stream
         self._queue.put(one_object)
      self._file.close()

class uncached(object):
   """Iterate over pickled objects while a Reader loads them in the background."""
   def __init__(self, filename, queue_size=100):
      self._my_queue = Queue(maxsize=queue_size)
      self._my_reader = Reader(filename, self._my_queue)
      self._my_reader.start()
   def __iter__(self):
      # Loop until the reader is done *and* the queue is drained, so that
      # objects still sitting in the queue are not dropped.
      while self._my_reader.is_alive() or not self._my_queue.empty():
         try:
            o = self._my_queue.get(True, timeout=0.1)  # block for 0.1 seconds
            yield o
         except Empty:
            pass

if __name__ == "__main__":
   # Compute an average of all the numbers in a_list, just for show.
   list_avg = 0.0
   list_count = 0

   for x in uncached('lotsa_objects.pickled'):
      list_avg += sum(x.a_list)
      list_count += len(x.a_list)

   print("Average:", list_avg / list_count)

Reading the pickle this way does not make the unpickling itself faster (CPython's GIL keeps the reader thread and the consumer from truly running in parallel), but it overlaps loading with processing: the main thread starts working on the first objects while the reader thread is still unpickling the rest, instead of blocking on one monolithic pickle.load before anything runs.
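If you do split the data into part files, a hedged extension of the same pattern runs one Reader per file into a shared queue (read_parts and the part-file names are made up here, not from the original answer):

# read_parts.py -- a sketch building on the Reader class above
from queue import Queue, Empty
from reader import Reader

def read_parts(filenames, queue_size=100):
    # One background Reader per part file, all feeding the same queue.
    q = Queue(maxsize=queue_size)
    readers = [Reader(name, q) for name in filenames]
    for r in readers:
        r.start()
    # Drain until every reader has finished and the queue is empty.
    while any(r.is_alive() for r in readers) or not q.empty():
        try:
            yield q.get(True, timeout=0.1)
        except Empty:
            pass

# Usage (part-file names are hypothetical):
# for obj in read_parts(['part_0.pickled', 'part_1.pickled']):
#     ...

Even then the GIL serializes the unpickling itself, so the win stays in overlapping I/O with processing rather than in raw CPU parallelism.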
