Does the Linux disk buffer cache make Python cPickle more efficient than shelve?
Is I/O more efficient, thanks to the Linux disk buffer cache, when frequently accessed Python objects are stored as separate cPickle files rather than all together in one large shelf?
Does the disk buffer cache operate differently in these two scenarios with respect to efficiency?
There may be thousands of large files (generally around 100 MB, but sometimes 1 GB), but there is also plenty of RAM (e.g. 64 GB).
I don't know of any theoretical way to decide which method is faster, and even if I did, I'm not sure I would trust it. So let's write some code and test it.
If we package our pickle/shelve managers in classes with a common interface, it will be easy to swap them in and out of your code. If at some future point you discover that one is better than the other (or discover some even better way), all you have to do is write a class with the same interface, and you'll be able to plug the new class into your code with very little modification to anything else.
test.py:
    import cPickle
    import shelve
    import os

    class PickleManager(object):
        def store(self, name, value):
            with open(name, 'wb') as f:
                cPickle.dump(value, f)
        def load(self, name):
            with open(name, 'rb') as f:
                return cPickle.load(f)

    class ShelveManager(object):
        def __init__(self, fname):
            self.fname = fname
        def __enter__(self):
            if os.path.exists(self.fname):
                self.shelf = shelve.open(self.fname)
            else:
                self.shelf = shelve.open(self.fname, 'n')
            return self
        def __exit__(self, exc_type, exc_value, traceback):
            self.shelf.close()
        def store(self, name, value):
            self.shelf[name] = value
        def load(self, name):
            return self.shelf[name]

    def write(manager):
        for i in range(100):
            fname = '/tmp/{i}.dat'.format(i=i)
            data = 'The sky is so blue' * 100
            manager.store(fname, data)

    def read(manager):
        for i in range(100):
            fname = '/tmp/{i}.dat'.format(i=i)
            manager.load(fname)
Normally, you'd use PickleManager like this:
    manager = PickleManager()
    manager.load(...)
    manager.store(...)
while you'd use the ShelveManager like this:
    with ShelveManager('/tmp/shelve.dat') as manager:
        manager.load(...)
        manager.store(...)
But to test performance, you could do something like this:
    python -mtimeit -s'import test' 'with test.ShelveManager("/tmp/shelve.dat") as s: test.read(s)'
    python -mtimeit -s'import test' 'test.read(test.PickleManager())'
    python -mtimeit -s'import test' 'with test.ShelveManager("/tmp/shelve.dat") as s: test.write(s)'
    python -mtimeit -s'import test' 'test.write(test.PickleManager())'
At least on my machine, the results came out like this:
                    read (ms)    write (ms)
    PickleManager     9.26         7.92
    ShelveManager     5.32        30.9
So it looks like ShelveManager may be faster at reading, but PickleManager may be faster at writing.
Be sure to run these tests yourself. timeit results can vary with the Python version, OS, filesystem type, hardware, and so on.
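If you'd rather drive the measurement from inside Python instead of the shell, a self-contained sketch along these lines works too (the try/except import is just so the same snippet runs on Python 2's cPickle or Python 3's pickle, and the 10-iteration count is an arbitrary choice):

```python
import os
import shelve
import tempfile
import timeit

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle             # Python 3: cPickle was merged into pickle

tmpdir = tempfile.mkdtemp()
data = 'The sky is so blue' * 100

def pickle_write():
    # One small file per object, like PickleManager does.
    for i in range(100):
        with open(os.path.join(tmpdir, '%d.dat' % i), 'wb') as f:
            pickle.dump(data, f)

def shelve_write():
    # All objects in a single shelf, like ShelveManager does.
    shelf = shelve.open(os.path.join(tmpdir, 'shelve.dat'), 'c')
    try:
        for i in range(100):
            shelf[str(i)] = data
    finally:
        shelf.close()

for name, fn in [('pickle', pickle_write), ('shelve', shelve_write)]:
    elapsed = timeit.timeit(fn, number=10)
    print('%s write: %.2f ms per run' % (name, elapsed / 10 * 1000))
```

The numbers it prints are subject to the same caveats as the shell version: they depend heavily on what the page cache already holds.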
Also, note that my write and read functions generate very small files. You'll want to test this on data more similar to your use case.
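For example, to push the payloads toward the ~100 MB files mentioned in the question, you could swap the test string for something built by a helper like this (make_payload and its 10 MB default are my own illustrative choices, not part of the code above):

```python
import os

def make_payload(size=10 * 1024 * 1024):
    # Roughly `size` bytes of incompressible data, so the benchmark
    # exercises the page cache with realistically large values.
    return os.urandom(size)

print(len(make_payload(1024)))  # small size just for the demo -> 1024
```

Random bytes are deliberately incompressible, so neither the filesystem nor the pickle format can shrink them and distort the comparison.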