I need to free up RAM by storing a Python dictionary on the hard drive, not in RAM. Is it possible?

2023-01-10 03:14 问答作者：

In my case, I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables all of type string or list of strings. As I build this dictionary up, my RAM goes up super high. Is there a way to write the dictionary as it is being built to the harddrive rather than the RAM so that I can save some memory? I've heard of something called "pickle" but I don't know if this is a f开发者_JS百科easible method for what I am doing.

Thanks for your help!

Maybe you should be using a database, but check out the shelve module

If shelve isn't powerful enough for you, there is always the industrial strength ZODB

shelve, as @gnibbler recommends, is what I would no doubt be using, but watch out for two traps: a simple one (all keys must be strings) and a subtle one (as the values don't normally exist in memory, calling mutators on them may not work as you expect).

For the simple problem, it's normally easy to find a workaround (and you do get a clear exception if you forget and try e.g. using an int or whatever as the key, so it's not hard t remember that you do need a workaround either).

For the subtle problem, consider for example:

x = d['foo']
x.amutatingmethod()
...much later...
y = d['foo']
# is y "mutated" or not now?

the answer to the question in the last comment depends on whether d is a real dict (in which case y will be mutated, and in fact exactly the same object as x) or a shelf (in which case y will be a distinct object from x, and in exactly the state you last saved to d['foo']!).

To get your mutations to persist, you need to "save them to disk" by doing

d['foo'] = x

after calling any mutators you want on x (so in particular you cannot just do

d['foo'].mutator()

and expect the mutation to "stick", as you would if d were a dict).

shelve does have an option to cache all fetched items in memory, but of course that can fill up the memory again, and result in long delays when you finally close the shelf object (since all the cached items must be saved back to disk then, just in case they had been mutated). That option was something I originally pushed for (as a Python core committer), but I've since changed my mind and I now apologize for getting it in (ah well, at least it's not the default!-), since the cases it should be used in are rare, and it can often trap the unwary user... sorry.

BTW, in case you don't know what a mutator, or "mutating method", is, it's any method that alters the state of the object you call it on -- e.g. .append if the object is a list, .pop if the object is any kind of container, and so on. No need to worry if the object is immutable, of course (numbers, strings, tuples, frozensets, ...), since it doesn't have mutating methods in that case;-).

Pickling an entire hash over and over again is bound to run into the same memory pressures that you're facing now -- maybe even worse, with all the data marshaling back and forth.

Instead, using an on-disk database that acts like a hash is probably the best bet; see this page for a quick introduction to using dbm-style databases in your program: http://docs.python.org/library/dbm

They act enough like hashes that it should be a simple transition for you.

"""I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables all of type string or list of strings""" ... I presume that you mean: """I have a class with about 1000 attributes all of type str or list of str. I have a dictionary mapping about 6000 keys of unspecified type to corresponding instances of that class.""" If that's not a reasonable translation, please correct it.

For a start, 1000 attributes in a class is mindboggling. You must be treating the vast majority generically using value = getattr(obj, attr_name) and setattr(obj, attr_name, value). Consider using a dict instead of an instance: value = obj[attr_name] and obj[attr_name] = value.

Secondly, what percentage of those 6 million attributes are ""? If sufficiently high, you might like to consider implementing a sparse dict which doesn't physically have entries for those attributes, using the __missing__ hook -- docs here.

继续阅读：python

I need to free up RAM by storing a Python dictionary on the hard drive, not in RAM. Is it possible?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？