
Cleaning memory after loading data from a JSON

I am loading a JSON file, parsing it, and converting part of it to a CSV. At the end of the method I would like to free the memory used by the loaded JSON.

Here is the method:

import csv
from json import loads

def JSONtoCSV(input, output):
    outputWriter = csv.writer(open(output, 'wb'), delimiter=',')
    jsonfile = open(input).read()
    data = loads(jsonfile)

    for k, v in data["specialKey"].iteritems():
        outputWriter.writerow([v[1], v[5]])

How do you free the space of the "data" variable?


del data

should do it if you only have one reference. Keep in mind this will happen automatically when the current scope ends (the function returns).
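To see why the single-reference caveat matters, here is a small sketch: del only removes a name, and the object stays alive as long as any other name still points at it.

```python
data = list(range(1000))
alias = data                  # a second reference to the same list
del data                      # removes only the name `data`
print(alias[:3])              # the list is still alive via `alias`
```

Only once the last reference is gone does the object become reclaimable.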

Also, you don't need to keep the jsonfile string around, you can just

data = json.load(open(input))

to read the JSON data directly from the file.

If you want data to go away as soon as you're done with it, you can combine all of that:

for k,v in json.load(open(input))["specialKey"].iteritems():

since there is no reference to the data once the loop has ended, Python will free the memory immediately.
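Putting both suggestions together, the whole function collapses to a few lines. This is a sketch in Python 3 syntax (where iteritems() becomes items() and the CSV file is opened in text mode with newline=''); the function and parameter names are illustrative:

```python
import csv
import json

def json_to_csv(input_path, output_path):
    # json.load reads straight from the file object, and the decoded dict
    # is never bound to a name, so it becomes collectable as soon as the
    # loop finishes.
    with open(input_path) as infile, \
         open(output_path, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter=",")
        for k, v in json.load(infile)["specialKey"].items():
            writer.writerow([v[1], v[5]])
```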


In Python, objects are freed automatically once nothing references them, so you usually shouldn't have to worry about it. However, if you really want to drop the reference early, you can use

del data

One thing to note is that even after a del, memory may not be returned to the operating system right away: CPython frees an object as soon as its reference count drops to zero, but the cyclic garbage collector only runs periodically, and the allocator may hold on to freed blocks for reuse. That's the downside of automatic memory management: you just don't have 100% control. That is something you will need to accept if you want to use Python. You just have to trust the runtime to know what it's doing.


The data variable does not take up any meaningful space—it's just a name. The data object takes up some space, and Python does not allow you to free objects manually. Objects will be garbage collected some time after there are no references to them.

To make sure that you don't keep things alive longer than you want, make sure you don't have a way to access them (don't have a name still bound to them, etc).

An improved implementation might be

import csv
import json

def JSONtoCSV(input_filename, output_filename):
    with open(input_filename) as f:
        special_data = json.load(f)[u'specialKey']

    with open(output_filename, 'wb') as f:
        outputWriter = csv.writer(f, delimiter=',')
        for k, v in special_data.iteritems():
            outputWriter.writerow([v[1], v[5]])

This doesn't ever store the string you called jsonfile or the dict you called data, so they're freed to be collected as soon as Python wants. The former improvement was made by using json.load instead of json.loads, which takes the file object itself. The latter improvement is made by looking up 'specialKey' immediately rather than binding a name to all of data.

Consider that this delicate dance probably isn't necessary at all: as soon as the function returns, those references cease to exist anyway, so at best you've freed the memory a moment sooner.


Python is a garbage-collected language, so you don't have to worry about freeing memory once you've used it; once the jsonfile variable goes out of scope, the string it refers to becomes eligible for collection and the interpreter frees it automatically.

If you really want to delete the variable, you can use del jsonfile; referring to it afterwards will raise a NameError. However, unless you're loading enough data to cause a significant performance problem, I would leave this to the garbage collector.
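A quick illustration of that error: after del, the name is simply unbound, and any further use raises NameError.

```python
import json

jsonfile = '{"specialKey": {}}'
data = json.loads(jsonfile)
del jsonfile                  # the string's name is gone
try:
    print(jsonfile)
except NameError as exc:
    print("unbound:", exc)    # referring to a deleted name fails
```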


Please refer to Python json memory bloat. Garbage collection may not kick in because the collector's thresholds haven't been met, so even a del call may not release the memory. A forced collection with gc.collect() will free up the unreachable objects.
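As a sketch, you can observe the effect with the standard tracemalloc module: after deleting the only reference and forcing a collection, the interpreter's traced allocation count drops (exact numbers vary by Python version).

```python
import gc
import json
import tracemalloc

payload = json.dumps({"specialKey": {"k": list(range(100000))}})

tracemalloc.start()
data = json.loads(payload)              # large decoded object
before, _ = tracemalloc.get_traced_memory()
del data                                # drop the only reference
gc.collect()                            # force a full collection pass
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(after < before)                   # traced memory shrinks after the del
```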
