
How to efficiently output a dictionary as a CSV file using Python's csv module? Out of memory error

I am trying to serialize a list of dictionaries to a CSV text file using Python's csv module. My list has about 13,000 elements, each of which is a dictionary with ~100 keys whose values are simple text and numbers. My function "dictlist2file" simply calls DictWriter to serialize this, but I am getting out of memory errors.

My function is:

import csv
import time

def dictlist2file(dictrows, filename, fieldnames, delimiter='\t',
                  lineterminator='\n', extrasaction='ignore'):
    out_f = open(filename, 'w')

    # Write out header; if no fieldnames were given, derive them
    # from the first row's keys so DictWriter gets a real list
    if fieldnames is not None:
        header = delimiter.join(fieldnames) + lineterminator
    else:
        fieldnames = sorted(dictrows[0].keys())
        header = delimiter.join(fieldnames) + lineterminator
    out_f.write(header)

    print "dictlist2file: serializing %d entries to %s" \
          % (len(dictrows), filename)
    t1 = time.time()
    # Write out the dictionary rows
    data = csv.DictWriter(out_f, fieldnames,
                          delimiter=delimiter,
                          lineterminator=lineterminator,
                          extrasaction=extrasaction)
    data.writerows(dictrows)
    out_f.close()
    t2 = time.time()
    print "dictlist2file: took %.2f seconds" % (t2 - t1)

When I try this on my dictionary, I get the following output:

dictlist2file: serializing 13537 entries to myoutput_file.txt
Python(6310) malloc: *** mmap(size=45862912) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
...
  File "/Library/Frameworks/Python.framework/Versions/6.2/lib/python2.6/csv.py", line 149, in writerows
    rows.append(self._dict_to_list(rowdict))
  File "/Library/Frameworks/Python.framework/Versions/6.2/lib/python2.6/csv.py", line 141, in _dict_to_list
    return [rowdict.get(key, self.restval) for key in self.fieldnames]
MemoryError

Any idea what could be causing this? The list has only ~13,000 elements, and the dictionaries themselves are simple and small (~100 keys each), so I don't see why this should run out of memory or be so inefficient. It also takes minutes before it even reaches the MemoryError.

Thanks for your help.


DictWriter.writerows(...) takes all the dicts you pass to it and builds (in memory) an entire new list of lists, one per row. So if you have a lot of data, I can see how a MemoryError would pop up. Two ways you might proceed:

  1. Iterate over the list yourself and call DictWriter.writerow once for each dict. This does mean a lot of individual writes, though.
  2. Batch the rows up into smaller lists and call DictWriter.writerows on each batch (see the sketch below). Fewer calls than option 1, and you still avoid allocating one huge chunk of memory.
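
A minimal sketch of option 2, assuming the same dictrows, fieldnames, and writer options as in the question (the helper name write_in_batches and the batch size of 500 are arbitrary choices, not anything from the csv module):

import csv

def write_in_batches(out_f, dictrows, fieldnames, batch_size=500,
                     delimiter='\t', lineterminator='\n'):
    writer = csv.DictWriter(out_f, fieldnames,
                            delimiter=delimiter,
                            lineterminator=lineterminator,
                            extrasaction='ignore')
    # Hand DictWriter one small slice at a time so it never has to
    # build a single list holding all 13,000+ converted rows at once.
    for start in range(0, len(dictrows), batch_size):
        writer.writerows(dictrows[start:start + batch_size])

Option 1 is the same idea taken further: loop over dictrows and call writer.writerow(d) for each dict, which keeps memory flat at the cost of one call per row.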


You could be tripping over an internal Python issue. I'd report it at bugs.python.org.


I don't have an answer for what is happening with csv, but I found that the following substitute serializes the dictionaries to a file in just a few seconds:

for row in dictrows:
    out_f.write("%s%s" %(delimiter.join([row[name] for name in fieldnames]),
                         lineterminator))

where dictrows is a generator of dictionaries produced by csv.DictReader, and fieldnames is a list of the field names.

Any idea why csv doesn't perform similarly would be greatly appreciated. Thanks.


You say that even if you loop and call data.writerow(single_dict) for each dict, you still get the problem. Put in code to show the row count every 100 rows (something like the sketch below). How many dicts has it processed before it gets the MemoryError? Run more or fewer other processes to soak up more or less memory ... does the place where it fails vary?
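
A rough sketch of that kind of instrumentation, assuming the same data DictWriter object from the question (the interval of 100 is arbitrary):

count = 0
for rowdict in dictrows:
    data.writerow(rowdict)
    count += 1
    if count % 100 == 0:
        print "dictlist2file: processed %d rows" % count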

What is max(len(d) for d in dictrows)? How long are the strings in the dicts?

How much free memory do you have anyway?

Update: See whether DictWriter is the problem; eliminate it and use the basic csv functionality:

writer = csv.writer(.....)
for d in dictrows:
    row = [d[fieldname] for fieldname in fieldnames]
    writer.writerow(row)