开发者

Large PyPlot - avoid memory allocation

I'm doing a rather large PyPlot (Python matplotlib) (600000 values, each 32bit). Practically I guess I could simply do something like this:

import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])

Two arrays, both allocated in memory. However I'll have to plot files, which contain several Gigabyte of t开发者_Python百科hose information sooner or later.

How do I avoid passing two arrays into the plt.plot()?

I still need a complete plot however. So just an Iterator and passing the values line by line can't be done I suppose.


If you're talking about gigabytes of data, you might consider loading and plotting the data points in batches, then layering the image data of each rendered plot over the previous one. Here is a quick example, with comments inline:

import Image
import matplotlib.pyplot as plt
import numpy

N = 20
size = 4
x_data = y_data = range(N)

fig = plt.figure()

prev = None
for n in range(0, N, size):
    # clear figure
    plt.clf()

    # set axes background transparent for plots n > 0
    if n:
        fig.patch.set_alpha(0.0)
        axes = plt.axes()
        axes.patch.set_alpha(0.0)

    plt.axis([0, N, 0, N])

    # here you'd read the next x/y values from disk into memory and plot
    # them.  simulated by grabbing batches from the arrays.
    x = x_data[n:n+size]
    y = y_data[n:n+size]
    ax = plt.plot(x, y, 'ro')
    del x, y

    # render the points
    plt.draw()

    # now composite the current image over the previous image
    w, h = fig.canvas.get_width_height()
    buf = numpy.fromstring(fig.canvas.tostring_argb(), dtype=numpy.uint8)
    buf.shape = (w, h, 4)
    # roll alpha channel to create RGBA
    buf = numpy.roll(buf, 3, axis=2)
    w, h, _ = buf.shape
    img = Image.fromstring("RGBA", (w, h), buf.tostring())
    if prev:
        # overlay current plot on previous one
        prev.paste(img)
        del prev
    prev = img

# save the final image
prev.save('plot.png')

Output:

Large PyPlot - avoid memory allocation


Do you actually need to plot individual points? It seems like a density plot would work just as well, with so many datapoints available. You might look into pylab's hexbin or numpy.histogram2d. For such large files, you'd probably have to use numpy.memmap, though, or work in batches, as @samplebias says.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜