Need help identifying a memory leak involving matplotlib and flask

I have written a small webapp using the flask framework that involves plotting using matplotlib. The problem is that every time I create the plot, the process consumes more memory.

I have deployed the app using mod_wsgi with a .wsgi file looking simply like this:

    from yourapplication import app as application

The problems start when I access the URL that creates the plot. The function creates a plotter object which, when initialized, reads the relevant data from a sqlite3 database (about 30 integers and as many datetime objects), creates a plot using matplotlib, and returns the PNG data from an in-memory buffer, which is then displayed on screen.

This is the end of the function. The whole class can be seen here

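    # End of the plotting method; fig is built earlier in the class
    canvas = FigureCanvas(fig)   # FigureCanvasAgg from matplotlib.backends.backend_agg
    png_output = io.BytesIO()    # PNG data is bytes (StringIO.StringIO on Python 2)
    canvas.print_png(png_output)
    return png_output.getvalue()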

When I visit the site, a process is created with about 25 MB of reserved memory. The first time I create a plot it grows to 30 MB, and then by about 1 MB each time I use the plotter class. The default settings used 5 processes, which consumed far too much memory (up to 150 MB within minutes, and I'm only allowed 80 MB).

I'm very new to everything involved here (web frameworks, Apache, databases), so I have no feel for where things might be going wrong. Any ideas are highly appreciated. Thanks!
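A rough way to confirm that each plot call leaves memory behind is to measure it with the standard-library tracemalloc module. This is a sketch under stated assumptions: make_plot and no_leak are hypothetical stand-ins (make_plot deliberately keeps every result alive), and tracemalloc only sees memory allocated through Python's allocators, so it may under-report memory held by C extensions such as matplotlib's Agg renderer. Watching the process's resident memory remains the final check.

```python
import tracemalloc

# Hypothetical stand-in for the plotting call. It deliberately "leaks"
# by keeping every rendered result alive in a module-level list.
_kept_alive = []

def make_plot():
    data = bytearray(100_000)  # pretend this is a freshly rendered PNG
    _kept_alive.append(data)
    return bytes(data)

def no_leak():
    return bytes(100_000)      # allocated, returned, then discarded by the caller

def retained_per_call(func, calls=20):
    """Average number of traced bytes still allocated after each call."""
    tracemalloc.start()
    func()                     # warm-up, so one-time allocations are not counted
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(calls):
        func()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / calls

leaky = retained_per_call(make_plot)
clean = retained_per_call(no_leak)
print(f"make_plot: ~{leaky:,.0f} bytes retained per call")
print(f"no_leak:   ~{clean:,.0f} bytes retained per call")
```

A function whose retained-per-call figure stays well above zero across repeated runs is a leak candidate; one that hovers near zero is not.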


Doing this after each call to the plot_month function solved the leak.

    import gc
    gc.collect()
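Why an explicit gc.collect() can make a difference: matplotlib objects reference each other (a figure holds its canvas and the canvas points back at the figure; axes point at their figure), and reference counting alone never frees objects that form a cycle. They stay allocated until Python's cyclic garbage collector runs. The stdlib-only sketch below uses a hypothetical Node class in place of a figure/canvas pair; the automatic collector is disabled only to make the demonstration deterministic.

```python
import gc
import weakref

class Node:
    """Hypothetical stand-in for a figure or canvas object."""
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a <-> b: a reference cycle
    return weakref.ref(a)         # a weak reference does not keep `a` alive

gc.disable()                      # keep the automatic collector out of the demo
ref = make_cycle()
alive_before_collect = ref() is not None   # refcounting alone could not free it
gc.collect()
alive_after_collect = ref() is not None    # the cyclic collector reclaimed it
gc.enable()
print(alive_before_collect, alive_after_collect)  # True False
```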


Posting this in case it will help someone in the future.

I had the same issue, and the answer provided by axel22 didn't solve it for me.

After a bit of tinkering I realized that there were two problems:

  1. I didn't clear the Matplotlib figure, leaving it in memory forever
  2. I was calling the garbage collector in the wrong part of my code

First problem

I was doing something like this (INCORRECT):

    fig = util.create_figure(....)
    output = io.BytesIO()
    canvas = FigureCanvas(fig)
    canvas.print_png(output)

but I needed to do this (CORRECT):

    fig = util.create_figure(....)
    output = io.BytesIO()
    canvas = FigureCanvas(fig)
    canvas.print_png(output)
    # Clear the figure's contents (axes, lines, ...) so they can be released
    fig.clf()
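One caveat: fig.clf() empties the figure (its axes, lines and so on), but if the figure was created through pyplot, pyplot itself keeps a reference to it until plt.close(fig) is called. The sketch below assumes the figure comes from plt.subplots() rather than the answer's util.create_figure, whose internals aren't shown, and uses the non-GUI Agg backend as is usual on a server; render_png_pyplot is a hypothetical name.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend, no display needed
import matplotlib.pyplot as plt

def render_png_pyplot(values):
    """Render `values` as a PNG and release everything pyplot was holding."""
    fig, ax = plt.subplots()
    ax.plot(values)
    output = io.BytesIO()
    fig.savefig(output, format="png")
    fig.clf()                    # empty the figure (axes, lines, ...)
    plt.close(fig)               # remove pyplot's own reference to the figure
    return output.getvalue()

png = render_png_pyplot([3, 1, 4, 1, 5])
open_figures = plt.get_fignums()  # figure numbers pyplot still tracks
print(png[:4], open_figures)
```

After the call, plt.get_fignums() should return an empty list, meaning pyplot no longer holds any figure.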

Second problem

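I was calling the garbage collector in the wrong part of my code. You need to call it outside the scope in which FigureCanvas is created: while the function is still running, its local variables (the figure, the canvas) still reference those objects, so gc.collect() cannot free them.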

This DID NOT work (INCORRECT):

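    import gc
    from io import BytesIO
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    from matplotlib.figure import Figure

    def do_something():
        fig = Figure()  # stand-in for however the real figure is built
        canvas = FigureCanvas(fig)
        png_output = BytesIO()
        canvas.print_png(png_output)
        gc.collect()  # too early: fig and canvas are still referenced by this frame
        return png_output.getvalue()

    do_something()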

But this worked (CORRECT):

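    import gc
    from io import BytesIO
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    from matplotlib.figure import Figure

    def do_something():
        fig = Figure()  # stand-in for however the real figure is built
        canvas = FigureCanvas(fig)
        png_output = BytesIO()
        canvas.print_png(png_output)
        return png_output.getvalue()

    do_something()
    gc.collect()  # the function's frame is gone, so its figure and canvas can be reclaimed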
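To have the collection happen after every plotting call without remembering it at each call site, one option is a small decorator that calls gc.collect() once the wrapped function has returned, i.e. outside its scope. collect_after is a hypothetical helper, not part of Flask or matplotlib, and the self-referencing Canvas class in the demo is only a stand-in for the real canvas.

```python
import functools
import gc
import weakref

def collect_after(func):
    """Call gc.collect() after `func` returns (hypothetical helper).

    The finally block runs once func's frame has finished, so locals such
    as the canvas no longer keep their objects reachable.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            gc.collect()
    return wrapper

class Canvas:
    """Stand-in for FigureCanvas: refers to itself, forming a cycle."""
    def __init__(self):
        self.me = self

refs = []

@collect_after
def do_something():
    canvas = Canvas()
    refs.append(weakref.ref(canvas))
    return b"png bytes"

gc.disable()                      # only the decorator's collect may run
result = do_something()
canvas_freed = refs[0]() is None  # reclaimed right after the call
gc.enable()
print(result, canvas_freed)
```

In a Flask app, @collect_after would sit below @app.route(...) on the view that produces the image, so that Flask registers the wrapped function.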


I ran into the same memory leak when my website needed to generate a series of graphs in a loop using Flask. The matplotlib documentation, in the section "How to use Matplotlib in a web application server", recommends avoiding matplotlib.pyplot and using matplotlib.figure.Figure instead, because pyplot keeps a reference to every figure it creates until that figure is closed. Note that this requires Matplotlib 3.1 or above.

How much work this takes depends on how you built the graph (pyplot interface vs. object-oriented interface). Swapping the pyplot module for the Figure class is straightforward. From:

    import matplotlib.pyplot as plt
    fig = plt.figure()

To:

    from matplotlib.figure import Figure
    fig = Figure()

Then replace any pyplot calls that no longer apply with their object-oriented equivalents; for example, plt.plot(x, y) becomes ax = fig.subplots() followed by ax.plot(x, y).
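Putting the pieces together, a pyplot-free version of the plotting function might look like the sketch below. It assumes Matplotlib 3.1 or later (so Figure.savefig works without attaching a canvas by hand); the function name render_png and the list of values are placeholders for your own code and data.

```python
import io

from matplotlib.figure import Figure

def render_png(values):
    """Plot `values` and return the PNG bytes, without touching pyplot."""
    fig = Figure()                  # not registered with pyplot, so nothing global keeps it alive
    ax = fig.subplots()             # object-oriented replacement for plt.subplot()/plt.plot()
    ax.plot(values)
    ax.set_title("Example")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # Matplotlib >= 3.1 attaches a canvas automatically
    return buf.getvalue()

png = render_png([3, 1, 4, 1, 5, 9, 2, 6])
print(len(png), png[:4])
```

In a Flask view, the bytes can then be returned with flask.Response(png, mimetype="image/png").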
