Python multi-threaded application with memory leak from the thread-specific logger instances
I have a server subclass spawning threaded response handlers, the handlers in turn start application threads. Everything is going smoothly except when I use ObjGraph I see the correct number of application threads running ( I am load testing and have it throttled to keep 35 applications instances running).
Invoking objgraph.typestats() provides a break down of how many instances of each object are currently live in the interpreter (according to the GC). Looking at that output for memory leaks I find 700 logger instances - which would be the total number of response handlers spawned by the server.
I have called logger.removehandler(memoryhandler) and logger.removehandler(filehandler) when the application thread exits the run() method to ensure that there are no lingering references to the logger instances, also the logge开发者_JAVA技巧r instances is completely isolated within the application thread (there are no external references to it). As a final stab at eliminating these logger instances the last statement in run() is del self.logger
To get the logger in init() I provide it a suitably large random number to name it so it will be distinct for file access - I use the same large number as part of the log file name to avoid application log collisions.
The long and the short is I have 700 logger instances tracked by the GC but only 35 active threads - how do I go about killing off these loggers? A more cumbersome engineer solution is to create a pool of loggers and just acquire one for the life of the application thread but that is creating more code to maintain when the GC should simply handle this automatically.
Don't create potentially unbounded numbers of loggers, that's not good practice - there are other ways of getting context-sensitive information into your logs, as documented here.
You also don't need to have a logger as an instance attribute: loggers are singletons so you can just get a particular one by name from anywhere. The recommended practice is to name loggers at module level using
logger = logging.getLogger(__name__)
which suffices for most scenarios.
From your question I can't tell whether you appreciate that handlers and loggers aren't the same thing - for example you talk about removeHandler calls (which might serve to free the handler instances because their reference counts go to zero, but you won't free any logger instances by doing so).
Generally, loggers are named after parts of your application which generate events of interest.
If you want each thread to e.g. write to a different file, you can create a new filename each time, and then close the handler when you're done and the thread is about to terminate (that closing is important to free handler resources). Or, you can log everything to one file with thread ids or other discriminators included in the log output, and use post-processing on the log file.
I met the same memory leak when using logging.Logger(), and you may try to manually close the handler fd when the logger is useless, like:
for handler in logger.handlers:
handler.close()
精彩评论