Caveats to be aware of when using threading in Python?
I'm quite new to threading in Python and have a couple of beginner questions.
When starting more than, say, fifty threads using the Python threading module, I start getting MemoryError. The threads themselves are very slim and not very memory hungry, so it seems like it is the overhead of the threading that causes the memory issues.
- Is there something I can do to increase the memory capacity or otherwise make Python allow for a larger number of threads?
- What is the maximum number of threads you've been able to run in your Python code using the threading module? Did you do any tricks to achieve that number?
- Are there any other caveats to be aware of when using the threading module?
Your question cannot be answered in a general way, as good use of threading always depends on the concrete problem to be solved. You also do not say which Python version you are using, so I assume you use the "default" CPython and not IronPython or something similar. Here are some hints and ideas to think about further:
- Why do you need so many threads? Your machine will probably not be able to run them in parallel anyway.
- Have a look at Stackless Python. I don't know the current status of the project, but I think it was designed for this kind of problem.
- The global interpreter lock prevents pure Python code from really running in parallel. C code called from Python can release the lock and run in parallel, though, so in practice it is sometimes hard to predict how Python will behave regarding parallelization.
- Python has many good libraries. Check whether one of them already has a solution for your design problem. If your problem is network related, have a look at Twisted, for example.
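One way to act on the first hint is to cap the number of threads and queue the work instead of starting one thread per task. The following is only a minimal sketch using the standard library's concurrent.futures module. The handle function, the pool size of 8 and the 200 jobs are placeholders chosen for illustration and do not come from the question.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(job):
    # Hypothetical stand-in for the real per-item work
    # (for example, one network request).
    return job * 2


def run_bounded(jobs, max_workers=8):
    # At most max_workers threads exist at any time,
    # no matter how many jobs are queued.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(handle, jobs))


# 200 jobs are processed by only 8 threads.
results = run_bounded(range(200))
```

With this layout the memory cost of threading depends on the pool size rather than on the number of tasks, which may be enough to avoid the MemoryError.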
The Global Interpreter Lock is known to limit the performance of multithreaded code in standard CPython. The multiprocessing module documentation therefore notes:
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
The GIL probably isn't the cause of your MemoryErrors, but it is something to be aware of.
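As a rough illustration of the API described in that quote, CPU-bound work could be spread over processes like this. The function names, the workload and the default of two processes are assumptions made for the example.

```python
import multiprocessing


def sum_of_squares(n):
    # CPU-bound, pure-Python work: threads would take turns
    # holding the GIL here instead of running in parallel.
    return sum(i * i for i in range(n))


def sum_of_squares_parallel(sizes, processes=2):
    # Each worker is a separate interpreter process with its own GIL.
    # The worker function is defined at module level so that it can
    # be pickled and sent to the worker processes.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(sum_of_squares, sizes)
```

Call sum_of_squares_parallel from under an `if __name__ == "__main__":` guard, because on platforms that start workers in a fresh interpreter the main module is imported again in each worker. Note that processes use more memory than threads, so this addresses the GIL rather than the MemoryError.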
Eventlet's green threads are designed for low memory consumption. The general-purpose spawn call can be used to start new green threads.
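If staying with the standard threading module is preferred, per-thread memory can sometimes be reduced instead by requesting a smaller stack with threading.stack_size before the threads are started. This is a sketch under the assumption that the stack reserved for each thread is what exhausts memory, which the question does not confirm. The 256 KiB value and the start_workers helper are made up for the example, and some platforms reject certain sizes or do not support changing the size at all.

```python
import threading

# Assumed value: 256 KiB per thread. The documented minimum is 32 KiB,
# and some platforms require a multiple of 4096 bytes.
STACK_BYTES = 256 * 1024


def start_workers(count, target):
    # Hypothetical helper: run target(i) in `count` small-stack threads.
    previous = threading.stack_size()
    # The new size applies only to threads started after this call.
    threading.stack_size(STACK_BYTES)
    try:
        threads = [threading.Thread(target=target, args=(i,))
                   for i in range(count)]
        for t in threads:
            t.start()
    finally:
        # Restore the earlier setting for the rest of the program.
        threading.stack_size(previous)
    for t in threads:
        t.join()


results = []
# list.append is safe to call from several threads in CPython.
start_workers(100, results.append)
```

A stack that is too small can crash deeply recursive code, so the value is worth testing against the real workload.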