Python/Urllib2/Threading: Single download thread faster than multiple download threads. Why?
I am working on a project that requires me to create multiple threads to download a large remote file. I have done this already, but I cannot understand why it takes longer to download the file with multiple threads than with a single thread. I used my XAMPP localhost to carry out the elapsed-time test. I would like to know whether this is normal behaviour, or whether it is because I have not tried downloading from a real server.
Thanks Kennedy
9 women can't combine to make a baby in one month. If you have 10 threads sharing the same connection, each one gets only about 10% of the bandwidth a single thread would have, and there is additional overhead for context switching, etc.
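To put rough numbers on that, here is a back-of-the-envelope sketch. The function name `download_time`, the per-thread overhead figure, and the assumption that the link is shared equally between threads are all illustrative, not measurements:

```python
def download_time(file_size, bandwidth, threads, overhead_per_thread=0.0):
    """Estimate seconds needed to fetch file_size bytes over one shared link.

    Assumes the link bandwidth (bytes/second) is split equally between
    threads and that each thread fetches an equal slice of the file.
    overhead_per_thread is a made-up cost for setup and context switching.
    """
    per_thread_bandwidth = bandwidth / threads
    per_thread_bytes = file_size / threads
    transfer = per_thread_bytes / per_thread_bandwidth
    return transfer + overhead_per_thread * threads


# 100 MB file over a 10 MB/s link (illustrative values).
single = download_time(100e6, 10e6, threads=1)
multi = download_time(100e6, 10e6, threads=10, overhead_per_thread=0.05)
print("1 thread  :", single, "seconds")
print("10 threads:", multi, "seconds")
```

The transfer part comes out the same for any thread count, because the slices get smaller at exactly the rate the per-thread bandwidth does. Only the overhead grows, so in this model more threads can only make things slower when the link itself is the bottleneck.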
Python threading uses something called the GIL (Global Interpreter Lock), which can sometimes degrade a program's execution time.
Without going into a lot of detail here, I invite you to read this and this; maybe they can help you understand your problem. You can also watch the two conference talks here and here.
Hope this can help :)
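If you want to see the GIL effect for yourself, a minimal sketch (Python 3, standard CPython build assumed; the function names and loop size are made up for illustration) is to time a pure-Python CPU-bound loop run sequentially and then in threads. Note that CPython releases the GIL around blocking socket calls, so this matters more for CPU-bound work such as parsing than for the network transfer itself:

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop; the running thread holds the GIL.
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    """Run the loop `jobs` times, one after another, and return seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    """Run the loop in `jobs` threads at once and return seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential_seconds = run_sequential(200000, 4)
threaded_seconds = run_threaded(200000, 4)
print("sequential: %.3f s" % sequential_seconds)
print("threaded  : %.3f s" % threaded_seconds)
```

On a GIL build you would typically expect the threaded run to be no faster than the sequential one, but the exact numbers depend on your machine and Python version.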
Twisted uses non-blocking I/O: if data is not available on a socket right now, the call doesn't block the entire thread, so you can handle many socket connections waiting for I/O in one thread simultaneously. But if you do something other than I/O (such as parsing large amounts of data), you still block the thread.
When you use the stdlib's socket module, it does blocking I/O by default: when you call socket.recv and data is not available at the moment, it blocks the entire thread, so you need one thread per connection to handle concurrent downloads.
These are two approaches to concurrency:
- Fork a new thread for each new connection (threading + socket from stdlib).
- Multiplex I/O and handle many connections in one thread (Twisted).
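The two approaches can be sketched side by side. This is only an illustration under some assumptions: it uses Python 3, it swaps in the stdlib `selectors` module as a stand-in for Twisted's event loop (it is not Twisted's API), and it replaces real downloads with local `socket.socketpair()` connections carrying small payloads. The helper names are made up.

```python
import selectors
import socket
import threading


def make_sources(payloads):
    """Return sockets that will each deliver one (small) payload, then EOF."""
    readers = []
    for payload in payloads:
        reader, writer = socket.socketpair()
        writer.sendall(payload)  # small payloads fit in the kernel buffer
        writer.close()
        readers.append(reader)
    return readers


def read_all_blocking(sock, results, index):
    chunks = []
    while True:
        chunk = sock.recv(4096)  # blocks this thread until data or EOF
        if not chunk:
            break
        chunks.append(chunk)
    sock.close()
    results[index] = b"".join(chunks)


def threaded_download(payloads):
    """Approach 1: one thread per connection, blocking I/O."""
    readers = make_sources(payloads)
    results = [None] * len(readers)
    threads = [threading.Thread(target=read_all_blocking, args=(s, results, i))
               for i, s in enumerate(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def multiplexed_download(payloads):
    """Approach 2: one thread, non-blocking sockets, readiness notifications."""
    readers = make_sources(payloads)
    results = [b""] * len(readers)
    sel = selectors.DefaultSelector()
    for i, s in enumerate(readers):
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ, data=i)
    open_count = len(readers)
    while open_count:
        for key, _events in sel.select():
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue  # not actually ready; wait for the next event
            if chunk:
                results[key.data] += chunk
            else:  # empty read means the peer closed the connection
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_count -= 1
    sel.close()
    return results


payloads = [b"first file", b"second file", b"third file"]
print(threaded_download(payloads))
print(multiplexed_download(payloads))
```

Both functions return the same data; the difference is only in how waiting is handled. Neither one makes the link itself any faster.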