Python threading unexpectedly slower
I have decided to learn how multi-threading is done in Python, and I did a comparison to see what kind of performance gain I would get on a dual-core CPU. I found that my simple multi-threaded code actually runs slower than the sequential equivalent, and I can't figure out why.
The test I contrived was to generate a large list of random numbers and then print the maximum:

    from random import random
    import threading

    def ox():
        print max([random() for x in xrange(20000000)])

ox() takes about 6 seconds to complete on my Intel Core 2 Duo, while ox();ox() takes about 12 seconds.
I then tried calling ox() from two threads to see how fast that would complete:

    def go():
        r = threading.Thread(target=ox)
        r.start()
        ox()

go() takes about 18 seconds to complete, with the two results printing within 1 second of each other. Why should this be slower?
I suspect ox() is being parallelized automatically, because if I look at the Windows task manager performance tab and call ox() in my python console, both processors jump to about 75% utilization until it completes. Does Python automatically parallelize things like max() when it can?
- Python has the GIL. Python bytecode will only be executed by a single processor at a time. Only certain C modules (which don't manage Python state) will be able to run concurrently.
- The Python GIL has a huge overhead in locking the state between threads. There are fixes for this in newer versions and in development branches, which should at the very least make multi-threaded CPU-bound code as fast as single-threaded code. A short sketch illustrating the effect follows below.
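To make the GIL's effect concrete, here is a minimal timing sketch (my own illustration, not part of the original answers): two threads doing pure-Python CPU work take roughly as long as running the work twice, while two threads calling time.sleep overlap because sleep releases the GIL.

    import threading
    import time

    def cpu_work():
        # Pure-Python loop: the thread holds the GIL while this runs.
        total = 0
        for i in range(5000000):
            total += i

    def io_work():
        # time.sleep releases the GIL, so two sleeping threads overlap.
        time.sleep(1)

    def timed(target):
        threads = [threading.Thread(target=target) for _ in range(2)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    print(timed(cpu_work))  # roughly the time of two sequential calls
    print(timed(io_work))   # roughly 1 second, not 2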
You need to use a multi-process framework to parallelize with Python. Luckily, the multiprocessing module which ships with Python makes that fairly easy.
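As a rough sketch of what that looks like for this example (my own illustration, not tested on the asker's machine; the __main__ guard is required on Windows so the worker processes can import the module safely):

    from random import random
    from multiprocessing import Pool

    def ox(_):
        # Each worker process has its own interpreter and its own GIL.
        return max(random() for x in range(20000000))

    if __name__ == '__main__':
        pool = Pool(processes=2)
        print(pool.map(ox, range(2)))  # the two calls run in separate processes
        pool.close()
        pool.join()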
Very few languages can auto-parallelize expressions. If that is the functionality you want, I suggest Haskell (Data Parallel Haskell).
The problem is in the random() function. If you remove random() from your code, the slowdown goes away. Both cores try to access the shared state of the random function; the cores end up working sequentially and spend a lot of time on cache synchronization. Such behavior is known as false sharing. Read this article: False Sharing.
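If shared generator state is the concern, here is a minimal sketch (my own, not from this answer) that gives each thread its own random.Random instance so nothing in the random module is shared between threads; note that under CPython the GIL still serializes the pure-Python loop, so this alone may not change the timings.

    import random
    import threading

    def ox():
        # A per-thread generator: no module-level random state is shared.
        rng = random.Random()
        print(max(rng.random() for x in range(20000000)))

    threads = [threading.Thread(target=ox) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()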
As Yann correctly pointed out, the Python GIL prevents parallelization from happening in this example. You can use the Python multiprocessing module to fix that, or, if you are willing to use other open source libraries, Ray is also a great option for getting around the GIL problem: it is easier to use and has more features than the Python multiprocessing library.
This is how you can parallelize your code example with Ray:
    from random import random
    import ray

    ray.init()

    @ray.remote
    def ox():
        print(max([random() for x in range(20000000)]))

    %time x = ox.remote(); y = ox.remote(); ray.get([x, y])
On my machine, the single-threaded ox() code you posted takes 1.84s, and the two invocations with Ray take 1.87s combined, so we get almost perfect parallelization here.
Ray also makes it very efficient to share data between tasks; on a single machine it will use shared memory under the hood, see https://ray-project.github.io/2017/10/15/fast-python-serialization-with-ray-and-arrow.html.
You can also run the same program across different machines on your cluster or the cloud without having to modify the program; see the documentation (https://ray.readthedocs.io/en/latest/using-ray-on-a-cluster.html and https://ray.readthedocs.io/en/latest/autoscaling.html).
Disclaimer: I'm one of the Ray developers.