Multithreaded thumbnail generation in Python
I'd like to recurse through a directory of images and generate thumbnails for each image. I have 12 usable cores on my machine. What's a good way to utilize them? I don't have much experience writing multithreaded applications, so any simple sample code is appreciated. Thanks in advance.
Abstract
Use processes, not threads: the GIL makes Python threads inefficient for CPU-bound work. Two possible solutions for multiprocessing are:
The multiprocessing module
This is preferred if you're using an internal thumbnail maker (e.g., PIL). Simply write a thumbnail-maker function and launch 12 of them in parallel; when one of the processes finishes, run another in its slot.
Adapted from the Python documentation, here's a script that should utilize 12 cores:
from multiprocessing import Process
import os

def info(title):  # for learning purposes; remove once you've got the PID/PPID idea
    print(title)
    print('module:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):  # worker function
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    processes = [Process(target=f, args=('bob-%d' % i,)) for i in range(12)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
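To get the "run another in its slot" behavior, you can keep a list of live Process objects and refill free slots as workers exit. This is only a sketch under assumed names (run_in_slots, target, and items are illustrative, not a standard API):

```python
import time
from multiprocessing import Process

def run_in_slots(target, items, limit=12):
    """Keep at most `limit` worker processes running; whenever one
    finishes, start another in its slot until all items are done."""
    pending = list(items)
    running = []
    while pending or running:
        # drop finished workers to free their slots
        running = [p for p in running if p.is_alive()]
        # fill the free slots with new work
        while pending and len(running) < limit:
            p = Process(target=target, args=(pending.pop(),))
            p.start()
            running.append(p)
        time.sleep(0.05)  # don't busy-wait
```

You'd pass your thumbnail-maker function as `target` and the list of image paths as `items`.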
Addendum: using multiprocessing.Pool()
Following soulman's comment, you can use the provided process pool. I've adapted some code from the multiprocessing manual. Note that you should probably use multiprocessing.cpu_count() instead of 4 to determine the number of worker processes automatically.
from multiprocessing import Pool
import datetime

def f(x):  # your thumbnail-maker function, probably using some module like PIL
    print('%-4d: Started at %s' % (x, datetime.datetime.now()))
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)  # start 4 worker processes
    print(pool.map(f, range(25)))  # prints "[0, 1, 4, ..., 576]"
Which gives (note that the printouts are not strictly ordered!):
0 : Started at 2011-04-28 17:25:58.992560
1 : Started at 2011-04-28 17:25:58.992749
4 : Started at 2011-04-28 17:25:58.992829
5 : Started at 2011-04-28 17:25:58.992848
2 : Started at 2011-04-28 17:25:58.992741
3 : Started at 2011-04-28 17:25:58.992877
6 : Started at 2011-04-28 17:25:58.992884
7 : Started at 2011-04-28 17:25:58.992902
10 : Started at 2011-04-28 17:25:58.992998
11 : Started at 2011-04-28 17:25:58.993019
12 : Started at 2011-04-28 17:25:58.993056
13 : Started at 2011-04-28 17:25:58.993074
14 : Started at 2011-04-28 17:25:58.993109
15 : Started at 2011-04-28 17:25:58.993127
8 : Started at 2011-04-28 17:25:58.993025
9 : Started at 2011-04-28 17:25:58.993158
16 : Started at 2011-04-28 17:25:58.993161
17 : Started at 2011-04-28 17:25:58.993179
18 : Started at 2011-04-28 17:25:58.993230
20 : Started at 2011-04-28 17:25:58.993233
19 : Started at 2011-04-28 17:25:58.993249
21 : Started at 2011-04-28 17:25:58.993252
22 : Started at 2011-04-28 17:25:58.993288
24 : Started at 2011-04-28 17:25:58.993297
23 : Started at 2011-04-28 17:25:58.993307
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256,
289, 324, 361, 400, 441, 484, 529, 576]
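Following the cpu_count() suggestion, the pool can be sized to the machine instead of hard-coded. A minimal sketch (square is just a stand-in for your thumbnail maker):

```python
from multiprocessing import Pool, cpu_count

def square(x):  # stand-in for a real thumbnail-maker function
    return x * x

if __name__ == '__main__':
    # size the pool to the machine instead of hard-coding 4
    with Pool(processes=cpu_count()) as pool:
        print(pool.map(square, range(8)))
```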
The subprocess module
The subprocess module is useful for running external processes, and is thus preferred if you plan on using an external thumbnail maker such as ImageMagick's convert. Code example:
import subprocess as sp

processes = [sp.Popen('your-command-here', shell=True,
                      stdout=sp.PIPE, stderr=sp.PIPE) for i in range(12)]
Now iterate over the processes. Whenever one has finished (check with Popen.poll(), which returns None while the process is still running), remove it from the list and start a new process in its place.
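That poll-and-replace loop might look roughly like this (run_commands is an illustrative name, and the command strings are up to you):

```python
import subprocess as sp
import time

def run_commands(commands, limit=12):
    """Run shell commands with at most `limit` of them in flight.
    Illustrative sketch; adapt the command list to your thumbnail tool."""
    pending = list(commands)
    running = []
    while pending or running:
        # Popen.poll() returns None while the process is still running
        running = [p for p in running if p.poll() is None]
        # start new commands in the freed slots
        while pending and len(running) < limit:
            running.append(sp.Popen(pending.pop(), shell=True))
        time.sleep(0.05)  # don't busy-wait
```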
Like others have answered, subprocesses are usually preferable to threads. multiprocessing.Pool makes it easy to use exactly as many worker processes as you want, for instance like this:
import os
from multiprocessing import Pool

def process_file(filepath):
    # if filepath is an image file, resize it (e.g., with PIL)
    pass

def enumerate_files(folder):
    for dirpath, dirnames, filenames in os.walk(folder):
        for fname in filenames:
            yield os.path.join(dirpath, fname)

if __name__ == '__main__':
    pool = Pool(12)  # or omit the parameter to use the CPU count
    # use pool.map() only for its side effects; ignore the return value
    pool.map(process_file, enumerate_files('.'), chunksize=1)
The chunksize=1 parameter makes sense if each file operation is relatively slow compared to communicating with each subprocess.
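For the "if filepath is an image file" check, one cheap approach is filtering by extension. IMAGE_EXTS and is_image below are hypothetical helpers, and an extension check is only a heuristic (a robust version would sniff file headers):

```python
import os

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.gif', '.bmp'}  # extend as needed

def is_image(filepath):
    """Cheap check by file extension, case-insensitive."""
    return os.path.splitext(filepath)[1].lower() in IMAGE_EXTS
```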
Don't go with threads; they are more complicated than what you need. Instead, use the subprocess module to spawn separate processes that work through each directory.
So you will have a primary program that generates a list of files, then pops each file off the list and feeds it to a subprocess. Each subprocess would be a simple Python program that generates a thumbnail from an input image. Some simple logic to keep the number of spawned processes within a limit, say 11, would keep you from fork-bombing your machine.
This lets the OS handle all of those niggling details of who runs where and so on.