Find the closest vector
Recently I wrote an algorithm to quantize an RGB image. Every pixel is represented by an (R,G,B) vector, and the quantization codebook is a set of 3-dimensional vectors. Every pixel of the image needs to be mapped to (that is, "replaced by") the codebook vector closest in terms of Euclidean distance (more exactly, squared Euclidean). I did it as follows:
from numpy import sqrt, sum, zeros, argmin

class EuclideanMetric(DistanceMetric):
    def __call__(self, x, y):
        d = x - y
        return sqrt(sum(d * d, -1))

class Quantizer(object):
    def __init__(self, codebook, distanceMetric = EuclideanMetric()):
        self._codebook = codebook
        self._distMetric = distanceMetric

    def quantize(self, imageArray):
        quantizedRaster = zeros(imageArray.shape)
        X = quantizedRaster.shape[0]
        Y = quantizedRaster.shape[1]

        for i in xrange(0, X):
            print i
            for j in xrange(0, Y):
                # distance from this pixel to every codebook vector
                dist = self._distMetric(imageArray[i,j], self._codebook)
                code = argmin(dist)
                quantizedRaster[i,j] = self._codebook[code]

        return quantizedRaster
...and it performs awfully: almost 800 seconds on my Pentium Core Duo 2.2 GHz with 4 GB of memory, for an image of 2600*2700 pixels :(
Is there a way to optimize this somewhat? Maybe a different algorithm or some Python-specific optimizations.
UPD: I tried using the squared Euclidean distance and it still takes an enormous amount of time.
One simple optimization is to drop the sqrt call. sqrt(x) is monotonic in x, and since you don't need the actual distance, just the minimum one, compare the squared distances instead. That should help a bit, since sqrt is expensive.
This trick is used a lot when working with distances. For instance, if you have a distance threshold, you can use threshold^2 and drop the sqrt in the distance calculation. Really, the sqrt is only necessary when absolute distance is needed. For relative distances, drop the sqrt.
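A minimal sketch of such a metric (the class name SquaredEuclideanMetric is just for illustration; it can be passed to the Quantizer from the question wherever EuclideanMetric was used):

import numpy as np

class SquaredEuclideanMetric(object):
    """Same argmin as EuclideanMetric, but without the sqrt."""
    def __call__(self, x, y):
        d = x - y
        return np.sum(d * d, -1)   # squared distances; the ordering is unchanged

# e.g. quantizer = Quantizer(codebook, SquaredEuclideanMetric())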
Update: an algorithmic change is probably needed then. Right now you are comparing every codebook vector to every pixel. It would speed things up to reduce the number of distance calculations.
You might do better using a kd-tree for this, which will reduce the search for each pixel from O(codebook size) to O(log codebook size). I've never done this in Python, but some googling gave an implementation that might work here.
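As a rough sketch of what that could look like with scipy.spatial.cKDTree (the function name quantize_kdtree is just for illustration; imageArray is assumed to be an (H, W, 3) array and codebook a (K, 3) array):

import numpy as np
from scipy.spatial import cKDTree

def quantize_kdtree(imageArray, codebook):
    # Build the tree once over the K codebook vectors.
    tree = cKDTree(codebook)
    # Flatten the image to (H*W, 3) and query all pixels in one call.
    pixels = imageArray.reshape(-1, 3)
    _, codes = tree.query(pixels)          # index of the nearest codebook vector per pixel
    return np.asarray(codebook)[codes].reshape(imageArray.shape)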
You could use the vector quantization function vq from scipy.cluster.vq.
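A minimal sketch (assuming, as in the question, that imageArray is an (H, W, 3) array and codebook a (K, 3) array; quantize_vq is just an illustrative name):

import numpy as np
from scipy.cluster.vq import vq

def quantize_vq(imageArray, codebook):
    # vq wants 2-D float observations: one row per pixel.
    pixels = imageArray.reshape(-1, 3).astype(np.float64)
    book = np.asarray(codebook, dtype=np.float64)
    codes, _ = vq(pixels, book)            # nearest codebook index per pixel
    return book[codes].reshape(imageArray.shape)

This avoids the per-pixel Python loop entirely.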
If X is very large, you're printing i quite a lot, which can really hurt performance. For a less specific answer, read on.
To find out where the bottleneck in your process is, I suggest a timing decorator, something along the lines of
from functools import wraps
import time

def time_this(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        finish = time.time()
        elapsed = (finish - start) * 1000
        print '{0}: {1} ms'.format(func.__name__, elapsed)
        return result
    return wrapper
I found this somewhere once upon a time and have always used it to figure out where my code is slow. You can break your algorithm down into a series of separate functions, then decorate each one with this decorator to see how long it takes to run. Then it's a matter of moving statements between functions to see which ones are responsible for the time. Mainly you're looking for two things: 1) statements that take a long time to execute, or 2) statements that don't necessarily take long on their own, but are executed so many times that even a very small improvement has a large effect on the overall performance.
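A minimal usage sketch (slow_sum is just a stand-in for one step of the real algorithm):

@time_this
def slow_sum(n):
    total = 0
    for i in xrange(n):
        total += i
    return total

slow_sum(10 ** 6)   # prints the elapsed time, e.g. "slow_sum: ... ms"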
Good luck!