
Why is the proxy pattern so slow?

At least in Java, the proxy pattern has a lot of overhead. I don't remember the exact figures, but when wrapping tiny methods the proxy takes something like 50 times as long as the wrapped method. This is, for example, why java.awt.image.BufferedImage.setRGB and getRGB are really slow; there are about three proxies wrapping the actual byte[].

Why 50 times?! Why doesn't the proxy just double the time?


Edit: =(

As seems usual for SO, I got a bunch of answers telling me that my question was wrong. It's not. Check out BufferedImage, or some other real proxy pattern, not those microbenchmarks. In fact, if you have to do a lot of pixel manipulation of a BufferedImage and you know its structure, you can achieve said enormous speedups by manually undoing the proxying; see this answer.
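As a rough sketch of what "manually undoing the proxying" can look like: for an image created as `BufferedImage.TYPE_INT_RGB`, the raster is backed by a `DataBufferInt`, so you can fetch the backing `int[]` once and write to it directly instead of calling `setRGB` per pixel. The class `DirectPixels` and its `fill` method are made up for illustration, and the cast only holds for image types whose data buffer really is a `DataBufferInt`, which is why the sketch checks the type first. One trade-off: on OpenJDK, pulling the array out with `getData()` marks the buffer as untrackable, which can stop Java2D from caching the image for accelerated drawing.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.util.Arrays;

public class DirectPixels {
    // Fills every pixel with the given 0xRRGGBB color by writing straight into the
    // raster's backing array, skipping the per-pixel setRGB -> ColorModel -> Raster chain.
    // Assumes TYPE_INT_RGB, where each int holds one pixel as 0x00RRGGBB.
    public static void fill(BufferedImage img, int rgb) {
        if (img.getType() != BufferedImage.TYPE_INT_RGB) {
            throw new IllegalArgumentException("sketch only handles TYPE_INT_RGB");
        }
        int[] pixels = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
        Arrays.fill(pixels, rgb & 0xFFFFFF);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(640, 480, BufferedImage.TYPE_INT_RGB);
        fill(img, 0x336699);
        // getRGB still works afterwards; it reports the pixel with an opaque alpha byte.
        System.out.printf("%08X%n", img.getRGB(10, 10)); // prints FF336699
    }
}
```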

Oh, and here's my source for 50x. As the article details, proxies don't have a noticeable penalty when what they wrap takes a long time, but they do have major, painful overhead if you're wrapping a tiny method.
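The arithmetic behind that is simple under a rough model (an assumption, not something the article is quoted as saying): each proxy layer adds a roughly fixed cost c per call (call/return, saving registers, any extra checks) on top of the wrapped method's own cost w, and there are k layers. The overhead is then additive, not multiplicative. The numbers on the last two lines are illustrative, not measurements:

```latex
\begin{aligned}
\text{slowdown} &= \frac{w + k\,c}{w} = 1 + \frac{k\,c}{w} \\
w = 1\,\text{ns},\ k\,c = 49\,\text{ns}: &\quad 1 + \tfrac{49}{1} = 50 \\
w = 1\,\text{ms},\ k\,c = 49\,\text{ns}: &\quad 1 + \tfrac{49}{10^{6}} = 1.000049
\end{aligned}
```

So a proxy only "doubles the time" in the special case where its own cost happens to equal the wrapped method's; for a one-instruction method the ratio can be huge, and for a slow one it vanishes.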


I don't know where that "50 times" figure comes from, but it's pretty suspect. It may be that a specific proxy is markedly slower than what it's proxying, depending on what each of them is doing, but to generalize from that to say that "the proxy pattern is so slow" is to take a very dramatic and highly questionable leap in logic.

Try this:

Thingy.java:

public class Thingy
{
    public int foo(int param1, int param2)
    {
        return param2 - param1;
    }
}

ThingyProxy.java:

public class ThingyProxy
{
    Thingy thingy;

    public ThingyProxy()
    {
        this.thingy = new Thingy();
    }

    public int foo(int param1, int param2)
    {
        return this.thingy.foo(param1, param2);
    }
}

WithoutProxy.java:

public class WithoutProxy
{
    public static final void main(String[] args)
    {
        Thingy t;
        int sum;
        int counter;
        int loops;

        sum = 0;
        t = new Thingy();
        for (loops = 0; loops < 300000000; ++loops) {
            sum = 0;
            for (counter = 0; counter < 100000000; ++counter) {
                sum += t.foo(1, 2);
            }
            if (sum != 100000000) {
                System.out.println("ERROR");
                return;
            }
        }
        System.exit(0);
    }
}

WithProxy.java:

public class WithProxy
{
    public static final void main(String[] args)
    {
        ThingyProxy t;
        int sum;
        int counter;
        int loops;

        sum = 0;
        t = new ThingyProxy();
        for (loops = 0; loops < 300000000; ++loops) {
            sum = 0;
            for (counter = 0; counter < 100000000; ++counter) {
                sum += t.foo(1, 2);
            }
            if (sum != 100000000) {
                System.out.println("ERROR");
                return;
            }
        }
        System.exit(0);
    }
}

Simple trials on my machine:

$ time java WithoutProxy 

real    0m0.894s
user    0m0.900s
sys     0m0.000s

$ time java WithProxy

real    0m0.934s
user    0m0.940s
sys     0m0.000s

$ time java WithoutProxy 

real    0m0.883s
user    0m0.850s
sys     0m0.040s

$ time java WithProxy

real    0m0.937s
user    0m0.920s
sys     0m0.030s

$ time java WithoutProxy 

real    0m0.898s
user    0m0.880s
sys     0m0.030s

$ time java WithProxy

real    0m0.936s
user    0m0.950s
sys     0m0.000s

Slightly slower? Yes. 50x slower? No.

Now, timing the JVM is notoriously difficult and simple experiments like the above are necessarily suspect. But I think a 50x difference probably would have shown up.
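A somewhat more careful version of the same experiment, sketched here as a single hypothetical class `ProxyTiming`, runs both loops in one JVM, warms them up first so the JIT has compiled them before anything is timed, and feeds every result into a printed checksum so neither loop can be discarded as dead code. It is still only a rough check; for serious measurements, a harness such as JMH takes care of warm-up, forking, and dead-code elimination for you.

```java
public class ProxyTiming {
    // Same shape as Thingy / ThingyProxy above, nested so the sketch is one file.
    public static final class Thingy {
        public int foo(int param1, int param2) { return param2 - param1; }
    }

    public static final class ThingyProxy {
        private final Thingy thingy = new Thingy();
        public int foo(int param1, int param2) { return thingy.foo(param1, param2); }
    }

    // Arguments vary with i, so each call's result is not a loop-invariant constant.
    public static long runDirect(Thingy t, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += t.foo(i, 2 * i);
        }
        return sum;
    }

    public static long runProxy(ThingyProxy p, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += p.foo(i, 2 * i);
        }
        return sum;
    }

    public static void main(String[] args) {
        Thingy t = new Thingy();
        ThingyProxy p = new ThingyProxy();
        int n = 10_000_000;
        long checksum = 0;

        // Warm-up: give the JIT a chance to compile (and possibly inline) both paths.
        for (int round = 0; round < 10; round++) {
            checksum += runDirect(t, n);
            checksum += runProxy(p, n);
        }

        long t0 = System.nanoTime();
        checksum += runDirect(t, n);
        long t1 = System.nanoTime();
        checksum += runProxy(p, n);
        long t2 = System.nanoTime();

        System.out.printf("direct: %d ms, proxy: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        // Printing the checksum means the loops' results are used and cannot be dropped.
        System.out.println("checksum: " + checksum);
    }
}
```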

Edit: I should have mentioned that running the above with a very, very small number of loops produces numbers like this:

real    0m0.058s
user    0m0.040s
sys     0m0.020s

...which gives you an idea of VM startup time in this environment. I.e., the timings above are not mostly VM startup with just a microsecond of difference in actual execution time; they're mostly execution time.
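Plugging in the numbers above, and treating the roughly 0.058 s short run as an estimate of fixed startup cost (an approximation, since startup also varies between runs), the mean real times of the three runs each give:

```latex
\begin{aligned}
\bar{t}_{\text{without}} &= \tfrac{0.894 + 0.883 + 0.898}{3} \approx 0.892\ \text{s} \\
\bar{t}_{\text{with}} &= \tfrac{0.934 + 0.937 + 0.936}{3} \approx 0.936\ \text{s} \\
\frac{\bar{t}_{\text{with}} - 0.058}{\bar{t}_{\text{without}} - 0.058} &\approx \frac{0.878}{0.834} \approx 1.05
\end{aligned}
```

That is roughly 5% slower in this setup, nowhere near 50x.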


When the code has been compiled into native code, the byte array accesses would be something like three single-cycle instructions (as long as the source and destination data are hot in the cache and unaligned byte accesses are not penalized; YMMV depending on platform).

Adding a method call to store the four bytes will (depending on platform, but something like this) add pushing registers onto the stack, a call instruction, the array access instructions, a return instruction and popping the registers off the stack. The push/call/return/pop sequence is added for each layer of proxying, and most of these instructions don't execute in one cycle. If the compiler fails to inline these methods (which can happen rather easily), you'd run into quite a hefty penalty.
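To make the "one push/call/return/pop sequence per layer" point concrete, here is a hypothetical three-layer chain over a `byte[]`, loosely modelled on the image, raster, and data buffer layering described in the question. The class names are invented, and this is not the real `java.awt.image` code. Each layer only forwards, so the real work is a single array store; whether the layers cost anything depends on whether the JIT inlines them. On HotSpot you can check that with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`.

```java
public class LayeredStore {
    // Bottom layer: owns the raw bytes.
    public static final class ByteStore {
        final byte[] data;
        ByteStore(int size) { data = new byte[size]; }
        void set(int index, byte value) { data[index] = value; }
        byte get(int index) { return data[index]; }
    }

    // Middle layer: translates (x, y) into a flat row-major index, then forwards.
    public static final class Grid {
        final ByteStore store;
        final int width;
        Grid(int width, int height) {
            this.width = width;
            this.store = new ByteStore(width * height);
        }
        void set(int x, int y, byte v) { store.set(y * width + x, v); }
        byte get(int x, int y) { return store.get(y * width + x); }
    }

    // Top layer: the public-facing wrapper, which just forwards again.
    public static final class Image {
        final Grid grid;
        public Image(int width, int height) { grid = new Grid(width, height); }
        public void set(int x, int y, byte v) { grid.set(x, y, v); }
        public byte get(int x, int y) { return grid.get(x, y); }
    }

    public static void main(String[] args) {
        Image img = new Image(4, 4);
        // Image.set -> Grid.set -> ByteStore.set -> data[2 * 4 + 1] = 7
        img.set(1, 2, (byte) 7);
        System.out.println(img.get(1, 2)); // prints 7
    }
}
```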

The proxies add functionality to convert between color depths and so on, adding extra overhead.
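As an illustration of that extra per-pixel work, here is a sketch of the kind of conversion needed when a packed ARGB `int` is stored into an image whose bytes are laid out A, B, G, R (the documented byte order for `BufferedImage.TYPE_4BYTE_ABGR`). The helper `ArgbToAbgr.store` is hypothetical; the JDK's actual path goes through `ColorModel` and `Raster` objects rather than a single static method.

```java
public class ArgbToAbgr {
    // Unpacks a 0xAARRGGBB int and writes it into dst starting at offset,
    // in A, B, G, R byte order. Shifts and masks like these run once per pixel.
    public static void store(int argb, byte[] dst, int offset) {
        dst[offset]     = (byte) (argb >>> 24); // alpha
        dst[offset + 1] = (byte) argb;          // blue
        dst[offset + 2] = (byte) (argb >>> 8);  // green
        dst[offset + 3] = (byte) (argb >>> 16); // red
    }

    public static void main(String[] args) {
        byte[] pixel = new byte[4];
        store(0x80112233, pixel, 0);
        System.out.printf("%02X %02X %02X %02X%n", pixel[0], pixel[1], pixel[2], pixel[3]); // prints 80 33 22 11
    }
}
```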

Also, sequential array accesses can be further optimized by the compiler (e.g. by merging several single-byte stores into one wider store, up to 8 bytes at a time while still taking only one cycle), whereas the proxy calls make that hard.

50x sounds a bit high, but not unreasonably so depending on the actual code.

BufferedImage in particular adds plenty of overhead. While the proxy pattern in itself might add no discernible overhead, usage of BufferedImage probably does. Note in particular that setRGB() is synchronized, which might have severe performance implications in certain circumstances.


One place I have seen them make a difference is in code which doesn't do anything. The JVM can detect code which doesn't do anything and eliminate it. However, method calls can confuse this check, and then the code is not eliminated. If you compare the timing with and without methods in such examples, you can get any ratio you wish. However, if you look at how the no-methods test is running, you will see that the code has been eliminated and is going unreasonably fast, e.g. much faster than one clock cycle per loop iteration.
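A minimal sketch of that trap, with invented names: in `discarded` the loop's result never escapes, so once the method is JIT-compiled HotSpot is free to remove the loop, and timing it says nothing about call cost; `consumed` returns its result, so the work has to be kept. Whether the first loop actually gets eliminated depends on the JVM version and on what gets inlined, which is exactly why such comparisons can produce arbitrary ratios.

```java
public class DeadCodeDemo {
    static int twice(int x) { return 2 * x; }

    // Result never escapes: a candidate for dead-code elimination once compiled.
    static void discarded(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += twice(i);
        }
    }

    // Result is returned to the caller, so the loop's work must actually happen.
    static long consumed(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += twice(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        // Several rounds, so later rounds run JIT-compiled code.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            discarded(n);
            long t1 = System.nanoTime();
            long result = consumed(n);
            long t2 = System.nanoTime();
            System.out.printf("discarded: %d ms, consumed: %d ms (result %d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, result);
        }
    }
}
```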


Trivial methods, like getters and setters, are inlined. They can result in no impact on performance at all. I very much doubt the 50x claim for a real program; I would expect something closer to no difference whatsoever when tested properly.
