Do abstract classes in Scala really perform better than traits?
An excerpt from the stairway book:
If efficiency is very important, lean towards using a class. Most Java runtimes make a virtual method invocation of a class member a faster operation than an interface method invocation. Traits get compiled to interfaces and therefore may pay a slight performance overhead. However, you should make this choice only if you know that the trait in question constitutes a performance bottleneck and have evidence that using a class instead actually solves the problem.
I wrote some simple code to see what really happens behind the scenes, and I did notice invokevirtual being used in the case of an abstract class and invokeinterface in the case of an interface. But no matter what kind of code I wrote, they always performed roughly the same. I use HotSpot 1.6.0_18 in server mode.
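For reference, code along these lines (the names here are just placeholders, not necessarily the exact code I compiled) is enough to see the two instructions when the compiled CallSites class is disassembled with javap -c:

trait TGreeter { def greet(): Unit }
abstract class CGreeter { def greet(): Unit }

class CallSites {
  def viaTrait(g: TGreeter) = g.greet()  // javap -c shows invokeinterface here
  def viaClass(g: CGreeter) = g.greet()  // javap -c shows invokevirtual here
}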
Is it JIT doing such a great job optimizing?
Does anybody have sample code which proves the claim from the book about invokevirtual being the faster operation?
If HotSpot notices that all instances at the call site are of the same type, it is able to use a monomorphic method call and both virtual and interface methods are optimized the same way. The documents PerformanceTechniques and VirtualCalls make no distinction between virtual and interface methods.
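To illustrate what a monomorphic call site means here (my own toy example, not something taken from those documents): if only one concrete class ever reaches a call site, HotSpot can devirtualize and inline the call regardless of whether the static type is a trait (interface) or an abstract class.

trait Animal { def sound: String }
class Dog extends Animal { def sound = "woof" }

object Mono {
  // Every receiver reaching the animals(i).sound call site is a Dog, so
  // HotSpot can treat the site as monomorphic and compile it as a direct
  // (inlined) call, even though Animal is an interface in bytecode.
  def totalLength(animals: Array[Animal]): Int = {
    var i = 0; var n = 0
    while (i < animals.length) { n += animals(i).sound.length; i += 1 }
    n
  }

  def main(args: Array[String]) {
    println(totalLength(Array.fill[Animal](1000000)(new Dog)))
  }
}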
But in the general non-monomorphic case there might be some difference. The InterfaceCalls document says:
There is no simple prefixing scheme in which an interface's methods are displayed at fixed offsets within every class that implements that interface. Instead, in the general (non-monomorphic) case, an assembly-coded stub routine must fetch a list of implemented interfaces from the receiver's klassOop, and walk that list seeking the current target interface.
It also confirms that the monomorphic case is the same for both:
Nearly the same optimizations apply to interface calls as to virtual calls. As with virtual calls, most interface calls are monomorphic, and can therefore be rendered as direct calls with a cheap check.
Other JVMs might have different optimizations.
You could try a microbenchmark (if you know how) which calls methods on multiple classes that implement the same interface, and on multiple classes that extend the same abstract class. That way it should be possible to force the JVM to use non-monomorphic method calls. (Though in real life any difference might not matter, since most call sites are monomorphic anyway.)
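A rough sketch of what such a benchmark could look like (purely illustrative; I haven't measured these particular classes): rotating three receiver classes through one call site should keep it from staying monomorphic, for both the trait and the abstract-class variants.

trait IShape { def area: Double }
class ICircle(r: Double) extends IShape { def area = math.Pi * r * r }
class ISquare(s: Double) extends IShape { def area = s * s }
class ITriangle(b: Double, h: Double) extends IShape { def area = b * h / 2 }

abstract class CShape { def area: Double }
class CCircle(r: Double) extends CShape { def area = math.Pi * r * r }
class CSquare(s: Double) extends CShape { def area = s * s }
class CTriangle(b: Double, h: Double) extends CShape { def area = b * h / 2 }

object MegamorphicSketch {
  // The shapes(i % 3).area call site sees three different receiver classes,
  // so it should stay non-monomorphic (interface vs. virtual dispatch).
  def sumI(shapes: Array[IShape], n: Int) = {
    var s = 0.0; var i = 0
    while (i < n) { s += shapes(i % 3).area; i += 1 }
    s
  }
  def sumC(shapes: Array[CShape], n: Int) = {
    var s = 0.0; var i = 0
    while (i < n) { s += shapes(i % 3).area; i += 1 }
    s
  }
  def main(args: Array[String]) {
    val is: Array[IShape] = Array(new ICircle(1), new ISquare(1), new ITriangle(1, 1))
    val cs: Array[CShape] = Array(new CCircle(1), new CSquare(1), new CTriangle(1, 1))
    println(sumI(is, 100000000))
    println(sumC(cs, 100000000))
  }
}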
The bottom line is that you're going to have to measure it yourself for your own application to see if it's important. You can get rather counterintuitive results with the current JVM. Try this out.
File TraitAbstractPackage.scala
package traitvsabstract
trait T1 { def x: Int; def inc: Unit }
trait T2 extends T1 { def x_=(x0: Int): Unit }
trait T3 extends T2 { def inc { x = x + 1 } }
abstract class C1 { def x: Int; def inc: Unit }
abstract class C2 extends C1 { def x_=(x0: Int): Unit }
abstract class C3 extends C2 { def inc { x = x + 1 } }
File TraitVsAbstract.scala
object TraitVsAbstract {
  import traitvsabstract._

  class Ta extends T3 { var x: Int = 0 }
  class Tb extends T3 {
    private[this] var y: Long = 0
    def x = y.toInt
    def x_=(x0: Int) { y = x0 }
  }
  class Tc extends T3 {
    private[this] var xHidden: Int = 0
    def x = xHidden
    def x_=(x0: Int) { if (x0 > xHidden) xHidden = x0 }
  }

  class Ca extends C3 { var x: Int = 0 }
  class Cb extends C3 {
    private[this] var y: Long = 0
    def x = y.toInt
    def x_=(x0: Int) { y = x0 }
  }
  class Cc extends C3 {
    private[this] var xHidden: Int = 0
    def x = xHidden
    def x_=(x0: Int) { if (x0 > xHidden) xHidden = x0 }
  }

  def Tbillion3(t: T3) = {
    var i = 0; while (i < 1000000000) { t.inc; i += 1 }; t.x
  }
  def Tbillion1(t: T1) = {
    var i = 0; while (i < 1000000000) { t.inc; i += 1 }; t.x
  }
  def Cbillion3(c: C3) = {
    var i = 0; while (i < 1000000000) { c.inc; i += 1 }; c.x
  }
  def Cbillion1(c: C1) = {
    var i = 0; while (i < 1000000000) { c.inc; i += 1 }; c.x
  }

  def ptime(f: => Int) {
    val t0 = System.nanoTime
    val ans = f.toString
    val t1 = System.nanoTime
    printf("Answer: %s; elapsed: %.2f seconds\n", ans, (t1 - t0) * 1e-9)
  }

  def main(args: Array[String]) {
    for (i <- 1 to 3) {
      println("Iteration " + i)
      val t1s, t3s = List(new Ta, new Tb, new Tc)
      val c1s, c3s = List(new Ca, new Cb, new Cc)
      t1s.foreach(x => ptime(Tbillion1(x)))
      t3s.foreach(x => ptime(Tbillion3(x)))
      c1s.foreach(x => ptime(Cbillion1(x)))
      c3s.foreach(x => ptime(Cbillion3(x)))
      println
    }
  }
}
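To reproduce, both files can be compiled and run together; the -XX:+PrintCompilation flag mentioned below can be passed through the scala launcher with -J (exact commands assume both files sit in the current directory):

scalac TraitAbstractPackage.scala TraitVsAbstract.scala
scala TraitVsAbstract
scala -J-XX:+PrintCompilation TraitVsAbstract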
Every one should print out 1000000000 as the answer, and the time taken should be zero (if the JVM is really clever) or about as long as it takes to add a billion numbers. But at least on my system, the Sun JVM optimizes backwards--repeated runs get slower--and abstract classes are slower than traits. (You might want to run with java -XX:+PrintCompilation
to try to figure out what goes wrong; I suspect zombies.)
Also, it's worth noting that scalac -optimise does nothing to improve matters--it's all up to the JVM.
The JRockit JVM in contrast turns in a consistent middling performance, but again, traits beat classes. Since the timings are consistent I'll report them: 3.35s for the classes (3.62s for the one with an if statement) vs. 2.51 seconds for all the traits, if-statement or no.
(I find this trend to be generally true: HotSpot produces blazing fast performance in some cases, and in others (like this case) gets confused and is woefully slow; JRockit is never super-fast--don't bother trying to get C-like performance even out of primitives--but it rarely blunders.)
A quote from Inside the Java Virtual Machine (Invocation Instructions and Speed):
When the Java Virtual Machine encounters an invokevirtual instruction and resolves the symbolic reference to a direct reference to an instance method, that direct reference is likely an offset into a method table. From that point forward, the same offset can be used. For an invokeinterface instruction, however, the virtual machine will have to search through the method table every single time the instruction is encountered, because it can't assume the offset is the same as the previous time.
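To make the quoted mechanism concrete, here is a very rough conceptual model of "fixed offset" versus "search every time" in plain Scala. This is my own illustration, not HotSpot's actual data structures:

object DispatchModel {
  class VTable(val methods: Array[() => Any])
  class ITableEntry(val iface: Class[_], val methods: Array[() => Any])

  // invokevirtual: the resolved method sits at the same vtable index in
  // every subclass, so dispatch is a single indexed lookup.
  def invokeVirtual(vtable: VTable, index: Int): Any =
    vtable.methods(index)()

  // invokeinterface (general, non-monomorphic case): the receiver's class
  // may implement many interfaces, so the right table must be found first.
  def invokeInterface(itables: List[ITableEntry], iface: Class[_], index: Int): Any =
    itables.find(_.iface == iface) match {
      case Some(entry) => entry.methods(index)()
      case None        => throw new IncompatibleClassChangeError
    }
}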