The cost of nested methods
In Scala one might define methods inside other methods. This limits their scope of use to inside开发者_如何学编程 of definition block. I use them to improve readability of code that uses several higher-order functions. In contrast to anonymous function literals, this allows me to give them meaningful names before passing them on.
For example:
class AggregatedPerson extends HashSet[PersonRecord] {
def mostFrequentName: String = {
type NameCount = (String, Int)
def moreFirst(a: NameCount, b: NameCount) = a._2 > b._2
def countOccurrences(nameGroup: (String, List[PersonRecord])) =
(nameGroup._1, nameGroup._2.size)
iterator.toList.groupBy(_.fullName).
map(countOccurrences).iterator.toList.
sortWith(moreFirst).head._1
}
}
Is there any runtime cost because of the nested method definition I should be aware of?
Does the answer differ for closures?
During compilaton, the nested functions are moveFirst
and countOccurences
are moved out to the same level as mostFrequentName
. They get compiler synthesized names: moveFirst$1
and countOccurences$1
.
In addition, when you refer to one of these methods without an argument list, it is lifted into a function. So map(countOccurences)
is the same as writing map((a: (String, List[PersonRecord])) => countOccurences(a))
. This anonymous function is compiled to a separate class AggregatedPerson$$anonfun$mostFrequentName$2
, which does nothing more than forward to countOccurences$
.
As a side note, the process of lifting the method to a function is called Eta Expansion. It is triggered if you omit the argument list in a context where a function type is expected (as in your example), or if you use _
in place of the entire argument list, or in place of each argument (val f1 = countOccurences _ ; val f2 = countOccurences(_)
.
If the code was directly in the closure, you would have one fewer method call in your stack, and one fewer synthetic method generated. The performance impact of this is likely to be zero in most cases.
I find it to be a fantastically useful tool to structure code, and consider your example very idiomatic Scala.
Another useful tool is using small blocks to initialize a val:
val a = {
val temp1, temp2 = ...
f(temp1, temp2)
}
You can use scalac -print
to see exactly how Scala code is translated into a form ready for the JVM. Heres the output from your program:
[[syntax trees at end of cleanup]]// Scala source: nested-method.scala
package <empty> {
class AggregatedPerson extends scala.collection.mutable.HashSet with ScalaObject {
def mostFrequentName(): java.lang.String = AggregatedPerson.this.iterator().toList().groupBy({
(new AggregatedPerson$$anonfun$mostFrequentName$1(AggregatedPerson.this): Function1)
}).map({
{
(new AggregatedPerson$$anonfun$mostFrequentName$2(AggregatedPerson.this): Function1)
}
}, collection.this.Map.canBuildFrom()).$asInstanceOf[scala.collection.MapLike]().iterator().toList().sortWith({
{
(new AggregatedPerson$$anonfun$mostFrequentName$3(AggregatedPerson.this): Function2)
}
}).$asInstanceOf[scala.collection.IterableLike]().head().$asInstanceOf[Tuple2]()._1().$asInstanceOf[java.lang.String]();
final def moreFirst$1(a: Tuple2, b: Tuple2): Boolean = scala.Int.unbox(a._2()).>(scala.Int.unbox(b._2()));
final def countOccurrences$1(nameGroup: Tuple2): Tuple2 = new Tuple2(nameGroup._1(), scala.Int.box(nameGroup._2().$asInstanceOf[scala.collection.SeqLike]().size()));
def this(): AggregatedPerson = {
AggregatedPerson.super.this();
()
}
};
@SerialVersionUID(0) @serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$1 extends scala.runtime.AbstractFunction1 {
final def apply(x$1: PersonRecord): java.lang.String = x$1.fullName();
final <bridge> def apply(v1: java.lang.Object): java.lang.Object = AggregatedPerson$$anonfun$mostFrequentName$1.this.apply(v1.$asInstanceOf[PersonRecord]());
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$1 = {
AggregatedPerson$$anonfun$mostFrequentName$1.super.this();
()
}
};
@SerialVersionUID(0) @serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$2 extends scala.runtime.AbstractFunction1 {
final def apply(nameGroup: Tuple2): Tuple2 = AggregatedPerson$$anonfun$mostFrequentName$2.this.$outer.countOccurrences$1(nameGroup);
<synthetic> <paramaccessor> private[this] val $outer: AggregatedPerson = _;
final <bridge> def apply(v1: java.lang.Object): java.lang.Object = AggregatedPerson$$anonfun$mostFrequentName$2.this.apply(v1.$asInstanceOf[Tuple2]());
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$2 = {
if ($outer.eq(null))
throw new java.lang.NullPointerException()
else
AggregatedPerson$$anonfun$mostFrequentName$2.this.$outer = $outer;
AggregatedPerson$$anonfun$mostFrequentName$2.super.this();
()
}
};
@SerialVersionUID(0) @serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$3 extends scala.runtime.AbstractFunction2 {
final def apply(a: Tuple2, b: Tuple2): Boolean = AggregatedPerson$$anonfun$mostFrequentName$3.this.$outer.moreFirst$1(a, b);
<synthetic> <paramaccessor> private[this] val $outer: AggregatedPerson = _;
final <bridge> def apply(v1: java.lang.Object, v2: java.lang.Object): java.lang.Object = scala.Boolean.box(AggregatedPerson$$anonfun$mostFrequentName$3.this.apply(v1.$asInstanceOf[Tuple2](), v2.$asInstanceOf[Tuple2]()));
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$3 = {
if ($outer.eq(null))
throw new java.lang.NullPointerException()
else
AggregatedPerson$$anonfun$mostFrequentName$3.this.$outer = $outer;
AggregatedPerson$$anonfun$mostFrequentName$3.super.this();
()
}
}
}
There is a small runtime cost. You can observe it here (apologies for the long code):
object NestBench {
def countRaw() = {
var sum = 0
var i = 0
while (i<1000) {
sum += i
i += 1
var j = 0
while (j<1000) {
sum += j
j += 1
var k = 0
while (k<1000) {
sum += k
k += 1
sum += 1
}
}
}
sum
}
def countClosure() = {
var sum = 0
var i = 0
def sumI {
sum += i
i += 1
var j = 0
def sumJ {
sum += j
j += 1
var k = 0
def sumK {
def sumL { sum += 1 }
sum += k
k += 1
sumL
}
while (k<1000) sumK
}
while (j<1000) sumJ
}
while (i<1000) sumI
sum
}
def countInner() = {
var sum = 0
def whileI = {
def whileJ = {
def whileK = {
def whileL() = 1
var ksum = 0
var k = 0
while (k<1000) { ksum += k; k += 1; ksum += whileL }
ksum
}
var jsum = 0
var j = 0
while (j<1000) {
jsum += j; j += 1
jsum += whileK
}
jsum
}
var isum = 0
var i = 0
while (i<1000) {
isum += i; i += 1
isum += whileJ
}
isum
}
whileI
}
def countFunc() = {
def summer(f: => Int)() = {
var sum = 0
var i = 0
while (i<1000) {
sum += i; i += 1
sum += f
}
sum
}
summer( summer( summer(1) ) )()
}
def nsPerIteration(f:() => Int): (Int,Double) = {
val t0 = System.nanoTime
val result = f()
val t1 = System.nanoTime
(result , (t1-t0)*1e-9)
}
def main(args: Array[String]) {
for (i <- 1 to 5) {
val fns = List(countRaw _, countClosure _, countInner _, countFunc _)
val labels = List("raw","closure","inner","func")
val results = (fns zip labels) foreach (fl => {
val x = nsPerIteration( fl._1 )
printf("Method %8s produced %d; time/it = %.3f ns\n",fl._2,x._1,x._2)
})
}
}
}
There are four different methods for summing integers:
- A raw while loop ("raw")
- While loop inner methods that are closures over the sum variable
- While loop inner methods that return the partial sum
- A while-and-sum nested function
And we see the results on my machine in terms of nanoseconds taken in the inner loop:
scala> NestBench.main(Array[String]())
Method raw produced -1511174132; time/it = 0.422 ns
Method closure produced -1511174132; time/it = 2.376 ns
Method inner produced -1511174132; time/it = 0.402 ns
Method func produced -1511174132; time/it = 0.836 ns
Method raw produced -1511174132; time/it = 0.418 ns
Method closure produced -1511174132; time/it = 2.410 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.813 ns
Method raw produced -1511174132; time/it = 0.411 ns
Method closure produced -1511174132; time/it = 2.372 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.813 ns
Method raw produced -1511174132; time/it = 0.411 ns
Method closure produced -1511174132; time/it = 2.370 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.815 ns
Method raw produced -1511174132; time/it = 0.412 ns
Method closure produced -1511174132; time/it = 2.357 ns
Method inner produced -1511174132; time/it = 0.400 ns
Method func produced -1511174132; time/it = 0.817 ns
So, bottom line is: nesting functions really doesn't hurt you at all in simple cases--the JVM will figure out that the call can be inlined (thus raw
and inner
give the same times). If you take a more functional approach, the function call can't be completely neglected, but the time taken is vanishingly small (approximately 0.4 ns extra per call). If you use a lot of closures, then closing them gives an overhead of something like 1 ns per call at least in this case where a single mutable variable is written to.
You can modify the code above to find answers to other questions, but the bottom line is that it is all very fast, ranging between "no penalty whatsoever" through to "only worry about in the very tightest inner loops that otherwise have minimal work to do".
(P.S. For comparison, the creation of a single small object takes ~4 ns on my machine.)
Current as of January 2014
The current benchmark is ~3 years old and Hotspot and the compiler have significantly evolved. I am also using Google Caliper to perform the benchmarks.
Code
import com.google.caliper.SimpleBenchmark
class Benchmark extends SimpleBenchmark {
def timeRaw(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += 0xc37e ^ (i * 0xd5f3)
i = i + 1
}
result
}
def normal(i: Int): Long = 0xc37e ^ (i * 0xd5f3)
def timeNormal(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += normal(i)
i = i + 1
}
result
}
def timeInner(reps: Int) = {
def inner(i: Int): Long = 0xc37e ^ (i * 0xd5f3)
var i = 0
var result = 0L
while (i < reps) {
result += inner(i)
i = i + 1
}
result
}
def timeClosure(reps: Int) = {
var i = 0
var result = 0L
val closure = () => result += 0xc37e ^ (i * 0xd5f3)
while (i < reps) {
closure()
i = i + 1
}
result
}
def normal(i: Int, j: Int, k: Int, l: Int): Long = i ^ j ^ k ^ l
def timeUnboxed(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += normal(i,i,i,i)
i = i + 1
}
result
}
val closure = (i: Int, j: Int, k: Int, l: Int) => (i ^ j ^ k ^ l).toLong
def timeBoxed(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
closure(i,i,i,i)
i = i + 1
}
result
}
}
Results
benchmark ns linear runtime
Normal 0.576 =
Raw 0.576 =
Inner 0.576 =
Closure 0.532 =
Unboxed 0.893 =
Boxed 15.210 ==============================
What is very surprising is that closure test was completed about 4ns faster than the others. This seems to be an idiosyncrasy of Hotspot instead of the execution environment, multiple runs have returned the same trend.
Using a closure that performs boxing is a huge performance hit, it takes approx 3.579ns to perform one unboxing and reboxing, enough to do a lot of primitive math operations. In this specific position, things might get better with the work being done on a new optimizer. In the general case, boxing might be alleviated by miniboxing.
Edit:
The new optimizer doesn't really help here, it makes Closure
0.1 ns slower and Boxed
0.1 ns faster:
benchmark ns linear runtime
Raw 0.574 =
Normal 0.576 =
Inner 0.575 =
Closure 0.645 =
Unboxed 0.889 =
Boxed 15.107 ==============================
Performed with 2.11.0-20131209-220002-9587726041 from magarciaEPFL/scala
Versions
JVM
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Scalac
Scala compiler version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL
精彩评论