Why did Scala's library double its size between 2.7 and 2.8?
Comparing Scala 2.7.7 (last 2.7.x release) with Scala 2.8.1 (latest 2.8.x release) I gathered the following metrics:
Scala version                |     2.7.7      2.8.1
---------------------------------------------------
Compressed jar file          |    3.6 MB     6.2 MB
Uncompressed files           |    8.3 MB    16.5 MB
.class files in .            |    1.8 MB     1.7 MB
             in ./actors     |  554.0 KB     1.3 MB
             in ./annotation |     962 B    11.7 KB
             in ./collection |    2.8 MB     8.8 MB
             in ./compat     |    3.8 KB     3.8 KB
             in ./concurrent |  107.3 KB   228.0 KB
             in ./io         |  175.7 KB   210.6 KB
             in ./math       |       ---   337.5 KB
             in ./mobile     |   40.8 KB    47.3 KB
             in ./ref        |   21.8 KB    26.5 KB
             in ./reflect    |  213.9 KB   940.5 KB
             in ./runtime    |  271.0 KB   338.9 KB
             in ./testing    |   47.1 KB    53.0 KB
             in ./text       |   27.6 KB    34.4 KB
             in ./util       |    1.6 MB     1.4 MB
             in ./xml        |  738.9 KB     1.1 MB
The biggest offenders are scala.collection (3.1 times bigger) and scala.reflect (4.4 times bigger). The increase in the collection package is in the same time frame as the big rewrite of the whole collection framework for 2.8, so I guess that's the cause.
I always assumed that the type system magic which computes the best return type of the collection class methods (which was the big change in 2.8) would be done at compile time and would not be visible after that.
- Why did the rewrite result in such a big increase in size?
As far as I know, there are plans to improve scala.io, scala.reflect and scala.swing; there are at least two other actor libraries that do the same as scala.actors (Lift actors) or a lot more (Akka); and scala.testing has officially been superseded by third-party testing libraries.
- Will an improved scala.io, scala.reflect or scala.swing result in a comparable size increase or was the case of scala.collection a really special circumstance?
- Is it considered to delegate the actors implementation to Lift or Akka, if there will be a usable modularization system in JDK 8?
- Are there plans to finally remove scala.testing or split it from the library jar-file?
- Might the inclusion of SAM types, Defender Methods or MethodHandles in JDK7/JDK8 lead to a possibility of reducing the amount of classes the Scala compiler has to generate for anonymous/inner class/singletons/etc.?
Specialization was one factor (about 0.9MB worth of increase in the jar). Another factor is the collection libraries, which now implement a larger set of operations uniformly over a larger set of implementation types. A lot of the increase is only in the bytecode, because the new collection libraries make very heavy use of mixin composition, which tends to increase classfile size. I don't have data on source file size, but I believe the increase there was much smaller.
I'm not in any way associated with the Scala project or any of the companies that support it. So take everything below as my own personal opinion.
- Why did the rewrite result in such a big increase in size?
Most likely, not the rewrite itself, but specialization. In particular, this definition of Function1:
trait Function1[@specialized(scala.Int, scala.Long, scala.Float, scala.Double) -T1, @specialized(scala.Unit, scala.Boolean, scala.Int, scala.Float, scala.Long, scala.Double) +R]
means all methods in Function1 will be implemented 35 times (one for each of Int, Long, Float, Double and AnyRef for T1, times each of Unit, Boolean, Int, Float, Long, Double and AnyRef for R).
Now, look at the Scaladoc and see the known subclasses of Function1. I won't even bother copying the list here. Function0 and Function2 were also specialized, though their impact is much smaller.
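The figure of 35 can be checked with a little arithmetic. The sketch below only enumerates the type combinations named in the annotation quoted above; the object and value names are made up for illustration, and it does not inspect what the compiler actually emits.

```scala
// Enumerates the specialized variants implied by the Function1 annotation.
// All names here are hypothetical; only the type lists come from the answer.
object SpecializationCount {
  // Primitive types listed in the two @specialized annotations.
  val t1Primitives: List[String] = List("Int", "Long", "Float", "Double")
  val rPrimitives: List[String]  = List("Unit", "Boolean", "Int", "Float", "Long", "Double")

  // Each type parameter also keeps its generic (AnyRef) version.
  val t1Variants: List[String] = t1Primitives :+ "AnyRef"
  val rVariants: List[String]  = rPrimitives :+ "AnyRef"

  // One variant per (T1, R) pair.
  val combinations: List[(String, String)] =
    for (t <- t1Variants; r <- rVariants) yield (t, r)

  def main(args: Array[String]): Unit =
    println(combinations.size) // prints 35 (5 choices for T1 times 7 for R)
}
```

The count grows multiplicatively with each specialized type parameter, which is why specializing even a handful of core traits is noticeable in the jar.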
If anything, I'd bet the rewrite decreased the final footprint, because of the extensive code reuse it enabled.
As for reflect, it went from being almost non-existent to providing fundamental features to the new collection library, so it is no surprise it had a big relative increase.
- Will an improved scala.io, scala.reflect or scala.swing result in a comparable size increase or was the case of scala.collection a really special circumstance?
Not comparable, because the rewrite had nothing to do with it. However, a true scala.io library would certainly be much bigger than the little that exists nowadays, and I'd expect the same of a true reflection system for Scala (there have been papers about the latter). As for swing, I don't think there's much but incremental improvements to it, mostly wrappers around Java libraries, so I doubt it would change much in size.
- Is it considered to delegate the actors implementation to Lift or Akka, if there will be a usable modularization system in JDK 8?
Each implementation has its own strengths, and I haven't seen any signs of convergence for the time being. As for JDK 8, how is Scala supposed to stay compatible with JDK 5 while modularizing for JDK 8? I don't mean it is not possible, but it is quite likely too much effort for the available resources.
- Are there plans to finally remove scala.testing or split it from the library jar-file?
It has been discussed, but there's also a concern about having some sort of testing framework available for the compiler itself, with the flexibility a third party testing framework would not provide. It might well be moved (or removed and replaced with something else) to the compiler jar instead, though.
- Might the inclusion of SAM types, Defender Methods or MethodHandles in JDK7/JDK8 lead to a possibility of reducing the amount of classes the Scala compiler has to generate for anonymous/inner class/singletons/etc.?
Sure, once no one uses JDK5/JDK6 anymore. Of course, if JDK7/JDK8 get widespread adoption and the improvements are sufficiently worthwhile, then there might well come a time when Scala gets distributed with two distinct jar files for its library. But, at this point, it is too early to conjure up hypothetical scenarios.
My guess would be that the increase in size comes not from the rewrite but from the specialization of type parameters that was introduced/enabled in 2.8.
The size has gone up, but is about to go back down quite a bit.
The blog post "Scala Pitfalls: Trait Bloat", posted by Todd Vierling on November 7, 2011, also points to Scala's multiple inheritance support (traits):
The way that trait implementation code is handled in the Scala compiler — due to restrictions of the JVM — leads to class file bloat, reduced JIT optimization, and increased memory usage.
Note that these problems are not specific to Scala; other methodologies such as AOP mix-in composition suffer from extremely similar problems.
Since Java interfaces are not allowed to contain implementation code*, the Scala compiler performs a few tricks to make it happen.
The combination of these farms of stub methods, which can resemble jump tables to those familiar with assembler code, with the extra data added for Scala type signatures caused the collection classes to bloat up significantly between 2.7.x and 2.9.x.
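To make the "stub methods" concrete, here is a hand-written approximation of how a trait with a default method could be encoded on a JVM without interface method bodies. This is a sketch under assumptions: the names GreeterInterface, GreeterImpl and English are hypothetical, and the real compiler output of that era used different naming and extra details.

```scala
// What the programmer writes: a trait with one abstract and one concrete method.
trait Greeter {
  def name: String
  def greet: String = "Hello, " + name
}
class Plain extends Greeter {
  def name: String = "world"
}

// Hand-written approximation of the encoding (hypothetical names).
// 1. A pure interface that only declares the methods.
trait GreeterInterface {
  def name: String
  def greet: String
}
// 2. A holder for the method bodies, taking the receiver explicitly.
object GreeterImpl {
  def greet(self: GreeterInterface): String = "Hello, " + self.name
}
// 3. Every concrete class gets a forwarding stub for each inherited body.
class English extends GreeterInterface {
  def name: String = "world"
  def greet: String = GreeterImpl.greet(this) // forwarding stub
}
```

With one method the overhead is trivial, but a class that mixes in traits carrying dozens of default methods needs one such stub per method, and that is repeated in every class that mixes them in.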
Frankly, the collections inherited from too many traits with too many default implementations, resulting in things like an extra 40kB of uncompressed class file size for every incidence of this sort of code:
... = new Iterator[X] {
def hasNext = ...
def next = ...
}
...or the similar construct "extends Iterator[X]" on a named class.
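A complete instance of the pattern quoted above might look like the following sketch. CountdownDemo and countdown are hypothetical names chosen for illustration; only hasNext and next are written by hand, yet the anonymous class inherits every other Iterator method (map, filter, and so on), which is where the stub overhead described in the post would come from.

```scala
// A minimal anonymous Iterator, as in the quoted snippet (hypothetical example).
object CountdownDemo {
  // Counts down from `from` to 1.
  def countdown(from: Int): Iterator[Int] = new Iterator[Int] {
    private var current = from
    def hasNext: Boolean = current > 0
    def next(): Int = {
      val value = current
      current -= 1
      value
    }
  }
}
```

For example, CountdownDemo.countdown(3).toList yields List(3, 2, 1), and inherited methods such as map work without any extra code in the anonymous class.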
Yep, the collection classes are incredibly flexible because they are so method-rich, but that same method-rich nature can lead to unexpected compilation side effects.
Todd Vierling points to the changes in scala/trunk [25959:25964], which reduce the size of scala-library.jar by about 1.5MB.
That's no small amount for a library that was creeping up on 9MB as of Scala 2.9.x, yet zero functionality was lost.
Fortunately, there's an escape hatch. By deliberately triggering the stub method generation at a known point in the class inheritance chain, it's possible to reduce the number of generated stubs for every specialization needed throughout the library.
This is done with a very simple construct: an abstract class that inherits from the commonly-specialized trait.
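The construct described in the quote can be sketched as follows. BaseIterator and RangeIterators are hypothetical names used only for illustration, not the names used in the changesets mentioned above. The idea is that the trait's method bodies are attached once, in the abstract class, and concrete or anonymous subclasses then inherit them through ordinary class inheritance.

```scala
// Hypothetical abstract class: the single place where Iterator's
// default methods are mixed in.
abstract class BaseIterator[+A] extends Iterator[A]

object RangeIterators {
  // Extends the abstract class instead of the trait directly, so this
  // anonymous class only needs to define hasNext and next itself.
  def upTo(limit: Int): Iterator[Int] = new BaseIterator[Int] {
    private var i = 0
    def hasNext: Boolean = i < limit
    def next(): Int = {
      val value = i
      i += 1
      value
    }
  }
}
```

Callers see no difference, since the result is still an Iterator[Int]; the only change is which parent the implementation names, which matches the quote's point that no functionality is lost.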