Why does Scala maintain the type of a collection rather than returning Iterable (as in .NET)?
In Scala, you can do
val l = List(1, 2, 3)
l.filter(_ > 2) // returns a List[Int]
val s = Set("hello", "world")
s.map(_.length) // returns a Set[Int]
The question is: why is this useful?
Scala collections are probably the only existing collection framework that does this. The Scala community seems to agree that this functionality is needed. Yet no one seems to miss it in other languages. An example in C# (naming modified to match Scala's):
var l = new List<int> { 1, 2, 3 }
l.filter(i => i > 2) // always returns Iterable[Int]
l.filter(i => i > 2).toList // if I want a List, no problem
l.filter(i => i > 2).toSet // or I want a Set
In .NET, I always get back an Iterable and it is up to me what I want to do with it. (This also makes .NET collections very simple.)
The Scala example with Set forces me to make a Set of lengths out of a Set of strings. But what if I just want to iterate over the lengths, or construct a List of lengths, or keep the Iterable to filter it later? Constructing a Set right away seems pointless. (EDIT: collection.view provides the simpler .NET functionality, nice)
I am sure you will show me examples where the .NET approach is absolutely wrong or kills performance, but I just can't see any (using .NET for years).
Not a full answer to your question, but Scala never forces you to use one collection type over another. You're free to write code like this:
import collection._
import immutable._
val s = Set("hello", "world")
val l: Vector[Int] = s.map(_.length)(breakOut)
Read more about breakOut in Daniel Sobral's detailed answer to another question.
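Note that breakOut was removed in Scala 2.13. A minimal sketch of the equivalent there, assuming Scala 2.13 or later (the object name ConvertDemo is only for illustration): map over a view, then build the target collection with to. It also shows why the choice of result type matters here: both strings have length 5, so the Set result collapses to a single element while the Vector keeps both.

```scala
object ConvertDemo {
  val s: Set[String] = Set("hello", "world")

  // Strict map on a Set builds a Set, so duplicate lengths are merged.
  val asSet: Set[Int] = s.map(_.length)

  // Map lazily over a view, then build exactly the collection we want.
  // `.to(Vector)` is the Scala 2.13+ spelling; 2.12 used `.to[Vector]`.
  val asVector: Vector[Int] = s.view.map(_.length).to(Vector)
}
```

No intermediate Set of lengths is built in the second case, which is roughly what the .NET style gives you.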
If you want your map or filter to be evaluated lazily, use this:
s.view.map(_.length)
This whole behavior makes it easy to integrate your own collection classes and inherit all the powerful capabilities of the standard collections with no code duplication, while ensuring that YourSpecialCollection#filter returns an instance of YourSpecialCollection, and that YourSpecialCollection#map returns an instance of YourSpecialCollection if it supports the type being mapped to, or a built-in fallback collection if it doesn't (as happens if you call map on a BitSet). Surely, a C# iterator has no .toMySpecialCollection method.
See also: “Integrating new sets and maps” in The Architecture of Scala Collections.
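As a small illustration of the BitSet fallback mentioned above (a sketch only; the object name BitSetDemo is made up for the example): mapping to Int keeps the BitSet, while mapping to a type a BitSet cannot hold falls back to a sorted set.

```scala
import scala.collection.immutable.{BitSet, SortedSet}

object BitSetDemo {
  val bits: BitSet = BitSet(1, 2, 3)

  // Int => Int: the result can still be a BitSet.
  val shifted: BitSet = bits.map(_ + 1)

  // Int => String: a BitSet cannot hold Strings, so the result
  // falls back to a more general sorted set.
  val asStrings: SortedSet[String] = bits.map(_.toString)
}
```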
Scala follows the "uniform return type principle", assuring that you always end up with the appropriate return type instead of losing that information as in C#.
The reason C# does it this way is that its type system is not expressive enough to provide these assurances without overriding the whole implementation of every method in every single subclass. Scala solves this by using higher-kinded types.
Why does Scala have the only collection framework doing this? Because it is harder than most people think, especially when things like Strings and Arrays, which are not "real" collections, should be integrated as well:
// This stays a String:
scala> "Foobar".map(identity)
res27: String = Foobar
// But this falls back to the "nearest" appropriate type:
scala> "Foobar".map(_.toInt)
res29: scala.collection.immutable.IndexedSeq[Int] = Vector(70, 111, 111, 98, 97, 114)
If you have a Set, and an operation on it returns an Iterable while its runtime type is still a Set, then you're losing important information about its behavior, along with access to set-specific methods.
BTW: There are other languages that behave similarly, like Haskell, which influenced Scala a lot. The Haskell version of map would look like this translated to Scala (without implicit magic):
// the functor type class
trait Functor[C[_]] {
  def fmap[A, B](f: A => B, coll: C[A]): C[B]
}
// an instance
object ListFunctor extends Functor[List] {
  def fmap[A, B](f: A => B, list: List[A]): List[B] = list.map(f)
}
// usage
val list = ListFunctor.fmap((x: Int) => x * x, List(1, 2, 3))
And I think the Haskell community values this feature as well :-)
It is a matter of consistency. Things are what they are, and return things like them. You can depend on it.
The distinction you are drawing here is one of strictness. A strict method is evaluated immediately, while a non-strict method is only evaluated as needed. This has consequences. Take this simple example:
def print5(it: Iterable[Int]) = {
  var flag = true
  it.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}
Test it with these two collections:
print5(List.range(1, 10))
print5(Stream.range(1, 10))
Here, List is strict, so its methods are strict. Conversely, Stream is non-strict, so its methods are non-strict.
So this isn't really related to Iterable at all -- after all, both List and Stream are Iterable. Changing the collection return type can cause all sorts of problems -- at the very least, it would make the task of keeping a persistent data structure harder.
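To make the difference concrete, here is a sketch of the same logic that collects the visited values instead of printing them (take5 and StrictnessDemo are names I made up). It uses a view as the non-strict input, since views behave non-strictly for filter just like Stream does and are available in every Scala version.

```scala
import scala.collection.mutable.ListBuffer

object StrictnessDemo {
  // Same logic as print5, but returns what would have been printed.
  def take5(it: Iterable[Int]): List[Int] = {
    var flag = true
    val seen = ListBuffer.empty[Int]
    it.filter(_ => flag).foreach { i =>
      flag = i < 5
      seen += i
    }
    seen.toList
  }

  // Strict: filter runs over the whole list while flag is still true,
  // so every element survives.
  val strict: List[Int] = take5(List.range(1, 10))

  // Non-strict: the predicate is consulted element by element during
  // foreach, so it sees flag turn false once 5 has been visited.
  val nonStrict: List[Int] = take5(List.range(1, 10).view)
}
```

The strict call visits 1 through 9, while the non-strict one stops after 5, even though the method body is identical.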
On the other hand, there are advantages to delaying certain operations, even on a strict collection. Here are some ways of doing it:
// Get an iterator explicitly, if it's going to be used only once
def print5I(it: Iterable[Int]) = {
  var flag = true
  it.iterator.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}
// Get a Stream explicitly, if the result will be reused
def print5S(it: Iterable[Int]) = {
  var flag = true
  it.toStream.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}
// Use a view, which provides non-strictness for some methods
def print5V(it: Iterable[Int]) = {
  var flag = true
  it.view.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}
// Use withFilter, which is explicitly designed to be used as a non-strict filter
def print5W(it: Iterable[Int]) = {
  var flag = true
  it.withFilter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}