In Scala, how can I do the equivalent of an SQL SUM and GROUP BY?

2023-03-29 22:06 Q&A author:

For example, suppose I have

val list: List[(String, Double)]

with values

"04-03-1985", 1.5
"05-03-1985", 2.4
"05-03-1985", 1.3

How could I produce a new List

"04-03-1985", 1.5
"05-03-1985", 3.7

Here's a one-liner. It's not particularly readable unless you have really internalized the types of these higher-order functions.

val s = Seq(("04-03-1985" -> 1.5),
            ("05-03-1985" -> 2.4),
            ("05-03-1985" -> 1.3))

s.groupBy(_._1).mapValues(_.map(_._2).sum)
// returns: Map(04-03-1985 -> 1.5, 05-03-1985 -> 3.7)

Another approach is to add the key-value pairs one by one using foldLeft:

s.foldLeft(Map[String, Double]()) { case (m, (k, v)) =>
  m + (k -> (v + m.getOrElse(k, 0d)))
}

The equivalent for comprehension is the most accessible, in my opinion:

var m = Map[String, Double]()
for ((k, v) <- s) {
  m += k -> (v + m.getOrElse(k, 0d))
}

Maybe something nicer can be done with Scalaz's monoid typeclass for Map.

Note that you can convert between Map[K, V] and Seq[(K, V)] using the toSeq and toMap methods.
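As a small illustration of that conversion, here is a sketch (the value names are made up for this example). One thing to keep in mind: toMap does not sum anything, so if a key appears more than once, the last pair simply wins.

```scala
val pairs: Seq[(String, Double)] = Seq("04-03-1985" -> 1.5, "05-03-1985" -> 3.7)

// Seq[(K, V)] => Map[K, V]
val asMap: Map[String, Double] = pairs.toMap

// Map[K, V] => Seq[(K, V)]; sorted here because Map iteration order
// is not guaranteed in general
val backToSeq: Seq[(String, Double)] = asMap.toSeq.sortBy(_._1)

// With a repeated key, toMap keeps only the last value for that key
val lastWins: Map[String, Double] = Seq("a" -> 1.0, "a" -> 2.0).toMap
```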

Update. After pondering it some more, I think the natural abstraction would be a "multimap" conversion, of type:

def seqToMultimap[A, B](s: Seq[(A, B)]): Map[A, Seq[B]]

With the appropriate implicit extension in one's personal library, one could then write:

s.toMultimap.mapValues(_.sum)

This is the clearest of all, in my opinion!
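For reference, one possible way to write such an extension is sketched below. The names MultimapOps and toMultimap are hypothetical (nothing like them ships with the standard library), and the sketch uses map rather than mapValues for the final step so that it behaves the same on Scala 2.12 and 2.13.

```scala
// Hypothetical extension: toMultimap is not part of the standard library.
implicit class MultimapOps[A, B](private val s: Seq[(A, B)]) {
  // Group the pairs by key, then keep only the values in each group
  def toMultimap: Map[A, Seq[B]] =
    s.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
}

val s = Seq("04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)

// Sum the values collected under each key
val sums: Map[String, Double] = s.toMultimap.map { case (k, vs) => k -> vs.sum }
```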

There is another possibility using Scalaz.

The key point is to notice that, if M is a Monoid, then Map[T, M] is also a Monoid. This means that if I have two maps, m1 and m2, I can add them so that the values under each shared key are added together.

For example, Map[String, List[Int]] is a Monoid because List[Int] is a Monoid. So given the appropriate Monoid instance in scope, I should be able to do:

  val m1 = Map("a" -> List(1), "b" -> List(3))
  val m2 = Map("a" -> List(2))

  // |+| "adds" two elements of a Monoid together in Scalaz
  m1 |+| m2 === Map("a" -> List(1, 2), "b" -> List(3))

For your question we can see that Map[String, Double] is a Monoid because there is a Monoid instance for the Double type. Let's bring it into scope:

  implicit val mapMonoid = MapMonoid[String, Double]

Then I need a function, reduceMonoid, which takes anything that is Traversable and "adds" its elements with a Monoid. I only give the signature here; for the full implementation, please refer to my post on the Essence of the Iterator Pattern:

  // T is a "Traversable"
  def reduce[A, M : Monoid](reducer: A => M): T[A] => M

These two definitions do not exist in the current Scalaz library, but they are not difficult to add (based on the existing Monoid and Traverse typeclasses). And once we have them, the solution to your question is very straightforward:
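To make the idea concrete without depending on Scalaz, here is a rough stand-alone sketch. Everything in it is an assumption made for illustration: the Monoid trait is a minimal stand-in for the Scalaz typeclass, mapMonoid and doubleMonoid are hand-written instances, and reduceMonoid is written as a plain function over Iterable instead of being derived from Traverse, so its signature differs from the one above.

```scala
// Minimal stand-in for a Monoid typeclass (hypothetical, not the Scalaz one)
trait Monoid[M] {
  def zero: M
  def append(a: M, b: M): M
}

implicit val doubleMonoid: Monoid[Double] = new Monoid[Double] {
  val zero: Double = 0.0
  def append(a: Double, b: Double): Double = a + b
}

// If V is a Monoid, Map[K, V] is one too: values under the same key are appended
implicit def mapMonoid[K, V](implicit vm: Monoid[V]): Monoid[Map[K, V]] =
  new Monoid[Map[K, V]] {
    val zero: Map[K, V] = Map.empty[K, V]
    def append(a: Map[K, V], b: Map[K, V]): Map[K, V] =
      b.foldLeft(a) { case (acc, (k, v)) =>
        acc.updated(k, vm.append(acc.getOrElse(k, vm.zero), v))
      }
  }

// Turn every element into a monoid value, then combine them left to right
def reduceMonoid[A, M](xs: Iterable[A])(reducer: A => M)(implicit m: Monoid[M]): M =
  xs.foldLeft(m.zero)((acc, a) => m.append(acc, reducer(a)))

val s = Seq("04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)

// Each pair becomes a one-entry map; the Map monoid merges them
val sums: Map[String, Double] = reduceMonoid(s)(pair => Map(pair))
```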

  val s = Seq(("04-03-1985" -> 1.5),
              ("05-03-1985" -> 2.4),
              ("05-03-1985" -> 1.3))

   // we just put each pair in its own map and we let the Monoid instance
   // "add" the maps together
   s.reduceMonoid(Map(_)) === Map("04-03-1985" -> 1.5,
                                  "05-03-1985" -> 3.7)

If you feel that the code above is a bit obscure (but really concise, right?), I encourage you to check the GitHub project for the EIP post and play with it. One example shows the solution to your question:

   "I can build a map String->Int" >> {
     val map1 = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "b" -> 5)
     implicit val mapMonoid = MapMonoid[String, Int]

     map1.reduceMonoid(Map(_)) must_== Map("a" -> 3, "b" -> 8, "c" -> 4)
   }

I use the pattern s.groupBy(_._1).mapValues(_.map(_._2).sum) from Kipton's answer all the time. It translates my thought process pretty directly, but unfortunately it isn't always easy to read. I've found that using a case class whenever possible makes things a bit better:

case class Data(date: String, amount: Double)
val t = s.map(t => (Data.apply _).tupled(t))
// List(Data(04-03-1985,1.5), Data(05-03-1985,2.4), Data(05-03-1985,1.3))

It then becomes:

t.groupBy(_.date).mapValues{ group => group.map(_.amount).sum }
// Map(04-03-1985 -> 1.5, 05-03-1985 -> 3.7)

I think it is then more readable than the fold or for version.

val s = List ( "04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)
for { (key, xs) <- s.groupBy(_._1)
       x = xs.map(_._2).sum
    } yield (key, x)

Starting with Scala 2.13, you can use the groupMapReduce method, which is (as its name suggests) the equivalent of a groupBy followed by a mapValues and a reduce step:

// val l = List(("04-03-1985", 1.5), ("05-03-1985", 2.4), ("05-03-1985", 1.3))
l.groupMapReduce(_._1)(_._2)(_ + _).toList
// List(("04-03-1985", 1.5), ("05-03-1985", 3.7))

This:

groups tuples by their first part (_._1) (group part of groupMapReduce)
maps each grouped tuple to its second part (_._2) (map part of groupMapReduce)
reduces the values within each group by summing them (_ + _) (reduce part of groupMapReduce).

This is a one-pass version of:

l.groupBy(_._1).mapValues(_.map(_._2).reduce(_ + _)).toList
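As a quick check that the two forms agree, here is a small sketch that assumes Scala 2.13 or later. In 2.13, calling mapValues directly on a Map is deprecated in favor of .view.mapValues, so the two-step version below is written that way. The value names are only for this example.

```scala
// Requires Scala 2.13 or later
val l = List(("04-03-1985", 1.5), ("05-03-1985", 2.4), ("05-03-1985", 1.3))

// One pass: group by date, take the amount, add amounts within a group
val onePass: Map[String, Double] = l.groupMapReduce(_._1)(_._2)(_ + _)

// Two steps: build the groups first, then sum each group
val twoStep: Map[String, Double] =
  l.groupBy(_._1).view.mapValues(_.map(_._2).sum).toMap

// Sort before converting to a list, since Map iteration order is not guaranteed
val result: List[(String, Double)] = onePass.toList.sortBy(_._1)
```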
