Complex multi-dimensional list operations in Scala
Given a list such as the following:
val dane = List(
("2011-01-04", -137.76),
("2011-01-04", 2376.45),
("2011-01-04", -1.70),
("2011-01-04", -1.70),
("2011-01-04", -1.00),
// ... skip a few ...
("2011-12-22", -178.02),
("2011-12-29", 1800.82),
("2011-12-23", -83.97),
("2011-12-24", -200.00),
("2011-12-24", -30.55),
("2011-12-30", 728.00)
)
I'd like to sum the values (i.e. the second item of each inner tuple) of a specific month (e.g. January, or 01), using the following operations in the specified order:
- groupBy
- slice
- collect
- sum
I'm feeling contrary, so here's an answer that uses NONE of the prescribed methods: groupBy, slice, collect or sum.
Avoiding collect was the hardest part; condOpt/flatten is just so much uglier...
val YMD = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
import PartialFunction._
(dane map {
condOpt(_:(String,Double)){ case (YMD(_,"01",_), v) => v }
}).flatten reduceLeft {_+_}
(for ((YearMonthDay(_, 1, _), value) <- dane) yield value).sum
object YearMonthDay {
  def unapply(dateString: String): Option[(Int, Int, Int)] = {
    // yes, there should really be some error checking in this extractor
    // to return None for a bad date string
    val components = dateString.split("-")
    Some((components(0).toInt, components(1).toInt, components(2).toInt))
  }
}
Now that Kevin has started the trend of contrary answers, here's one you should never use, but gosh, it works! (And avoids every requested method, and will work on any month if you change the string, but it does require that the list be sorted by date.)
dane.scanLeft(("2011-01", 0.0))((l, r) =>
  ( l._1,
    if ((l._1 zip r._1).forall(x => x._1 == x._2)) l._2 + r._2 else 0.0
  )
).dropWhile(_._2 == 0).takeWhile(_._2 != 0.0).reverse.head._2
Break the problem up into smaller steps. Start by trying to split the list into one list for every month. You could use groupBy for this. Your first problem will probably be how to parse the date string. A general solution would be to use a custom date class and a regular expression; however, a simpler ad-hoc solution using an indexed substring (or slice) could be appropriate in this context.
A general tip would be to load the data into the Scala REPL and play around with it. Good luck.
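To make those steps concrete, here is a minimal sketch you could paste into the REPL. It assumes every date string has the fixed "yyyy-MM-dd" layout shown in the question; the names sample, byMonth and monthTotal are made up for illustration, and sample holds only a few of the rows visible in the question, not the full list.

```scala
// A few rows copied from the visible part of the question's list
// (the elided rows are not reproduced here).
val sample = List(
  ("2011-01-04", -137.76),
  ("2011-01-04", 2376.45),
  ("2011-01-04", -1.70),
  ("2011-12-22", -178.02),
  ("2011-12-30", 728.00)
)

// Step 1: one sub-list per month. In "yyyy-MM-dd" the month sits at
// indices 5 and 6, so slice(5, 7) yields e.g. "01" or "12".
val byMonth: Map[String, List[(String, Double)]] =
  sample.groupBy { case (date, _) => date.slice(5, 7) }

// Step 2: pick one month and add up its values
// (an absent month gives 0.0 because the empty list sums to zero).
def monthTotal(month: String): Double =
  byMonth.getOrElse(month, Nil).map(_._2).sum
```

Keying the map by the two-character string avoids any parsing; if you need numeric months, calling toInt on the slice would be the obvious next step.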
import scala.collection.mutable.HashMap
val totals = new HashMap[Int, Double]
for (e <- dane) {
  val (date, value) = e
  val month = date.drop(5).take(2).toInt
  totals(month) = totals.getOrElse(month, 0.0) + value
}
Another implementation using none of the suggested functions, with mutable collections and some bastard mix of procedural and functional style that avoids some useful functions :)
totals ends up as a map from month number to total.
So, here's an idea:
- groupBy, because you need to group data from each month together
- slice, because you need to see which is the month of the date
- collect, because you need to filter by month and map to value
- sum, mmmm... I'm not sure where this one comes in. Any ideas?
I refuse to obfuscate sum.
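import org.joda.time.DateMidnight
// One total per month; dot notation keeps the chain from being split
// into separate statements at the line breaks.
for (month <- 1 to 12) yield {
  dane.map { case (d, v) => new DateMidnight(d).getMonthOfYear -> v }
    .filter { case (m, _) => m == month }
    .map(_._2)
    .sum
}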
dane.groupBy (_._1.matches (".*-01-.*")).slice (0, 1).map (x => x._2).flatten .map (y => y._2).sum
I really should look up 'collect', which somehow should replace my map/flatten/map.
My result is: Double = 2234.29
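For what it's worth, collect takes a partial function, so it can do the filtering and the mapping in one pass. A rough sketch of how that might look, again assuming the fixed "yyyy-MM-dd" layout; entries and januaryTotal are illustrative names, and entries is only a handful of the rows visible in the question, so its total is not the figure quoted above.

```scala
// A few rows copied from the visible part of the question's list
// (not the full data set).
val entries = List(
  ("2011-01-04", -137.76),
  ("2011-01-04", 2376.45),
  ("2011-01-04", -1.70),
  ("2011-12-22", -178.02),
  ("2011-12-30", 728.00)
)

// collect keeps only the rows where the partial function is defined
// (here: the month part equals "01") and maps each of them to its value.
def januaryTotal(rows: List[(String, Double)]): Double =
  rows.collect { case (date, value) if date.slice(5, 7) == "01" => value }.sum
```

Unlike the groupBy/slice chain, this does not depend on which group happens to come first in the resulting map.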