开发者

Filter List in Functional Programming Style

We have a list of Strings with B开发者_JAVA百科EGIN and END marks as parts of this list. Can we filter out elements in-between of BEGIN-END in functional programming style? I came out only with this regular (flag) approach in scala.

val list1 =
  """992
  1010
  1005
  1112
  BEGIN
  1086
  1244
  1107
  1121
  END
  1223
  1312
  1319
  1306
  1469""".lines.toList

var flag = false
val filteredList = list1.filter{
  def f(x: String): Boolean = {
    if (x.contains("BEGIN")) {
      flag = true;
      return false
    } else if (x.contains("END")) {
      flag = false
    }
    flag
  }
  f
}

Is that possible to avoid defining the flag variable? How do they solve this in pure functional languages?


You can use drop/tail, dropWhile, takeWhile functions:

val filteredList = list1.map(_.trim).dropWhile("BEGIN" !=).tail.takeWhile("END" !=)

EDIT

As mentioned in comments tail will throw exception if list is empty, so if you prefer to stay on the safe side use drop(1) instead of tail:

val filteredList = list1.map(_.trim).dropWhile("BEGIN" !=).drop(1).takeWhile("END" !=)

And here is my version of algorithm that handles several BEGIN and END sections (some crazy stuff from me - a little state machine :)

var filteredList1 = list1.map(_.trim).foldLeft(List(None): List[Option[List[String]]]) {
  case (None :: rest, "BEGIN") => Some(Nil) :: rest
  case (Some(list) :: rest, "END") => None :: Some(list) :: rest
  case (Some(current) :: rest, num) => Some(num :: current) :: rest
  case (result, _) => result
}.flatten.reverse map (_.reverse)

it returns List[List[String]]


To start with, each string in your list contains the spaces from the start of the line.

This is the biggest problem in your code, and there's two ways to fix it.

Either trim the lines...

val list1 =
  """992
  1010
  ...
  1306
  1469""".lines.map(_.trim).toList

... or you can precede each line with a | and use stripMargin.

Then it's just a small matter of applying takeWhile/dropWhile

list1.takeWhile("BEGIN" !=) ++ list1.dropWhile("END"!=).tail

or more efficiently:

val (begin,middle) = list1.span("BEGIN" !=)
val end = middle.dropWhile("END" !=).tail
begin ++ end

EDIT

I had the solution back to front, that would drop (filter out) values between BEGIN and END. To retain them:

list1.dropWhile("BEGIN" !=).tail.takeWhile("END"!=)

EDIT 2

Rising to the challenge here... I'll allow for multiple BEGIN/END blocks, but also consider that the input might be badly malformed. What if there was a BEGIN without a corresponding END? Perhaps there are two BEGINs in a row, or the list runs out before there's an END.

Defining some rules:

  • An END without a corresponding BEGIN is ignored
  • BEGIN/END blocks don't nest
  • A BEGIN encountered while already in a block starts a new block
  • If the list runs out while in a block, then an implicit END is assumed

Without further ado, first create an iterator that identifies every "BEGIN" in the input:

val blocksStarts =
  Iterator.iterate(list1)(_.dropWhile("BEGIN" !=).drop(1)).drop(1).takeWhile(Nil !=)

//This iterator tries to continue forever,
//returning Nils once the sequences are exhausted
//For this reason, we must use drop(1) instead of tail

Giving an iterator of lists, each starting at a "BEGIN"

To then take elements from each of these lists until the corresponding "END" is reached, or another "BEGIN", or the list is exhausted:

val blocks = blockStarts map {
  _.takeWhile(x => x != "BEGIN" && x != "END")
} toList

The final toList is because it's still an Iterator at that point. You now have a List of Lists, each corresponding to a batch of elements in a "Block", as defined by the previous rules.


I'm extending others' answers a little to present a case where there are two BEGIN...END blocks in the list.

val list1 =
  """992
  1010
  1005
  1112
  BEGIN
  1086
  1244
  1107
  1121
  END
  1223
  1312
  BEGIN
  773
  990
  224
  END
  1319
  1306
  1469""".lines.map(_.trim).toList

We're going to use foldRight to pass a status accumulator between iterations. Note that we're using foldRight to make constructing the result list efficient, so we're going to encounter END before we encounter BEGIN.

case class StripStatus(list:List[String], retaincurrent:Boolean)

list1.foldRight(StripStatus(Nil,false)){ (curElem:String, curStatus:StripStatus) =>
   if (curElem == "END")
      StripStatus(curStatus.list,true)
   else if (curElem == "BEGIN")
      StripStatus(curStatus.list,false)
   else if (curStatus.retaincurrent)
      StripStatus(curElem::curStatus.list, true)
   else
      curStatus
}.list

We could just as easily use foldLeft and reverse the result list at the end:

list1.foldLeft(StripStatus(Nil,false)){ (curStatus:StripStatus, curElem:String) =>
   if (curElem == "BEGIN")
      StripStatus(curStatus.list,true)
   else if (curElem == "END")
      StripStatus(curStatus.list,false)
   else if (curStatus.retaincurrent)
      StripStatus(curElem::curStatus.list, true)
   else
      curStatus
}.list.reverse


MMmm. Here's my take:

def getInside(l: List[String]) = {
    def concat(in: List[String], out: List[String]): List[String] = in ::: off(out)

    def off(l: List[String]): List[String] = 
        if (l.isEmpty) Nil 
        else on(l dropWhile ("BEGIN" !=) drop 1)

    def on(l: List[String]): List[String] = 
        if (l.isEmpty) Nil
        else (concat _).tupled(l span ("END" !=))

    off(l)
}


I don't know Scala, but you can define a function that returns the index in the list of the next element that matches a substring, and returns the index where the substring was found and also the list of elements encountered until that substring was matched. A pseudocode header: findSubstr(list, startIndex). Then build the expression (more pseudocode):

beginIndex, preBeginElems = findSubstr(list, 0)
endIndex, inBetweenElems = findSubstr(list, beginIndex)
restElems = list[endIndex until the end]

If helpful, I could write this in Haskell... :)

EDIT: There are probably other ways to do it, too


Again, with the same goal of dealing with multiple BEGIN...END spans in the list.

def getBetweenBeginEnd(l:List[String]) = {
   def internal(l:List[String],accum:List[String]):List[String]={
      val (keep, keepChecking) = l.dropWhile("BEGIN" !=).drop(1).span("END" !=)
      if (keepChecking == Nil)
         accum:::keep
      else
         internal(keepChecking.tail,accum:::keep)
   }
   internal(l,Nil)
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜