
Scala: cleanest way to recursively parse files checking for multiple strings

I want to write a Scala script to recursively process all files in a directory. For each file, I'd like to see whether a string occurs on both line X and line X - 2. If that happens, I'd like to stop processing that file and add its filename to a map of filenames to occurrence counts. I just started learning Scala today. I've got the file recursion code working, but I need some help with the string searching. Here's what I have so far:


import java.io.File
import scala.io.Source

val s1= "CmdNum = 506"
val s2 = "Data = [0000,]"

def processFile(f: File): Unit = {
  val lines = scala.io.Source.fromFile(f).getLines().toArray
  for (i <- 0 to lines.length - 1) {
    // want to do string searches here, see if line contains s1 and line two lines above also contains s1
    //println(lines(i))
  }
}

def recursiveListFiles(f: File): Array[File] = {
  val these = f.listFiles
  if (these != null) {
    for (i <- 0 to these.length - 1) {
      if (these(i).isFile) {
        processFile(these(i))
      }
    }
    these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
  }
  else {
    Array[File]()
  }
}

println(recursiveListFiles(new File(args(0))).mkString("\n"))
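If you'd rather keep the array and index loop from the question, the check described in the comment inside `processFile` can be written directly. A minimal sketch, assuming the rule is "line i and line i - 2 both contain s1"; `hitInLines` is a hypothetical helper name, not something from the question:

```scala
val s1 = "CmdNum = 506"

// Hypothetical helper: true if some line i and the line two above it (i - 2)
// both contain the target. Starting at index 2 keeps lines(i - 2) in bounds,
// so arrays with fewer than 3 lines simply yield false.
def hitInLines(lines: Array[String], target: String): Boolean =
  (2 until lines.length).exists { i =>
    lines(i).contains(target) && lines(i - 2).contains(target)
  }

val sample = Array("CmdNum = 506", "Data = [0000,]", "CmdNum = 506 again")
println(hitInLines(sample, s1)) // true
```

Using `exists` instead of a `for` loop means the scan stops at the first hit, which matches the "stop processing that file" requirement.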


You can do something like this:

def processFile(f: File): Unit = {
  val src = Source.fromFile(f)
  val hit = src.getLines().sliding(3).exists {
    case Seq(l0, _, l2) => l0.contains(s1) && l2.contains(s1)
    case _ => false // shorter window: the file has fewer than 3 lines
  }
  src.close()
  // do something depending on hit like adding to a Map
}

First, you don't need to convert to an array: keeping the iterator means only the lines needed to find a match are actually read.
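To see the laziness at work, here is a small sketch that uses an in-memory iterator as a stand-in for `getLines()` and counts how many lines get pulled. The counter `linesRead` exists purely for illustration:

```scala
// In-memory stand-in for src.getLines(); `linesRead` counts pulled lines.
var linesRead = 0
val lines: Iterator[String] =
  Iterator("a", "CmdNum = 506", "b", "CmdNum = 506", "c", "d", "e").map { l =>
    linesRead += 1
    l
  }

// exists stops at the first matching window, so the trailing lines
// are never pulled from the iterator.
val hit = lines.sliding(3).exists {
  case Seq(l0, _, l2) => l0.contains("CmdNum = 506") && l2.contains("CmdNum = 506")
  case _              => false
}

println(s"hit = $hit, lines read = $linesRead of 7")
```

With `.toArray` the whole file would be read up front, no matter where the match is.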

You can use sliding to get a derived iterator over a sliding window of 3 lines, in which you look for the string on lines i and i+2.
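A quick sketch of what `sliding(3)` produces on an iterator, including the short final window you get when there are fewer than 3 lines:

```scala
// Each window holds lines i, i+1, i+2; the window advances by one line.
Iterator("l1", "l2", "l3", "l4", "l5").sliding(3).foreach(w => println(w.mkString(", ")))
// l1, l2, l3
// l2, l3, l4
// l3, l4, l5

// With fewer than 3 lines you get a single, shorter window,
// which is why a `case _ => false` fallback is needed.
val twoLines = Iterator("l1", "l2").sliding(3).toList
println(twoLines.map(_.size)) // List(2)
```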

exists tests whether any element of this sliding iterator satisfies a predicate. The case pattern-matches the 3 lines of each window into 3 vals for convenience. I had to use the REPL to find out what type sliding actually returns.
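The windows are statically typed as a `Seq[String]` (`immutable.Seq[String]` in Scala 2.13). At the time of this answer each window happened to be a `List`, which is why `case List(...)` worked, but matching on `Seq(...)` works whatever concrete class a given Scala version uses. A sketch with the predicate pulled out into a named function (`twoApart` is a hypothetical name):

```scala
// Hypothetical helper: does `target` appear on two lines that are two apart?
def twoApart(lines: Iterator[String], target: String): Boolean =
  lines.sliding(3).exists {
    case Seq(l0, _, l2) => l0.contains(target) && l2.contains(target)
    case _              => false // partial window: input had fewer than 3 lines
  }

println(twoApart(Iterator("x CmdNum = 506", "y", "CmdNum = 506 z"), "CmdNum = 506")) // true
println(twoApart(Iterator("CmdNum = 506", "CmdNum = 506"), "CmdNum = 506"))        // false
```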

Finally, don't forget to close src.
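One way to make sure `src` is closed even when reading throws is `scala.util.Using`. A sketch assuming Scala 2.13 or later (where `Using` exists and `Source` is `Closeable`); `fileHasHit` is a hypothetical name:

```scala
import java.io.File
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

// Using.resource closes the Source when the block finishes,
// including when reading throws (e.g. on a bad encoding).
def fileHasHit(f: File, target: String): Boolean =
  Using.resource(Source.fromFile(f)) { src =>
    src.getLines().sliding(3).exists {
      case Seq(l0, _, l2) => l0.contains(target) && l2.contains(target)
      case _              => false
    }
  }

// Usage on a throwaway temp file.
val tmp = Files.createTempFile("demo", ".txt").toFile
tmp.deleteOnExit()
Files.write(tmp.toPath, "CmdNum = 506\nother\nCmdNum = 506\n".getBytes("UTF-8"))
println(fileHasHit(tmp, "CmdNum = 506")) // true
```

On Scala 2.12 or older, the same effect comes from wrapping the read in `try { ... } finally src.close()`.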

If you need the occurrence count:

  val count = src.getLines().sliding(3).filter {
    case Seq(l0, _, l2) => l0.contains(s1) && l2.contains(s1)
    case _ => false
  }.size

You filter the occurrences and then get the size...

Edit: fixed a MatchError on files shorter than 3 lines.
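Putting the pieces together for what the question asks (a map from filename to occurrence count), here is a sketch assuming Scala 2.13+ for `scala.util.Using`. `filesUnder`, `hitCount` and `countsUnder` are hypothetical names, and `count` does the filter-and-size step in one call:

```scala
import java.io.File
import scala.io.Source
import scala.util.Using

// Every regular file under `dir`. listFiles returns null on I/O or
// permission errors, so Option(...) turns that case into "no entries".
def filesUnder(dir: File): Seq[File] =
  Option(dir.listFiles()).getOrElse(Array.empty[File]).toSeq.flatMap { f =>
    if (f.isDirectory) filesUnder(f) else Seq(f)
  }

// Number of 3-line windows whose first and last lines both contain `target`.
def hitCount(f: File, target: String): Int =
  Using.resource(Source.fromFile(f)) { src =>
    src.getLines().sliding(3).count {
      case Seq(l0, _, l2) => l0.contains(target) && l2.contains(target)
      case _              => false
    }
  }

// File path -> count, keeping only files with at least one hit.
def countsUnder(dir: File, target: String): Map[String, Int] =
  filesUnder(dir)
    .map(f => f.getPath -> hitCount(f, target))
    .filter { case (_, n) => n > 0 }
    .toMap
```

From the question's script you would call something like `println(countsUnder(new File(args(0)), "CmdNum = 506"))`. Windows overlap, so a file with the string on lines 1, 3 and 5 counts 2. Unlike `exists`, `count` reads each file to the end; if you only need to know whether a file has a hit, keep `exists`.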


Here's an alternative way of doing this:

import java.io.File
import scala.io.Source

val s1= "CmdNum = 506"

def filesAt(f: File): Array[File] = if (f.isDirectory) f.listFiles flatMap filesAt else Array(f)

def filterFiles(arr: Array[File]) = arr filter (
    Source
    fromFile _
    getLines ()
    sliding 3
    exists { 
        case Seq(l1, l2, l3) => List(l1, l3) forall (_ contains s1)
        case _ => false
    }
)

println(filterFiles(filesAt(new File(args(0)))).mkString("\n"))

Though I'll confess I cheated a bit. I actually had to write this instead of Source fromFile _:

Source.fromFile(_)(scala.io.Codec.ISO8859)

Because otherwise Scala would barf on invalid UTF-8 input (it throws a java.nio.charset.MalformedInputException).
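ISO-8859-1 never fails because every byte maps to some character, at the cost of misdecoding non-ASCII UTF-8 text. If the files are mostly UTF-8, another option is a codec that replaces malformed bytes instead of throwing. A sketch assuming Scala 2.13+ (for `Using`); `lenientUtf8` and `readLinesLeniently` are hypothetical names:

```scala
import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}
import scala.util.Using

// UTF-8 decoding that substitutes U+FFFD for malformed bytes
// instead of throwing MalformedInputException.
val lenientUtf8: Codec = Codec("UTF-8").onMalformedInput(CodingErrorAction.REPLACE)

// Read all lines of `f` with the lenient codec.
def readLinesLeniently(f: File): List[String] =
  Using.resource(Source.fromFile(f)(lenientUtf8))(_.getLines().toList)
```

The same codec can be passed to `Source.fromFile(_)(lenientUtf8)` in the `filter` above in place of `scala.io.Codec.ISO8859`.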


It needs refining to deal with files shorter than 3 lines, but as a first stab I'd try something like this:

def checkFile(file: File) = {
  val lines = ...
  (lines zip lines.tail.tail) exists { case (a, b) => (a contains s1) && (b contains s1) }
}

Then

val files = ...
val validFiles = files filter { checkFile }

Apologies for being so brief; I'm answering on my mobile...
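The zip idea can be made safe for short files by using `drop(2)` instead of `tail.tail`, which throws on lists with fewer than 2 elements. A sketch working on an in-memory `List[String]` (reading the file is left out; `checkLines` is a hypothetical name):

```scala
val s1 = "CmdNum = 506"

// Pair each line with the line two below it. drop(2) returns an empty
// list for short input, and zip stops at the shorter list, so files with
// fewer than 3 lines simply produce no pairs.
def checkLines(lines: List[String]): Boolean =
  (lines zip lines.drop(2)) exists { case (a, b) => (a contains s1) && (b contains s1) }
```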
