Scala: cleanest way to recursively parse files checking for multiple strings
I want to write a Scala script that recursively processes all files in a directory. For each file, I'd like to check whether a string occurs both at some line X and at line X - 2. If it does, I'd like to stop processing that file and add its filename to a map of filenames to occurrence counts. I just started learning Scala today. I've got the file recursion code working and need some help with the string searching. Here's what I have so far:
import java.io.File
import scala.io.Source
val s1 = "CmdNum = 506"
val s2 = "Data = [0000,]"
def processFile(f: File): Unit = {
  val lines = scala.io.Source.fromFile(f).getLines().toArray
  for (i <- 0 to lines.length - 1) {
    // want to do string searches here, see if line contains s1 and line two lines above also contains s1
    //println(lines(i))
  }
}
def recursiveListFiles(f: File): Array[File] = {
  val these = f.listFiles
  if (these != null) {
    for (i <- 0 to these.length - 1) {
      if (these(i).isFile) {
        processFile(these(i))
      }
    }
    these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
  }
  else {
    Array[File]()
  }
}
println(recursiveListFiles(new File(args(0))))
You can do something like this:
def processFile(f: File): Unit = {
  val src = Source.fromFile(f)
  val hit = src.getLines().sliding(3).exists {
    // Seq rather than List: the concrete type of each window can differ between Scala versions
    case Seq(l0, l1, l2) => l0.contains(s1) && l2.contains(s1)
    case _ => false
  }
  src.close()
  // do something depending on hit like adding to a Map
}
First, you don't need to convert to an array. You can keep the iterator, so only the lines needed to find a match are read.
You can use sliding to get a derived iterator with a sliding window of 3 lines, in which you look for the string on lines i and i+2.
exists tests whether any element of this sliding iterator satisfies a predicate. The case pattern matches the 3 lines of each window into 3 vals for convenience. I had to use the REPL to find out what type sliding was really returning.
Finally, don't forget to close src.
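To see the sliding/exists combination in isolation, here is a minimal sketch that works on an in-memory iterator of lines instead of a file. The object and method names (SlidingDemo, hasPairTwoApart) are made up for illustration and are not part of the code above.

```scala
object SlidingDemo {
  // Hypothetical helper: true when some line and the line two below it
  // both contain `needle`.
  def hasPairTwoApart(lines: Iterator[String], needle: String): Boolean =
    lines.sliding(3).exists {
      case Seq(first, _, third) => first.contains(needle) && third.contains(needle)
      case _ => false // a lone window shorter than 3 lines (short input)
    }
}

val sample = List("CmdNum = 506", "Data = [0000,]", "CmdNum = 506", "done")
println(SlidingDemo.hasPairTwoApart(sample.iterator, "CmdNum = 506")) // prints true
```

Because exists stops at the first matching window, the rest of the iterator is never consumed.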
If you need the occurrence count:
val count = src.getLines().sliding(3).filter {
  case Seq(l0, l1, l2) => l0.contains(s1) && l2.contains(s1)
  case _ => false
}.size
You filter the occurrences and then get the size...
Edited to avoid a MatchError on files shorter than 3 lines.
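For the "map of filenames to occurrence counts" part of the question, one possible sketch is below. It assumes the files decode cleanly with the default codec and that you want a count per file rather than stopping at the first hit. MatchCounts, countMatches and countsByFile are hypothetical names.

```scala
import java.io.File
import scala.io.Source

object MatchCounts {
  // Counts the windows in which a line and the line two below it
  // both contain `needle`.
  def countMatches(lines: Iterator[String], needle: String): Int =
    lines.sliding(3).count {
      case Seq(first, _, third) => first.contains(needle) && third.contains(needle)
      case _ => false
    }

  // Maps file path to match count, keeping only files with at least one match.
  def countsByFile(files: Seq[File], needle: String): Map[String, Int] =
    files.flatMap { f =>
      val src = Source.fromFile(f)
      try {
        val n = countMatches(src.getLines(), needle)
        if (n > 0) Some(f.getPath -> n) else None
      } finally src.close() // close even if reading fails
    }.toMap
}
```

If you only care whether a file matches at all, exists is cheaper than count because it stops reading at the first hit.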
Here's an alternative way of doing this:
import java.io.File
import scala.io.Source
val s1 = "CmdNum = 506"
def filesAt(f: File): Array[File] = if (f.isDirectory) f.listFiles flatMap filesAt else Array(f)
def filterFiles(arr: Array[File]) = arr filter (
  Source.fromFile(_).getLines().sliding(3).exists {
    // Seq rather than List: the concrete type of each window can differ between Scala versions
    case Seq(l1, l2, l3) => List(l1, l3) forall (_ contains s1)
    case _ => false
  }
)
println(filterFiles(filesAt(new File(args(0)))))
Though I'll confess I cheated a bit. Instead of the plain fromFile call, I actually had to write this:
Source.fromFile(_)(scala.io.Codec.ISO8859)
Otherwise, Scala would barf on files containing invalid UTF-8.
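As a rough sketch of the encoding workaround, here are two ways to read a file whose bytes may not be valid UTF-8. The first passes Codec.ISO8859 as above. The second is a variation of my own: it keeps UTF-8 but asks the decoder to substitute a replacement character for bad bytes. LenientRead and its method names are hypothetical; Codec.apply, onMalformedInput and onUnmappableCharacter are members of scala.io.Codec.

```scala
import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

object LenientRead {
  // ISO-8859-1 maps every byte to a character, so decoding cannot fail.
  def linesLatin1(f: File): List[String] = {
    val src = Source.fromFile(f)(Codec.ISO8859)
    try src.getLines().toList finally src.close()
  }

  // Alternative: decode as UTF-8 but replace malformed bytes instead of throwing.
  def linesUtf8Lenient(f: File): List[String] = {
    // Codec("UTF-8") builds a fresh Codec, so the shared Codec.UTF8 is left untouched.
    val codec = Codec("UTF-8")
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
    val src = Source.fromFile(f)(codec)
    try src.getLines().toList finally src.close()
  }
}
```

Either approach is fine for a substring search on ASCII text such as "CmdNum = 506", since those characters decode identically under both codecs.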
It needs refining to deal with files shorter than 3 lines, but as a first stab I'd try something like this:
def checkFile(file: File) = {
  val lines = ...
  // pairs each line with the line two below it and compares the two
  (lines zip lines.tail.tail) exists { case (a, b) => a == b }
}
Then
val files = ...
val validFiles = files filter { checkFile }
Apologies for being so brief, I'm answering on my mobile...
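One way the zip idea might be refined for short files is sketched below. It makes two assumptions that go beyond the answer: the lines are already loaded into a List, and the check is "both lines contain the search string" (as in the question) rather than equality. ZipCheck and hasPairTwoApart are hypothetical names.

```scala
object ZipCheck {
  // Pairs every line with the line two below it. drop(2) simply returns an
  // empty list for very short inputs, where tail.tail can throw.
  def hasPairTwoApart(lines: List[String], needle: String): Boolean =
    lines.zip(lines.drop(2)).exists { case (upper, lower) =>
      upper.contains(needle) && lower.contains(needle)
    }
}

println(ZipCheck.hasPairTwoApart(List("CmdNum = 506", "x", "CmdNum = 506"), "CmdNum = 506")) // prints true
println(ZipCheck.hasPairTwoApart(List("CmdNum = 506"), "CmdNum = 506"))                      // prints false
```

Unlike the iterator-based sliding version, this needs the whole file in memory, which is fine for small files.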