Functional approach to parsing hierarchical CSV
I'm trying to write a piece of code but cannot get it working. The simplest example I can think of is parsing a CSV file. Suppose we have a CSV file in which the data is organized in some kind of hierarchy, like this:
Section1;
;Section1.1
;Section1.2
;Section1.3
Section2;
;Section2.1
;Section2.2
;Section2.3
;Section2.4
etc.
I did this:
```fsharp
let input =
    "a;
;a1
;a2
;a3
b;
;b1
;b2
;b3
;b4
;b5
c;
;c1"
let lines = input.Split('\n')
let data = lines |> Array.map (fun l -> l.Split(';'))
let sections =
    data
    |> Array.mapi (fun i l -> (i, l.[0]))
    |> Array.filter (fun (i, s) -> s <> "")
```
and I got
val sections : (int * string) [] = [|(0, "a"); (4, "b"); (10, "c")|]
Now I'd like to create a list of line index ranges for each section, something like this:
[|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]
with the first number being the starting line index of the subsection range and the second the ending line index. How do I do that? I was thinking about using a fold, but couldn't come up with anything that works.
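Building on the sections array above, one way to get these ranges (a minimal sketch, not taken from any of the answers below) is to pair each header with the next one: a section's children start right after its header and end right before the next header. The sentinel entry at index lines.Length is an assumption of this sketch so that the last section also has a successor, and Array.pairwise needs F# 4.0 or later.

```fsharp
let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let lines = input.Split('\n')
let data = lines |> Array.map (fun l -> l.Split(';'))

// Header lines as (line index, title), exactly as in the question.
let sections =
    data
    |> Array.mapi (fun i l -> (i, l.[0]))
    |> Array.filter (fun (_, s) -> s <> "")

// Sentinel "header" one past the last line (an assumption of this sketch),
// so that the final section also has a following header.
let ranges =
    Array.append sections [| (lines.Length, "") |]
    |> Array.pairwise
    // Children run from just after this header to just before the next one.
    |> Array.map (fun ((start, title), (next, _)) -> (start + 1, next - 1, title))

printfn "%A" ranges
```

A header with no child lines comes out as a range whose end is one less than its start, so callers may want to filter those out.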
As far as I know, there is no easy way to do this, but it is definitely a good way to practice functional programming skills. If you used some hierarchical representation of data (e.g. XML or JSON), the situation would be a lot easier, because you wouldn't have to transform the data structure from linear (e.g. list/array) to hierarchical (in this case, a list of lists).
Anyway, a good way to approach the problem is to realize that you need to do a more general operation on the data: group adjacent elements of the array, starting a new group whenever you find a line with a value in the first column.
I'll start by adding a line number to each element of the array and then converting it to a list (which is usually easier to work with in F#):
```fsharp
let data =
    lines
    |> Array.mapi (fun i l -> i, l.Split(';'))
    |> List.ofSeq
```
Now, we can write a reusable function that groups adjacent elements of a list and starts a new group each time the specified predicate f returns true:
```fsharp
let adjacentGroups f list =
    // Utility function that accumulates the elements of the current
    // group in 'current' and stores all groups in 'all'. The parameter
    // 'list' is the remainder of the list to be processed
    let rec adjacentGroupsUtil current all list =
        match list with
        // Finished processing: return all groups
        | [] -> List.rev (current::all)
        // Start a new group, add current to the list
        | x::xs when f(x) ->
            adjacentGroupsUtil [x] (current::all) xs
        // Add element to the current group
        | x::xs ->
            adjacentGroupsUtil (x::current) all xs
    // Call utility function, drop all empty groups and
    // reverse elements of each group (because they are
    // collected in a reversed order)
    adjacentGroupsUtil [] [] list
    |> List.filter (fun l -> l <> [])
    |> List.map List.rev
```
Now, implementing your specific algorithm is relatively easy. We first need to group the elements, starting a new group each time the first column has some value:
```fsharp
let groups = data |> adjacentGroups (fun (ln, cells) -> cells.[0] <> "")
```
In the second step, we need to do some processing for each group. We take its first element (and pick the title of the group) and then find the minimal and maximal line number among the remaining elements:
```fsharp
groups |> List.map (fun ((_, firstCols) :: lines) ->
    let lineNums = lines |> List.map fst
    // Ordered as (first line, last line, title) to match the requested output
    List.min lineNums, List.max lineNums, firstCols.[0])
```
Note that the pattern matching in the lambda function will give a warning, but we can safely ignore that because the group will always be non-empty.
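If you would rather avoid the warning, or a section may have no child lines (List.min throws on an empty list), the per-group step can be written as a total function. A sketch under those assumptions: summarizeGroup is a hypothetical name, groups is a small hand-built value shaped like the output of adjacentGroups, and a section without children is reported as a range whose end is before its start.

```fsharp
// Hypothetical total version of the per-group lambda: a group is the header
// line followed by its child lines, each as (line number, cells).
let summarizeGroup (group: (int * string[]) list) =
    match group with
    | [] -> None
    | [ (headerLine, cells) ] ->
        // Header without children: report an empty range (last < first).
        Some (headerLine + 1, headerLine, cells.[0])
    | (_, cells) :: children ->
        let lineNums = List.map fst children
        Some (List.min lineNums, List.max lineNums, cells.[0])

// Small hand-built value shaped like the output of adjacentGroups.
let groups =
    [ [ (0, [| "a"; "" |]); (1, [| ""; "a1" |]); (2, [| ""; "a2" |]) ]
      [ (3, [| "b"; "" |]) ] ]

let ranges = groups |> List.choose summarizeGroup
printfn "%A" ranges
```

List.choose silently drops the None returned for an empty group.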
Summary: This answer shows that if you want to write elegant code, you can implement your own reusable higher-order function (such as adjacentGroups), because not everything is available in the F# core libraries. If you use functional lists, you can implement it using recursion (for arrays, you'd use imperative programming as in the answer by gradbot). Once you have a good set of reusable functions, most problems become easy :-).
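Since the question asked about fold, here is a sketch of the same computation as a single Array.fold over the indexed lines (Array.indexed needs F# 4.0 or later). The accumulator is a reversed list of (first, last, title) triples whose head is the section currently being filled; step is a hypothetical helper name, and child lines that appear before the first header are simply ignored.

```fsharp
let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let data = input.Split('\n') |> Array.map (fun l -> l.Split(';'))

// Fold step: the accumulator holds (first, last, title) triples in reverse
// order; its head is the section currently being filled.
let step (acc: (int * int * string) list) (i: int, cells: string[]) =
    if cells.[0] <> "" then
        // Header line: open a new section with an empty range (last < first).
        (i + 1, i, cells.[0]) :: acc
    else
        match acc with
        // Child line: extend the current section's range to this line.
        | (first, _, title) :: rest -> (first, i, title) :: rest
        // Child line before any header: ignored in this sketch.
        | [] -> acc

let ranges =
    data
    |> Array.indexed
    |> Array.fold step []
    |> List.rev
    |> Array.ofList

printfn "%A" ranges
```

Unlike the grouping approach, this never builds the groups themselves; it only tracks the current range, at the cost of being specific to this one task.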
In general, when you work only with arrays you force yourself to write mutable, imperative-style code. I made a generic splitArrayBy function to group together the different sections. If you're going to write your own parser, then I suggest using lists and other high-level constructs.
```fsharp
module Question

open System

let splitArrayBy f (array:_[]) =
    [|
        let i = ref 0
        let start = ref 0
        let last = ref [||]
        while !i < array.Length do
            if f array.[!i] then
                // A header closes the previous group and starts a new one
                yield !last, array.[!start .. !i - 1]
                last := array.[!i]
                start := !i + 1
            i := !i + 1
        // Emit the trailing group, if it has any elements
        if !start <> !i then
            yield !last, array.[!start .. !i - 1]
    |]

let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let lines = input.Split('\n')
let data = lines |> Array.map (fun l -> l.Split(';'))
let result = data |> splitArrayBy (fun s -> s.[0] <> "")
Array.iter (printfn "%A") result
```
This will output the following:
([||], [||])
([|"a"; ""|], [|[|""; "a1"|]; [|""; "a2"|]; [|""; "a3"|]|])
([|"b"; ""|], [|[|""; "b1"|]; [|""; "b2"|]; [|""; "b3"|]; [|""; "b4"|]; [|""; "b5"|]|])
([|"c"; ""|], [|[|""; "c1"|]|])
Here is a slight modification of the above that produces the example output:
```fsharp
let splitArrayBy f (array:_[][]) =
    [|
        let i = ref 0
        let start = ref 0
        let last = ref ""
        while !i < array.Length do
            if f array.[!i] then
                // Skip the empty "group" before the first header
                if !i <> 0 then
                    yield !start, !i - 1, !last
                last := array.[!i].[0]
                start := !i + 1
            i := !i + 1
        if !start <> !i then
            yield !start, !i - 1, !last
    |]

let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let lines = input.Split('\n')
let data = lines |> Array.map (fun l -> l.Split(';'))
let result = data |> splitArrayBy (fun s -> s.[0] <> "")
printfn "%A" result
```
Output
[|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]
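Following the suggestion to use lists for a hand-written parser, here is a sketch that builds a hierarchical value directly instead of index ranges, using recursion over F# lists. The Section record, takeItems and parseSections are hypothetical names chosen for illustration; rows that are neither a header nor a child of one are skipped.

```fsharp
// Hypothetical record for one section and its child entries.
type Section = { Title: string; Items: string list }

// Collects the child rows (empty first column) that directly follow a header
// and returns them together with the unconsumed rows.
let rec takeItems (rows: string list list) =
    match rows with
    | ("" :: item :: _) :: rest ->
        let items, remaining = takeItems rest
        item :: items, remaining
    | _ -> [], rows

// Each header row starts a new section; rows that are neither a header nor a
// child of one are skipped.
let rec parseSections (rows: string list list) =
    match rows with
    | (title :: _) :: rest when title <> "" ->
        let items, remaining = takeItems rest
        { Title = title; Items = items } :: parseSections remaining
    | _ :: rest -> parseSections rest
    | [] -> []

let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let rows =
    input.Split('\n')
    |> Array.map (fun l -> l.Split(';') |> List.ofArray)
    |> List.ofArray

let sections = parseSections rows
printfn "%A" sections
```

Neither function is tail-recursive, which is fine for small files but could overflow the stack on very large inputs.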
The JSON format would appear to be ideal for you; parsers and converters are already available.
Read about it here: http://msdn.microsoft.com/en-us/library/bb299886.aspx
Edit: for some reason I saw J#; perhaps it still applies in F#.