Recursive expression whose only base case is an exception [Context: Reading from files in OCaml]

Edit: Disregard this question! See comments below.

I want an OCaml expression which is passed a file (as an "in_channel"), then reads the file line by line, doing some processing, to the end, then returns the result of the processing.

I wrote this test:

let rec sampler_string file string_so_far =
    try 
        let line = input_line file in
        let first_two_letters = String.sub line 0 2 in
        sampler_string file (string_so_far ^ first_two_letters)
    with End_of_file -> string_so_far;;

let a = sampler_string (open_in Sys.argv.(1)) "";;

(Here the "doing some processing" is adding the first two characters of each line to a running tally, and the idea is that at the end a string containing the first two characters of every line should be returned.)

This doesn't work: OCaml thinks that "sampler_string" produces something of type unit, rather than of type string. (Difficulties then occur later when I try to use the result as a string.) I think this problem is because the only base case happens in an exception (the End_of_file).

So, a specific question and a general question:

  1. Is there a way to fix this code, by explicitly telling OCaml to expect that the result of sampler_string should be a string?
  2. Is there some standard, better syntax for a routine which reads a file line by line to the end, and returns the result of line-by-line processing?


As Damien Pollet says, your sampler_string function compiles fine (and runs correctly) on my machine as well, with OCaml v3.12.0. However, I'll answer your questions:

  1. You can specify types on your functions/values using the : operator. For example, here's your function with its types annotated. You'll notice that the return type goes at the very end of the function header, just before the =.

    let rec sampler_string (file : in_channel) (string_so_far : string) : string = ...
    
  2. I do not know if there's a better way of reading a file line by line. It certainly is a pain to be forced to deal with end-of-file via an exception. Here's a blog post on the subject, though the function presented there reads a file into a list of lines. There is also a mailing-list version.
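For reference, here is the question's function with the annotations from point 1 filled in. The body is the question's, unchanged; with the annotations in place, any disagreement between the inferred type and `string` is reported at the definition instead of at a later use site. A minimal sketch:

```ocaml
(* The question's sampler_string, with both parameters and the return
   type annotated. Only the annotations are new; the body is unchanged. *)
let rec sampler_string (file : in_channel) (string_so_far : string) : string =
  try
    let line = input_line file in
    (* Raises Invalid_argument if the line has fewer than 2 characters. *)
    let first_two_letters = String.sub line 0 2 in
    sampler_string file (string_so_far ^ first_two_letters)
  with End_of_file -> string_so_far
```

Used as in the question, `let a : string = sampler_string (open_in Sys.argv.(1)) ""` additionally pins down the type of `a`.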

A couple of nitpicks:

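  1. You don't need ;; to separate top-level function/value definitions in a source file; each new top-level let starts a new definition, so ocamlc can tell where the previous one ends.
  2. You should close your input channels (with close_in) when you are done with them.
  3. String.sub will raise an exception (Invalid_argument) if your file has a line with fewer than 2 characters.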
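A sketch that addresses nitpicks 2 and 3 together, assuming OCaml 4.08 or later for `Fun.protect`. The names `first_two` and `sample_file` are illustrative, not from the question. Lines shorter than two characters are kept whole instead of raising `Invalid_argument`, and the channel is closed whether reading finishes normally or an exception escapes:

```ocaml
(* Hypothetical helper: the first two characters of [line], or the whole
   line if it is shorter, so String.sub never raises Invalid_argument. *)
let first_two (line : string) : string =
  String.sub line 0 (min 2 (String.length line))

(* The recursive call sits outside the try, so it is a tail call. *)
let rec sampler_string (ic : in_channel) (acc : string) : string =
  match try Some (input_line ic) with End_of_file -> None with
  | Some line -> sampler_string ic (acc ^ first_two line)
  | None -> acc

(* Hypothetical wrapper: opens [path], samples it, and closes the channel
   whether sampling returns normally or raises. *)
let sample_file (path : string) : string =
  let ic = open_in path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () -> sampler_string ic "")
```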


A major point of style is avoiding recursive calls inside the body of a try ... with expression. Such calls are not in tail position (the exception handler must stay installed until they return), so you will blow the stack with a sufficiently large file. Use this pattern instead:

let rec sampler_string file string_so_far =
  match try Some (input_line file) with End_of_file -> None with
  | Some line ->
      let first_two_letters = String.sub line 0 2 in
      sampler_string file (string_so_far ^ first_two_letters)
  | None -> string_so_far
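Since OCaml 4.02 the same idea can be written without the `Some`/`None` wrapping, using an `exception` case in the `match`. The `exception End_of_file` branch only catches exceptions raised while evaluating `input_line file`, so the recursive call in the other branch stays in tail position. A sketch with the same behaviour as the version above:

```ocaml
(* The exception case covers only [input_line file]; the recursive call
   in the first branch runs outside any handler, so it is a tail call. *)
let rec sampler_string file string_so_far =
  match input_line file with
  | line ->
      let first_two_letters = String.sub line 0 2 in
      sampler_string file (string_so_far ^ first_two_letters)
  | exception End_of_file -> string_so_far
```

On OCaml 4.14 or later, `In_channel.input_line` returns a `string option` directly, which removes the need for the inner `try` altogether.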

Of course a better functional strategy is to abstract away the recursive schema:

let rec fold_left_lines f e inch =
  match try Some (input_line inch) with End_of_file -> None with
  | Some line -> fold_left_lines f (f e line) inch
  | None -> e

since "doing things with the lines of a file" is a generally useful operation in and of itself (counting lines, counting words, finding the longest line, parsing, etc. are all particular instances of this schema). Then your function is:

let sampler_string file string_so_far =
  fold_left_lines (fun string_so_far line ->
      let first_two_letters = String.sub line 0 2 in
      string_so_far ^ first_two_letters)
    string_so_far file
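To illustrate the other instances of the schema mentioned above, here are line counting and longest-line search written against `fold_left_lines`. The names `count_lines` and `longest_line` are hypothetical; `fold_left_lines` is copied from above so the block compiles on its own:

```ocaml
(* fold_left_lines as defined above. *)
let rec fold_left_lines f e inch =
  match try Some (input_line inch) with End_of_file -> None with
  | Some line -> fold_left_lines f (f e line) inch
  | None -> e

(* Number of lines: ignore each line's contents, just increment. *)
let count_lines inch = fold_left_lines (fun n _ -> n + 1) 0 inch

(* Longest line seen so far; "" for an empty file. On ties the earlier
   line is kept, because only a strictly longer line replaces it. *)
let longest_line inch =
  fold_left_lines
    (fun best line ->
      if String.length line > String.length best then line else best)
    "" inch
```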


As Matias pointed out, the first thing to do is move the recursive call outside the try/with expression so that it can be tail-call optimized.

However, there is a semi-standard solution for this: use Batteries Included. Batteries provides an abstraction, Enums, of the concept of iterating over something. Its IO infrastructure then provides the BatIO.lines_of function, which returns an enumeration of the lines of a file. So your whole function can become this:

fold (fun s line -> s ^ String.sub line 0 2) "" (BatIO.lines_of file)

The enum will automatically close the file when it is exhausted or garbage collected.
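Batteries is a third-party library. If you would rather stay within the standard library, `Seq` (OCaml 4.07 or later) gives a similar on-demand view of the lines. This is a sketch of that analogue, not Batteries' API: `lines_seq` is a hypothetical name, the sequence can only be traversed once because each step consumes input, and unlike the enum it does not close the channel for you:

```ocaml
(* A lazily evaluated sequence of the lines of [ic]: each line is read
   only when the consumer forces the next element. Single-use, and the
   caller remains responsible for closing [ic]. *)
let rec lines_seq (ic : in_channel) : string Seq.t = fun () ->
  match input_line ic with
  | line -> Seq.Cons (line, lines_seq ic)
  | exception End_of_file -> Seq.Nil

(* The Batteries one-liner, expressed with Seq.fold_left instead. *)
let sampler_string (ic : in_channel) : string =
  Seq.fold_left (fun s line -> s ^ String.sub line 0 2) "" (lines_seq ic)
```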

The code can be made more efficient (avoiding the repeated concatenation) with a buffer:

let buf = Buffer.create 2048 in
let () = iter (fun line -> Buffer.add_string buf (String.sub line 0 2))
  (BatIO.lines_of file) in
Buffer.contents buf
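The buffer technique does not depend on Batteries either. A stdlib-only sketch, using `Buffer.add_substring` to copy the two characters straight from each line (it still raises `Invalid_argument` on lines shorter than two characters):

```ocaml
(* Appends the first two characters of every line to a Buffer, so each
   line costs a constant amount of copying instead of re-copying the whole
   accumulated string, as repeated ^ does. *)
let sampler_string (ic : in_channel) : string =
  let buf = Buffer.create 2048 in
  let rec loop () =
    match input_line ic with
    | line ->
        Buffer.add_substring buf line 0 2;
        loop () (* tail call: outside the exception case *)
    | exception End_of_file -> ()
  in
  loop ();
  Buffer.contents buf
```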

Basically: Batteries can save you a lot of time and effort in code like this.
