Recursive expression whose only base case is an exception [Context: Reading from files in OCaml]
Edit: Disregard this question! See comments below.
I want an OCaml expression which is passed a file (as an "in_channel"), then reads the file line by line, doing some processing, to the end, then returns the result of the processing.
I wrote this test:
let rec sampler_string file string_so_far =
  try
    let line = input_line file in
    let first_two_letters = String.sub line 0 2 in
    sampler_string file (string_so_far ^ first_two_letters)
  with End_of_file -> string_so_far;;
let a = sampler_string (open_in Sys.argv.(1)) "";;
(Here the "doing some processing" is adding the first two characters of each line to a running tally, and the idea is that at the end a string containing the first two characters of every line should be returned.)
This doesn't work: OCaml thinks that "sampler_string" produces something of type unit, rather than of type string. (Difficulties then occur later when I try to use the result as a string.) I think this problem is because the only base case happens in an exception (the End_of_file).
So, a specific question and a general question:
- Is there a way to fix this code, by explicitly telling OCaml to expect that the result of sampler_string should be a string?
- Is there some standard, better syntax for a routine which reads a file line by line to the end, and returns the result of line-by-line processing?
As Damien Pollet says, your sampler_string function compiles fine (and runs correctly) on my machine as well, with OCaml 3.12.0. However, I'll answer your questions:
You can specify the types of your functions and values with a type annotation, written using the : symbol. For example, here is your function with its types annotated. Notice that the return type goes at the very end of the function header, just before the equals sign:
let rec sampler_string (file : in_channel) (string_so_far : string) : string = ...
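As a quick check, here is the complete annotated function with a small driver. The temporary file and its contents are made up for illustration. The body is unchanged from the question (the recursive call is still inside the try); the annotations only make the intended types explicit, so if the body could not produce a string, the compiler would report the mismatch at the annotation.

```ocaml
(* The question's function with parameter and result types written out.
   The recursive call is still inside the try, exactly as in the question. *)
let rec sampler_string (file : in_channel) (string_so_far : string) : string =
  try
    let line = input_line file in
    let first_two_letters = String.sub line 0 2 in
    sampler_string file (string_so_far ^ first_two_letters)
  with End_of_file -> string_so_far

(* Usage: write a small temporary file, then read it back. *)
let () =
  let path = Filename.temp_file "sampler" ".txt" in
  let oc = open_out path in
  output_string oc "hello\nworld\nocaml\n";
  close_out oc;
  let ic = open_in path in
  let result = sampler_string ic "" in
  close_in ic;
  Sys.remove path;
  print_endline result (* prints "hewooc" *)
```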
I don't know whether there is a better way to read a file line by line. It certainly is a pain to be forced to handle end-of-file through an exception. There is a blog post on the subject, though the function presented there reads a file into a list of lines, and another version was posted on a mailing list.
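If OCaml 4.14 or later is available (an assumption; the version discussed in this thread is 3.12.0), the standard In_channel module provides input_line returning a string option, so end of file is signalled by None rather than an exception. A minimal sketch of reading a file into a list of lines with it; read_lines is a hypothetical name, not the function from the blog post:

```ocaml
(* Requires OCaml 4.14+: In_channel.input_line returns None at end of file
   instead of raising End_of_file. *)
let read_lines (ic : In_channel.t) : string list =
  let rec loop acc =
    match In_channel.input_line ic with
    | Some line -> loop (line :: acc) (* tail call: no handler around it *)
    | None -> List.rev acc            (* lines were accumulated in reverse *)
  in
  loop []

(* Usage: with_open_text closes the channel once the callback returns. *)
let () =
  let path = Filename.temp_file "lines" ".txt" in
  Out_channel.with_open_text path (fun oc ->
      Out_channel.output_string oc "first\nsecond\nthird\n");
  let lines = In_channel.with_open_text path read_lines in
  Sys.remove path;
  List.iter print_endline lines
```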
A couple of nitpicks:
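- You don't need ;; to separate function and value definitions in a source file: ocamlc knows that each top-level let starts a new definition. (;; is mainly useful in the interactive toplevel.)
- You should close your file channels with close_in when you are done with them.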
- String.sub raises Invalid_argument if your file has a line with fewer than 2 characters.
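The two practical nitpicks above (closing the channel and short lines) can be handled together. A minimal sketch, assuming OCaml 4.08 or later for Fun.protect; input_line_opt, prefix and sampler_string_of_file are hypothetical names, and taking whatever is there from a line shorter than two characters (rather than skipping it or reporting an error) is just one possible policy:

```ocaml
(* Hypothetical helper: like input_line, but returns None at end of file. *)
let input_line_opt (ic : in_channel) : string option =
  try Some (input_line ic) with End_of_file -> None

(* Hypothetical helper: the first [n] characters of [s], or all of [s] if it
   is shorter, so short lines no longer make String.sub raise. *)
let prefix (n : int) (s : string) : string =
  String.sub s 0 (min n (String.length s))

(* Opens the file by name and closes the channel even if processing raises. *)
let sampler_string_of_file (path : string) : string =
  let ic = open_in path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let rec loop acc =
        match input_line_opt ic with
        | Some line -> loop (acc ^ prefix 2 line)
        | None -> acc
      in
      loop "")

let () =
  let path = Filename.temp_file "sampler" ".txt" in
  let oc = open_out path in
  output_string oc "hello\nx\n\nworld\n";
  close_out oc;
  print_endline (sampler_string_of_file path); (* prints "hexwo" *)
  Sys.remove path
```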
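A major point of style is to avoid making the recursive call inside a try ... with block. A call in the body of a try is not in tail position, because the exception handler must stay installed until the call returns, so with a sufficiently large file you will overflow the stack. Use this pattern instead: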
let rec sampler_string file string_so_far =
  match try Some (input_line file) with End_of_file -> None with
  | Some line ->
      let first_two_letters = String.sub line 0 2 in
      sampler_string file (string_so_far ^ first_two_letters)
  | None -> string_so_far
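Since OCaml 4.02 (later than the 3.12.0 mentioned in this thread), match can catch exceptions directly with an exception case, which expresses the same tail-recursive structure without the intermediate option. A sketch of the function above in that style; the driver's file contents are illustrative:

```ocaml
(* The exception case applies only to evaluating [input_line file]; it does
   not wrap the branches, so the recursive call below is a genuine tail call. *)
let rec sampler_string file string_so_far =
  match input_line file with
  | line ->
      let first_two_letters = String.sub line 0 2 in
      sampler_string file (string_so_far ^ first_two_letters)
  | exception End_of_file -> string_so_far

let () =
  let path = Filename.temp_file "sampler" ".txt" in
  let oc = open_out path in
  output_string oc "hello\nworld\n";
  close_out oc;
  let ic = open_in path in
  print_endline (sampler_string ic ""); (* prints "hewo" *)
  close_in ic;
  Sys.remove path
```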
Of course, a better functional strategy is to abstract away the recursion scheme:
let rec fold_left_lines f e inch =
  match try Some (input_line inch) with End_of_file -> None with
  | Some line -> fold_left_lines f (f e line) inch
  | None -> e
This is worthwhile because "doing things with the lines of a file" is a generally useful operation in its own right: counting lines, counting words, finding the longest line, parsing, and so on are all instances of this schema. Your function then becomes:
let sampler_string file string_so_far =
  fold_left_lines (fun string_so_far line ->
      let first_two_letters = String.sub line 0 2 in
      string_so_far ^ first_two_letters)
    string_so_far file
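To illustrate the claim that other line-based tasks are instances of the same schema, here is a sketch of counting lines and finding the longest line with fold_left_lines. It repeats fold_left_lines so the block stands alone (with the try expression parenthesized for readability); count_lines, longest_line and with_file are hypothetical names, and with_file assumes OCaml 4.08+ for Fun.protect:

```ocaml
let rec fold_left_lines f e inch =
  match (try Some (input_line inch) with End_of_file -> None) with
  | Some line -> fold_left_lines f (f e line) inch
  | None -> e

(* Number of lines: the accumulator is a counter. *)
let count_lines inch = fold_left_lines (fun n _line -> n + 1) 0 inch

(* Longest line: the accumulator is the longest line seen so far
   (the earliest one wins on ties). *)
let longest_line inch =
  fold_left_lines
    (fun best line ->
      if String.length line > String.length best then line else best)
    "" inch

(* Hypothetical helper: open [path], apply [f] to the channel, always close it. *)
let with_file path f =
  let ic = open_in path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () -> f ic)

let () =
  let path = Filename.temp_file "lines" ".txt" in
  let oc = open_out path in
  output_string oc "a\nlongest\nmid\n";
  close_out oc;
  Printf.printf "%d lines, longest: %s\n"
    (with_file path count_lines) (with_file path longest_line);
  Sys.remove path
```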
As Matias pointed out, the first priority is to move the recursive call outside the try/with expression so that it can be tail-call optimized.
However, there is a semi-standard solution for this: use Batteries Included. Batteries provides an abstraction, Enums, of the concept of iterating over something. Its IO infrastructure then provides the BatIO.lines_of function, which returns an enumeration of the lines of a file. So your whole function can become this:
fold (fun s line -> s ^ String.sub line 0 2) "" (BatIO.lines_of file)
The enum will automatically close the file when it is exhausted or garbage collected.
The code can be made more efficient (avoiding the repeated concatenation) with a buffer:
let buf = Buffer.create 2048 in
let () = iter (fun line -> Buffer.add_string buf (String.sub line 0 2))
           (BatIO.lines_of file) in
Buffer.contents buf
In short, Batteries can save you a lot of time and effort in code like this.
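For readers who cannot add Batteries as a dependency, the same Buffer idea works with the standard library alone. The point of the buffer is that s ^ t allocates a new string and copies both operands, so appending to an ever-growing accumulator copies it again on every line, while Buffer.add_substring appends into amortized growing storage. A minimal sketch, assuming OCaml 4.02+ for the exception case in match; sampler_string_buffered is a hypothetical name, and it takes at most two characters from short lines instead of raising:

```ocaml
let sampler_string_buffered (ic : in_channel) : string =
  let buf = Buffer.create 2048 in
  let rec loop () =
    match input_line ic with
    | line ->
        (* Append at most the first two characters of this line. *)
        Buffer.add_substring buf line 0 (min 2 (String.length line));
        loop ()
    | exception End_of_file -> Buffer.contents buf
  in
  loop ()

let () =
  let path = Filename.temp_file "sampler" ".txt" in
  let oc = open_out path in
  output_string oc "hello\nx\nworld\n";
  close_out oc;
  let ic = open_in path in
  print_endline (sampler_string_buffered ic); (* prints "hexwo" *)
  close_in ic;
  Sys.remove path
```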