Open and closed union types in Ocaml
I'm looking into OCaml for the first time, having a bit of background with F# and Haskell. As such, a开发者_StackOverflow lot is familiar-looking, but one thing that isn't is the concept of "open" and "closed" unions (with the backtick and [< syntax).
What are these useful for and how often are they used?
gasche's answer has good advice. I'm going to explain open and closed unions a bit more.
First, you need to distinguish the two kinds of unions: basic variants (no backtick) and polymorphic variants (with backtick).
- Basic variants are generative: if you define two types with the same constructor names in different modules
M1
andM2
, you have different types.M1.Foo
andM2.Foo
are different constructors.`Foo
is always the same constructor no matter where you use it. - Apart from this, polymorphic variants can do everything basic variants can do and more. But with great power comes great complexity, so you should use them only when necessary and carefully.
A polymorphic variant type describes what constructors the type may have. But many polymorphic variant types are not fully known — they contain (row) type variables. Consider the empty list []
: its type is 'a list
, and it can be used in many contexts that assign more specific types to 'a
. For example:
# let empty_list = [];;
val empty_list : 'a list = []
# let list_of_lists = [] :: empty_list;;
val list_of_lists : 'a list list = [[]]
# let list_of_integers = 3 :: empty_list;;
val list_of_integers : int list = [3]
The same holds for the row type variables. An open type, written [> … ]
, has a row variable that can be instantiated to cover more constructors each time the value is used.
# let foo = `Foo;;
val foo : [> `Foo ] = `Foo
# let bar = `Bar;;
val bar : [> `Bar ] = `Bar
# let foobar = [foo; bar];;
val foobar : [> `Bar | `Foo ] list = [`Foo; `Bar]
Just because a constructor appears in a type doesn't mean every use of that type has to allow all constructors. [> …]
says that a type must have at least these constructors, and dually [< …]
says that a type must have at most these constructors. Consider this function:
# let h = function `Foo -> `Bar | `Bar -> `Foo;;
val h : [< `Bar | `Foo ] -> [> `Bar | `Foo ] = <fun>
h
is only capable of handling Foo
and Bar
, so the input type may not allow other constructors; but it's ok to call h
on a type that only allows Foo
. Conversely, h
may return Foo
or Bar
, and any context where h
is used must allow both Foo
and Bar
(and may allow others).
Closed types arise when there are matching minimum and maximum constructor requirements on a type. For example, let's add the constraint that h
must have the same input and output type:
# let hh : 'a -> 'a = function `Foo -> `Bar | `Bar -> `Foo;;
val hh : [ `Bar | `Foo ] -> [ `Bar | `Foo ] = <fun>
Closed types rarely arise naturally from type inference. Most of the time, like here, they are a consequence of a user annotation. When you use polymorphic annotations, it's a good idea to define named types and use them at least on every toplevel function. Otherwise the inferred type is likely to be a little more general than you thought. While this rarely opens the way to bugs, it often means that any type error will be diagnosed later than it could have been, and will generate a very long error message where it will be hard to find the helpful bits.
I recommend reading and working through (i.e. retype the examples in the toplevel, play around a bit to make sure you understand each step) the polymorphic variant tutorial in the Ocaml manual.
You need to read Jacques Garrigue's "Code reuse with polymorphic variants":
http://www.math.nagoya-u.ac.jp/~garrigue/papers/fose2000.html
The problem with polymorphic variants is that they are so flexible that type inference cannot help you a lot with errors in polymorphic variants code. For example, if you mistype one constructor name, the compiler won't be able to flag an error, it will only infer a slightly different types with the usual constructors, plus the misspelled one. Errors will only be spotted later, when you try to combine the faulty code with a function with strict assumptions on the variants (closed pattern matching), with an unwieldy error message.
My advice for polymorphic variant users is to massively use annotations to control type-checking : every time you take a polymorphic variant as input, or output one, the function should be annotated with a precise type for the variant part. This will shield you from most inference issues, and force you to build an expressive set of type definitions that can be composed and help reasoning about your program.
精彩评论