Datatype for terms over a signature in SML
I want to implement an arbitrary signature in SML. How can I define a datatype for开发者_如何学编程 terms over that signature ?I would be needing it to write functions that checks whether the terms are well formed .
In my point of view, there are two major ways of representing an AST. Either as series of (possibly mutually recursive) datatypes or just as one big datatype. There are pros an cos for both.
If we define the following BNF (extracted from the SML definition and slightly simplified)
<exp> ::= <exp> andalso <exp>
| <exp> orelse <exp>
| raise <exp>
| <appexp>
<appexp> ::= <atexp> | <appexp> <atexp>
<atexp> ::= ( <exp>, ..., <exp> )
| [ <exp>, ..., <exp> ]
| ( <exp> ; ... ; <exp> )
| ( <exp> )
| ()
As stated this is simplified and much of the atexp is left out.
1. A series of possibly mutually recursive datatypes
Here you would for example create a datatype for expressions, declarations, patterns, etc. Basicly you would create a datatype for each of the non-terminals in your BNF.
We would most likely create the following datatypes
datatype exp = Andalso of exp * exp
| Orelse of exp * exp
| Raise of exp
| App of exp * atexp
| Atexp of atexp
and atexp = Tuple of exp list
| List of exp list
| Seq of exp list
| Par of exp
| Unit
Notice that the non-terminal has been consumed into exp datatype instead of having it as its own. That would just clutter up the AST for no reason. You have to remember that a BNF is often written in such a way that it also defined precedens and assosiativity (e.g., for arithmetic). In such cases you can often simplify the BNF by merging multiple non-terminals into one datatype.
The good thing about defining multiple datatypes is that you kind of get some well formednes of your AST. If for example we also had non-terminal for declarations, we know that the AST will newer contain a declaration inside a list (as only expressions can be there). Because of this, most of you well formedness check is not nessesary.
This is however not always a good thing. Often you need to do some checking on the AST anyways, for example type checking. In many cases the BNF is quite large and thus the number of datatypes nessesary to model the AST is also quite large. Keeping this in mind, you have to create one function for each of your datatypes,for every type of modification you wan't to do on your AST. In many cases you only wan't to change a small part of your AST but you will (most likely) still need to define a function for each datatype. Most of these functions will basicly be the identity and then only in a few cases you will do the desired work.
If for example we wan't to count how many units there are in a given AST we could define the following functions
fun countUnitexp (Andalso (e1, e2)) = countUnitexp e1 + countUnitexp e2
| countUnitexp (Orelse (e1, e2)) = countUnitexp e1 + countUnitexp e2
| countUnitexp (Raise e1) = countUnitexp e1
| countUnitexp (App (e1, atexp)) = countUnitexp e1 + countUnitatexp atexp
| countUnitexp (Atexp atexp) = countUnitatexp atexp
and countUnitatexp (Tuple exps) = sumUnit exps
| countUnitatexp (List exps) = sumUnit exps
| countUnitatexp (Seq exps) = sumUnit exps
| countUnitatexp (Par exp) = countUnitexp exp
| countUnitatexp Unit = 1
and sumUnit exps = foldl (fn (exp,b) => b + countUnitexp exp) 0 exps
As you may see we are doing a lot of work, just for this simple task. Imagine a bigger grammar and a more complicated task.
2. One (big) datatype (nodes) -- and a Tree of these nodes
Lets combine the datatypes from before, but change them such that they don't (themself) contain their children. Because in this approach we build a tree structure that has a node and some children of that node. Obviously if you have an identifier, then the identifier needs to contain the actual string representation (e.g., variable name).
So lets start out by defined the nodes for the tree structure.
(* The comment is the kind of children and possibly specific number of children
that the BNF defines to be valid *)
datatype node = Exp_Andalso (* [exp, exp] *)
| Exp_Orelse (* [exp, exp] *)
| Exp_Raise (* [exp] *)
| Exp_App (* [exp, atexp] *)
(* Superflous:| Exp_Atexp (* [atexp] *) *)
| Atexp_Tuple (* exp list *)
| Atexp_List (* exp list *)
| Atexp_Seq (* exp list *)
| Atexp_Par (* [exp] *)
| Atexp_Unit (* [] *)
See how the Atexp from the tupe now becomes superflous and thus we remove it. Personally I think it is nice to have the comment next by telling which children (in the tree structure) we can expect.
(* Note this is a non empty tree. That is you have to pack it in an option type
if you wan't to represent an empty tree *)
datatype 'a tree = T of 'a * 'a tree list
(* Define the ast as trees of our node datatype *)
type ast = node tree
We then define a generic tree and define the type ast to be a "tree of nodes". If you use some library then there is a big chance that such a tree structure is already present. Also it might be handy late on to extend this tree structure to contain more than just the node as data, however we just keep it simple here.
fun foldTree f b (T (n, [])) = f (n, b)
| foldTree f b (T (n, ts)) = foldl (fn (t, b') => foldTree f b' t)
(f (n, b)) ts
For this example we define a fold function over the tree, again if you are using a library then all these functions for folding, mapping, etc. are most likely already defined.
fun countUnit (Atexp_Unit) = 1
| countUnit _ = 0
If we then take the example from before, that we wan't to count the number of occurances of unit, we can then just fold the above function over the tree.
val someAST = T(Atexp_Tuple, [ T (Atexp_Unit, [])
, T (Exp_Raise, [])
, T (Atexp_Unit, [])
]
)
A simple AST could look like the above (note that this is actually not valid as we have a Exp_Raise with no children). And we could then do the counting by
foldTree (fn (a,b) => (countUnit a) + b) 0 someAST
The down side of this approach is that you have to write a check function that verifies that your AST is well formed, as there is no restrictions when you create the AST. This includes that the children are of the correct "type" (e.g., only Exp_* as children in an Exp_Andalso) and that there are the correct number of children (e.g., exactly two children in Exp_Andalso).
This approach also requires a bit of builk getting started, given you don't use some library that has a tree defined (including auxilary functions for modifying the tree). However in the long run it pays of.
精彩评论