Welcome!

Welcome to smlhelp.github.io! The goal of this site is to provide a convenient resource for learning functional programming, particularly in the language of Standard ML. You'll find various guides and references about the SML type system & functional programming concepts, as well as some worked examples and tutorials. Click on the navigation links on the left to get started!

Getting Started

Installing SML/NJ

By Brandon Wu, May 2020

Installing SML/NJ on AFS

To set up your SML/NJ environment on the Andrew File System (AFS), first you will need to log in to AFS.

You can do this by executing the command `ssh <andrew_id>@unix.andrew.cmu.edu`

On a Mac, you can do this straightforwardly from the terminal. If you are on a Windows machine, you may need to use an application such as Visual Studio Code or MobaXTerm to access a terminal.

Once you are in AFS, execute the command `/afs/andrew/course/15/150/bin/setup-path`

Once the script has finished running, type in the command that it tells you to. It should look something like `. '/afs/andrew.cmu.edu/usrc/<andrew_id>/.bashrc'`

IMPORTANT: Do not forget the `.` at the beginning! Omitting it will cause the command to not work.

After finishing this, you should be able to type in `smlnj` from the command line and access the SML/NJ REPL.
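
Putting the steps together, a typical session looks something like the following sketch (the exact path printed by the setup script may differ from the one shown here):

$ ssh <andrew_id>@unix.andrew.cmu.edu
$ /afs/andrew/course/15/150/bin/setup-path
$ . '/afs/andrew.cmu.edu/usrc/<andrew_id>/.bashrc'
$ smlnj
Standard ML of New Jersey ...
-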

SML Syntax Cheatsheet

By David Sun, February 2021

Built-in Types

Six of the built-in types commonly encountered:

| Expression | Type |
|---|---|
| `0` | `int` |
| `"foo bar"` | `string` |
| `#" "` | `char` |
| `true` | `bool` |
| `1.0` | `real` |
| `()` | `unit` |

Structured Types

Tuples and lists are built using the built-in types, and can also contain other tuples and lists.

| Expression | Type |
|---|---|
| `(15,150)` | `int * int` |
| `[1,2,3,4]` | `int list` |
| `[["foo","bar"],["baz"]]` | `string list list` |
| `((true,1),(false,0,()))` | `(bool * int) * (bool * int * unit)` |
| `[(0,1),(1,0)]` | `(int * int) list` |
| `([#"a",#"b"],[3.14])` | `char list * real list` |

Note: 1-tuples don't exist in Standard ML.

Operators

Operators have different priority levels. Higher priority operations are performed before lower priority operations. The operators *, +, and - work on both int and real types.

| Operator | Meaning | Priority | Example Expression | Evaluates To | Notes |
|---|---|---|---|---|---|
| `*` | numeric multiplication | 7 | `8 * 3` | `24` | |
| `/` | real division | 7 | `3.0 / 2.0` | `1.5` | operands must be `real` |
| `div` | integer division | 7 | `3 div 2` | `1` | operands must be `int` |
| `mod` | "modulo" | 7 | `8 mod 3` | `2` | operands must be `int` |
| `+` | numeric addition | 6 | `3 + 4` | `7` | |
| `-` | numeric subtraction | 6 | `3 - ~2` | `5` | `~` denotes negative numbers, e.g. `~5` |
| `^` | string combination | 6 | `"foo" ^ "bar"` | `"foobar"` | |
| `::` | list construction ("cons") | 5 | `1 :: [2,3,4]` | `[1,2,3,4]` | right-associative |
| `@` | list combination ("append") | 5 | `[1,2] @ [3,4]` | `[1,2,3,4]` | right-associative |

Except for :: and @, the remaining built-in operators above are left-associative. Left-associative operations of equal priority implicitly evaluate from left to right. Right-associative operations of equal priority implicitly evaluate from right to left.

| Expression | Implicit Interpretation | Evaluates To |
|---|---|---|
| `3 - ~2 + ~5` | `((3 - ~2) + ~5)` | `0` |
| `1 :: 2 :: 3 :: []` | `1 :: (2 :: (3 :: []))` | `[1,2,3]` |
| `1 :: [] @ 2 :: [] @ 3 :: []` | `1 :: ([] @ (2 :: ([] @ (3 :: []))))` | `[1,2,3]` |

Boolean Operations

There are three main operations: andalso, orelse, and not.

| Expression | Evaluates To | Notes |
|---|---|---|
| `false andalso true` | `false` | `andalso` short-circuits if the left operand evaluates to `false` |
| `true orelse false` | `true` | `orelse` short-circuits if the left operand evaluates to `true` |
| `not true` | `false` | |
| `not false` | `true` | |

Note: See the page about the bool type here for more information on short-circuiting behavior.

There are built-in equality operators: = and <>.

| Operator | Meaning | Priority | Example Expression | Evaluates To |
|---|---|---|---|---|
| `=` | "equal to" | 4 | `1+2 = 4-1` | `true` |
| `<>` | "not equal to" | 4 | `"a" <> "b"` | `true` |

These two operate on equality types, which include the built-in types mentioned before — and the structured types that can be made from them — excluding real and function types.

| Expression | Evaluates To |
|---|---|
| `(true,true) = (true,true)` | `true` |
| `0 = 1 andalso 1 = 1` | `false` |
| `0 <> 0 orelse 1 <> 1` | `false` |
| `[1,2,3,4] = [1,2,3,4]` | `true` |
| `(1,2,"a","b") = (1,2,"a","b")` | `true` |
| `([1,2,3,4],(["a","b"],[()])) = ([1,2,3,4],(["a","b"],[()]))` | `true` |
| `0.0 = 0.0` | N/A: Not Well Typed |

Note: See the page about the real type here for more information on why 0.0 = 0.0 is not allowed.

There are built-in comparison operators >, <, >=, and <=.

| Operator | Meaning | Priority | Example Expression | Evaluates To |
|---|---|---|---|---|
| `>` | "greater than" | 4 | `"ba" > "ab"` | `true` |
| `<` | "less than" | 4 | `"ab" < "abc"` | `true` |
| `>=` | "greater than or equal to" | 4 | `#"a" >= #"A"` | `true` |
| `<=` | "less than or equal to" | 4 | `"cab" <= "cba"` | `true` |

These have limited use; they operate only on `int`, `string`, `char`, and `real`.

To build good habits, please practice using the built-in comparison functions Int.compare, String.compare, Char.compare, and Real.compare to compare their corresponding types instead of exclusively using these equality and comparison operators.

Comparison Functions

There is an order type with three values: LESS, EQUAL, and GREATER.

| Expression | Type |
|---|---|
| `LESS` | `order` |
| `EQUAL` | `order` |
| `GREATER` | `order` |
| `Int.compare` | `int * int -> order` |
| `String.compare` | `string * string -> order` |
| `Real.compare` | `real * real -> order` |

Example use:

| Expression | Evaluates To |
|---|---|
| `Int.compare (~1,0)` | `LESS` |
| `Int.compare (0,0)` | `EQUAL` |
| `Int.compare (1,0)` | `GREATER` |
| `String.compare ("abc","bac")` | `LESS` |
| `String.compare ("cba","cb")` | `GREATER` |
| `Real.compare (0.0,0.0)` | `EQUAL` |

Sometimes you want to compare data that is not of basic built-in types, e.g. when sorting lists of tuples. The built-in comparison operators by themselves will not work, but using the order type allows you to write a comparison function (perhaps using other comparison functions) that defines your own order over that data.
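
For example, here is one way to order pairs of integers lexicographically, written as a sketch using Int.compare and a case expression (described later on this page); the name pairCompare is ours:

(* pairCompare : (int * int) * (int * int) -> order *)
fun pairCompare ((a, b), (c, d)) =
  case Int.compare (a, c) of
    EQUAL => Int.compare (b, d)  (* first components tie; compare the seconds *)
  | ord   => ord

val GREATER = pairCompare ((1, 5), (1, 2))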

Comments

Comments are denoted using (* *).

(* This is a comment. *)

(* This is another comment.
   Comments can span multiple lines!
 *)

(* This is (* a comment within *) a comment. *)

Value Binding

Use the val keyword to create variables. It binds values to identifiers (variable names).

val x : int = 5
val (a,b) : int * int = (1,2)
val L : int list = [3,4]

| Expression | Evaluates To |
|---|---|
| `x` | `5` |
| `a` | `1` |
| `b` | `2` |
| `3 * x` | `15` |
| `2 * x * (5 + 2 * x)` | `150` |
| `a * x + b` | `7` |
| `a :: b :: L` | `[1,2,3,4]` |

Let-Expressions

A let-expression creates local bindings (local variables) used to compute its result. Declarations and bindings go between the let and the in; the expression to evaluate goes between the in and the end. Let-expressions can be nested. The scope of a let-in declaration is that let-expression's body.

Both of the following let-expressions evaluate to 150:

let
  val x : int = 25
  val x : int = x + 25 (* Shadows the previous x binding *)
  val y : int = x + 50
in
  x + y
end

let
  val x : int = 25
in
  let
    val x : int = x + 25 (* Shadows the previous x binding *)
  in
    let
      val y : int = x + 50
    in
      x + y
    end
  end
end

Lambda Expressions

Write lambda expressions using the fn keyword, often verbalized as "lambda". A lambda expression has the form: fn, pattern, =>, expression. A lambda expression is itself a value, a value of function type. The => in a lambda expression corresponds to the -> in its type. The -> arrows are right-associative infix type constructors denoting function types. Lambda expressions are applied via prefix application, written immediately before the operand.

| Expression | Evaluates To | Type |
|---|---|---|
| `(fn (x : int, y : int) => x + y)` | `(fn (x : int, y : int) => x + y)` | `int * int -> int` |
| `(3,4)` | `(3,4)` | `int * int` |
| `(fn (x : int, y : int) => x + y) (3,4)` | `7` | `int` |

Function Binding

Using a lambda expression more than once requires retyping it. We can give it a name instead. The val and fun keywords bind a lambda expression to an identifier, creating a named function. Take note that the = for binding differs from the => reserved word.

(* add : int * int -> int *)
val add : int * int -> int = fn (x,y) => x + y

(* add : int * int -> int *)
fun add (x : int,y : int) : int = x + y

Both function bindings for add above have the same value: (fn (x,y) => x + y) : int * int -> int.

A named function can be thought of as a lambda expression that has been "identified". The bindings to add identify (or name) an otherwise anonymous function. Its value is the lambda expression it binds.

| Expression | Value | Type |
|---|---|---|
| `add` | `(fn (x,y) => x + y)` | `int * int -> int` |
| `(fn (x,y) => x + y)` | `(fn (x,y) => x + y)` | `int * int -> int` |
| `add (3,4)` | `7` | `int` |
| `(fn (x,y) => x + y) (3,4)` | `7` | `int` |

Patterns and Case Expressions

Patterns are present in every lambda expression, case expression, val, and fun binding. Every fn clause, case clause, and fun clause contains a pattern with a corresponding expression. A clause is of the form: pattern, =>, expression. Clauses are delimited by pipes |.

| Expression | Example Clause | Clause Pattern | Clause Expression |
|---|---|---|---|
| `(fn (x,y) => x + y)` | `(x,y) => x + y` | `(x,y)` | `x + y` |
| `(fn true => 1 \| false => 0)` | `true => 1` | `true` | `1` |
| `(fn 0 => true \| _ => false)` | `_ => false` | `_` | `false` |

Lambda expressions and case expressions have the same clause syntax. The clausal patterns must be able to match to the type of the expression being cased on. The clausal expressions must all have the same type (which may be different from that of the expression cased on).

| Expression | Example Clause | Clause Pattern | Clause Expression |
|---|---|---|---|
| `(case () of _ => ())` | `_ => ()` | `_` | `()` |
| `(case #"A" < #"a" of true => ":)" \| false => ":(")` | `true => ":)"` | `true` | `":)"` |
| `(case Int.compare (1,0) of LESS => false \| EQUAL => false \| GREATER => true)` | `GREATER => true` | `GREATER` | `true` |

The wildcard pattern _ matches a value of any type, but creates no bindings (it ignores the value it matches).

| Candidate | Valid Pattern? |
|---|---|
| `()` | Yes |
| `0` | Yes |
| `":)"` | Yes |
| `true` | Yes |
| `EQUAL` | Yes |
| `3 + 4` | No |
| `":" ^ ")"` | No |
| `3 < 4` | No |
| `Int.compare (0,0)` | No |
| `Int.compare` | No |
| `0.0` | No |
| `(fn x => x)` | No |
| `x` | Yes |
| any variable name that is not a reserved word | Yes |
| `_` | Yes |
| `(0,1)` | Yes |
| `(x,y)` | Yes |
| `(_,_)` | Yes |
| `(x,x)` | No |
| `[]` | Yes |
| `[x]` | Yes |
| `[[[]]]` | Yes |
| `([],[])` | Yes |
| `[] @ []` | No |
| `[x] @ xs` | No |
| `L @ R` | No |
| `x::xs` | Yes |
| `x::y::xs` | Yes |
| `_::_` | Yes |

A pattern that accounts for every possible value of the type it matches to is said to perform an exhaustive match. The match is nonexhaustive if and only if a possible value of that pattern's type is missed.

| Expression | Pattern Type | Exhaustive Match? |
|---|---|---|
| `(fn () => ())` | `unit` | Yes |
| `(fn true => 1)` | `bool` | No |
| `(fn true => 1 \| false => 0)` | `bool` | Yes |
| `(fn LESS => ~1)` | `order` | No |
| `(fn LESS => ~1 \| EQUAL => 0)` | `order` | No |
| `(fn LESS => ~1 \| EQUAL => 0 \| GREATER => 1)` | `order` | Yes |
| `(fn 0 => true)` | `int` | No |
| `(fn 0 => true \| _ => false)` | `int` | Yes |
| `(fn x::_ => x + 1)` | `int list` | No |
| `(fn [] => 0 \| x::_ => x + 1)` | `int list` | Yes |
| `(fn (0,b) => true andalso b)` | `int * bool` | No |
| `(fn (0,b) => true andalso b \| (n,_) => false)` | `int * bool` | Yes |

Using a wildcard as a clause's entire pattern guarantees an exhaustive match, since the wildcard matches every value of the type.

Recursive Function Binding

The rec reserved word enables a lambda expression to self-reference within its body. The fun reserved word allows self-reference by default. The clause patterns and expressions in fun clauses are separated by = instead of =>.

val rec length : int list -> int = fn [] => 0 | _::xs => 1 + length xs

fun length ([]    : int list) : int = 0
  | length (_::xs : int list) : int = 1 + length xs

As before, both length bindings have the same value. Don't forget about the lambda!

| Expression | Value | Type |
|---|---|---|
| `length` | `(fn [] => 0 \| _::xs => 1 + length xs)` | `int list -> int` |
| `length []` | `0` | `int` |
| `length [1,2,3,4]` | `4` | `int` |

Conditional Expressions

A conditional expression requires a condition that evaluates to true or false, placed after the if, and two expressions of the same type: one for the then-branch and one for the else-branch. Note that an if-then-else expression evaluates only one of its two branches, namely the one it takes.

| Expression | Evaluates To |
|---|---|
| `if 0 <> 1 then "foo" else "bar"` | `"foo"` |
| `if 0 = 1 then (if true then 1 else 2) else (if false then 3 else 4)` | `4` |
| `if true then 1 else (1 div 0)` | `1` |
| `if false then (1 div 0) else 0` | `0` |

op

The op keyword converts a binary infix operator into a prefix function, which is then applied to a pair of the operands. Priorities are kept the same as before.

| Expression | Evaluates To |
|---|---|
| `(op *) (8,3)` | `24` |
| `(op +) (3,4)` | `7` |
| `(op ^) ("foo","bar")` | `"foobar"` |
| `(op ::) (1,[2,3,4])` | `[1,2,3,4]` |
| `(op @) ([1,2],[3,4])` | `[1,2,3,4]` |

as

When convenient, we can use the as keyword between a variable and a structured pattern to reference a structured value both as a whole and by its constituents. The pattern to the left of as must be a variable. as-patterns can be nested, and they always appear as part of a pattern.

val tuple as (a,b) : int * int = (1,2)

| Variable Name | Bound To |
|---|---|
| `a` | `1` |
| `b` | `2` |
| `tuple` | `(1,2)` |

val outer as (inner as (a,b),c) : (int * int) * int = ((1,2),3)

| Variable Name | Bound To |
|---|---|
| `outer` | `((1,2),3)` |
| `inner` | `(1,2)` |
| `a` | `1` |
| `b` | `2` |
| `c` | `3` |

val L1 as x1::(L2 as x2::(L3 as x3::L4)) : int list = [1,2,3]

| Variable Name | Bound To |
|---|---|
| `L1` | `[1,2,3]` |
| `x1` | `1` |
| `L2` | `[2,3]` |
| `x2` | `2` |
| `L3` | `[3]` |
| `x3` | `3` |
| `L4` | `[]` |
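
Here is a small sketch showing as in a function clause, where we want both the whole list and its first element (the name stutter is ours):

(* stutter : int list -> int list *)
fun stutter ([] : int list) : int list = []
  | stutter (L as x :: _) = x :: L  (* duplicate the first element *)

val [1,1,2,3] = stutter [1,2,3]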

Common Tasks in SML

Types & Signatures

Types

By Brandon Wu, May 2020

Types are a fundamental concept in Standard ML (SML), and indeed in functional programming in general. Most programming languages have some notion of type, with int, float, and data structures such as array being common examples; however, these types tend to be weakly enforced, verified only at run-time. In SML, we employ a system of strong typing, consisting of stricter typing rules, which allows us to catch errors earlier, at compile time.

Type Safety

Oftentimes, data is separated into types so that we can differentiate different kinds of data from each other. For instance, it makes no sense to add a string and an int, though certain programming languages will try to make sense of it. Usually, when different data types are haphazardly intermingled, it is because someone wrote a bug. The philosophy behind SML's type system is to disallow such intermingling. In SML, every expression and every function has a specified type, which governs what interactions are possible with other expressions.

Consider the following code fragment in Python:

def foo(x):
    if x == 2:
        return 1
    elif x == "bar":
        return True
    return None

What is the type of its output? The answer is "it depends", as it is dependent on the value of x that is passed in. We could give foo any type of argument, and we can see that in the cases that we pass in 2 or "bar", we could obtain an int or a bool as output, or even a None in any other case.

Consider the expression foo(y) + 3. Is it safe to evaluate? Again, the answer is "it depends". Depending on what the program has done up to this point, it may be safe: in the case where y is 2, foo(y) + 3 would just be 4. But otherwise, we may end up trying to add True or None to 3, which clearly doesn't make sense; if we tried to add None to 3, we would encounter a type error. While this is a contrived piece of code, type errors such as this spring up in code all the time. Type errors are unsafe. Whoever wrote this program likely didn't intend to try to add a non-int to 3, but it can be tricky to reason about whether such an outcome is truly possible.

In SML, our philosophy will be to make such uncertainties impossible. We impose a certain degree of determinism on our programs, such that the types of each expression and each step of evaluation throughout our program have a definite type that is known to the program. If a program tries to execute some computation that would use the wrong type somewhere, or otherwise cause types to mismatch, then we would call the program ill-typed or not well-typed, and it would be rejected before any evaluation. This process of verifying types is called type-checking, and occurs at compile-time, which stands opposed to run-time. Compile-time type-checking happens before any actual evaluation — before the program's run-time — and only upon passing the type-checking phase will the program actually execute. At this point, we say the program type-checks.

Type-Checking

The most fundamental rule for type-checking is during function application, or the act of applying a function to its arguments. This is elaborated on further in the chapter on functions.

[APP] An expression e1 e2 has type t2 if and only if e1 : t1 -> t2 and e2 : t1.

More specifically, for a function f : t1 -> t2, if x : t3 where t3 is not the same type as t1, then the expression f x is not well-typed and therefore has no value. In other words, applying a function to an argument whose type does not match the function's argument type creates a type error and does not type-check, which will prompt the compiler to reject the program at compile-time during the type-checking process. We call an expression that does not encounter a type error well-typed; a program that passes the type-checking process is one that type-checks.

The majority of type errors will occur as a consequence of this rule. Since functions have definite return types, it is very straightforward to check if a program type-checks or not: simply evaluate the types of the expressions and see if any type errors are encountered. The type-checking phase is agnostic to the specific values of well-typed expressions. When given an expression — such as 1 + 2 — the compiler sees two expressions of type int being passed into a function of type int * int -> int and knows that the result must be of type int — thus the entire expression 1 + 2 is well-typed.

Because of this, the well-typedness of expressions is independent of any run-time errors that may occur. For instance, the expression 1 div 0 clearly cannot give back a value of type int, as division by 0 is undefined. Instead, 1 div 0 will raise the exception Div during execution and try to terminate the execution of the program. From the perspective of the compiler's type-checking process, it only cares that all arguments to div are of the right type — it only cares that the types match. The second argument of div being 0 still type-checks. So even though 1 div 0 will not evaluate to a value, it also will not encounter a type error; therefore the expression 1 div 0 is well-typed and has the type int.

Conclusion

SML's strong type system is a very powerful tool for ensuring the correctness of programs. The philosophy behind Standard ML is to push errors to compile-time, before a program is even run. In doing so, we ensure that type errors cannot arise at run-time. With strong typing and type-checking, we can guarantee that our code will be free of type errors, eliminating that entire class of bugs. Later in this section, we will discuss some concrete types.

Bool

By Jacob Neumann, May 2020

bool is the SML type of booleans. The bool type supports the usual constructs of boolean logic and "conditionals" (if expressions). bool is also the type produced when evaluating (in)equality between values of (suitable) types.

Values

There are exactly two values of type bool, true and false.

datatype bool = true | false

In addition to the constructs generally available in pattern matching (e.g. wildcards and identifier binding), booleans can be pattern-matched against using the constructors true and false.

fun firstOrSecond ((x : int,y : int), true):int = x
  | firstOrSecond ((x,y), false) = y

val 2 = firstOrSecond((3,2),false)

bool is pretty-printed by the smlnj REPL, so the actual values will display. This is demonstrated by the following smlnj REPL snippet.

- val b = true orelse false;
val b = true : bool

Production

Some common functions which produce booleans:

(op =)  : ''a * ''a -> bool
(op <>) : ''a * ''a -> bool  (* Inequality *)

(* All of the following are overloaded and also work on values of type real *)
(op <)  : int * int -> bool
(op >)  : int * int -> bool
(op <=) : int * int -> bool
(op >=) : int * int -> bool

Elimination

The principal use of booleans is for evaluating one of two possible expressions, conditional on a value of type bool:

(* Evaluates to 5 *)
val res1 = if true then 5 else 2

(* Evaluates to 7 *)
val res2 = if false then 3+3 else 7

Note that the expression between then and else (the "then branch") has the same type as the expression after the else (the "else branch"). This is necessary: the SML typechecker does not evaluate expressions, and so does not "know" that, for instance, if false then e1 else e2 will always reduce to e2. As far as the typechecker knows, if false then e1 else e2 could reduce to e1. So, in order for the typechecker to be able to assign a type to the expression if false then e1 else e2, it must be the case that e1 and e2 have the same type. More formally,

[If-Then] An expression if b then e1 else e2 is well-typed (with type t) if and only if b : bool and e1 : t and e2 : t.

It is worth noting that if b then e1 else e2 is equivalent to the following expression, written using SML's case syntax.

    case b of
      true => e1
    | false => e2

Which is also equivalent to

    (fn true => e1 | false => e2) b

NOTE: It's important to note that, in the evaluation of the expression if true then e1 else e2, the expression e2 is never evaluated (and, analogously, e1 never is evaluated in if false then e1 else e2). This is most evident when we look at the third syntax (with the lambda function): SML does not evaluate the body of a function until the function is called. So when (fn true => e1 | false => e2) gets applied to, say, true, then the evaluation steps immediately to e1, without ever evaluating e2. This point is explored more in a question below.

Combination

bool is an equality type, and may therefore be compared with =, producing another bool.

val true = (true = true)
val false = (true = false)
val false = (false <> false)

bool also comes equipped with the usual boolean operators,

val true = true andalso true   (* andalso keyword, logical and *)
val true = false orelse true   (* orelse keyword, logical or *)
val false = not true           (* not:bool -> bool, logical negation *)

An important note about andalso and orelse: the evaluation of b1 andalso b2 has a behavior known as short-circuiting:[1] when evaluating this expression, SML will first attempt to evaluate b1. If b1 raises an exception or loops forever, then that will be the behavior of b1 andalso b2 as a whole. If b1 reduces down to the value true, then SML will then attempt to evaluate b2. However, if b1 evaluates to the value false, then SML will not evaluate b2. This is exhibited in the following code snippet.

(* loops forever on any input *)
fun loop (x:int):bool = loop x

(* Evaluates to the value false, and doesn't loop *)
val false = false andalso (loop 3)

From the Structure

The structure Bool is bound as part of the SML Basis. In addition to what's already been mentioned, it includes the utility function

	Bool.toString : bool -> string

This is useful (for instance) for print-debugging the value of a bool-valued variable.
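
For example, a quick print-debug of a boolean variable might look like this sketch (the binding b is hypothetical):

val b = (1 < 2) andalso (3 < 4)
val () = print ("b is " ^ Bool.toString b ^ "\n")  (* prints "b is true" *)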

Questions to Consider

  1. Why are the following expressions not equivalent?

     (if b then e1 else e2)

     (fn (x,y) => if b then x else y) (e1,e2)

  2. Why are the following expressions not equivalent?

     b1 andalso b2

     (fn (v1,v2) => v1 andalso v2) (b1,b2)

Footnotes

[1]: This is why andalso and orelse are designated as keywords above: they are not infixed functions of type bool*bool->bool. Functions cannot exhibit this kind of "short-circuiting" (evaluating one of their arguments and then deciding whether to evaluate the other): the integer addition (op +) : int*int -> int must have both of its arguments fully evaluated before proceeding to add them. The keywords andalso and orelse must be built into the SML evaluator to achieve short-circuiting.

Int

By Brandon Wu, May 2020

int is the SML type of integers.

Values

The underlying representation of integers can be likened to the following:

datatype int = ... | ~2 | ~1 | 0 | 1 | 2 | ...

This signifies that the type int is inhabited by infinitely many values, all corresponding to whole numbers. In particular, every integer forms its own constant constructor for the int type, meaning that they each individually can be pattern matched upon. Note that the use of ~ above denotes negativity. Additionally, ~ is a valid function of type int -> int that negates a number.

While in practice, computers can only represent a finite number of integers, for the purposes of this class we will generally assume the integers to be unbounded. This means that we can do induction on SML integers in exactly the same way as we would do induction on the natural numbers, and that we do not have to worry about the consequences of edge case behavior. This allows us to ignore pedantic implementation details and explore mathematically interesting properties of programs.

Production

Integers have all the familiar arithmetic operations available to them. Note that some of these functions are also overloaded to work with real types - this is further discussed in the Real page.

(op +)   : int * int -> int
(op -)   : int * int -> int
(op *)   : int * int -> int
(op div) : int * int -> int
(op mod) : int * int -> int

All of these functions are infixed, so instead of being applied as +(2, 3), we write 2 + 3. Additionally, div and mod are not defined when the second argument is 0, and will raise a Div exception.
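
For example (note that in SML, div rounds toward negative infinity, so the results of div and mod on negative operands may differ from what some other languages produce):

val 3 = 7 div 2
val 1 = 7 mod 2
val ~4 = ~7 div 2  (* rounds down, not toward zero *)
val 1 = ~7 mod 2   (* the result of mod takes the sign of the divisor *)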

Combination

Integers are also eligible for comparison, including equality and inequality. (In other words, integers are an equality type).

(op =)  : int * int -> bool
(op <>) : int * int -> bool (* Inequality *)

(op <)  : int * int -> bool
(op >)  : int * int -> bool
(op <=) : int * int -> bool
(op >=) : int * int -> bool

From the Structure

The structure Int is bound as part of the SML Basis. It includes helpful functions such as

Int.toString : int -> string
Int.compare  : int * int -> order
Int.min      : int * int -> int
Int.max      : int * int -> int

where Int.toString is the function that returns the string representation of a given integer, and Int.compare has return type order, which is inhabited only by the values LESS, EQUAL, and GREATER. Int.compare (x, y) returns LESS if x < y, EQUAL if x = y, and GREATER if x > y. Additionally, Int.min and Int.max are just the corresponding min and max functions for integers.
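
For example:

val "~3" = Int.toString (~3)
val LESS = Int.compare (1, 2)
val 1 = Int.min (1, 2)
val 2 = Int.max (1, 2)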

Real

By Brandon Wu, May 2020

real is the SML type of real or floating-point numbers. As in other programming languages, reals in SML are restricted to finite machine representations, which means that they cannot represent every real number with perfect precision. For this reason, generally in this course we will prefer the use of ints when performing numeric operations.

Values

A real number literal is a sequence of digits, followed by a decimal point, followed by another sequence of digits. This includes examples such as 15.150, 1.0, and 3.14159. Reals are noteworthy in that they are not equality types, which means that they cannot be compared for equality with the = operator. In addition, they cannot be pattern matched upon. This means that programs whose behavior depends on equality with a specific real number should instead be written to operate within some degree of precision of the real number in question. For instance:

val equalThreshold = 0.000001
fun isZero (x : real) : bool = Real.abs x < equalThreshold

This function simply prespecifies a (small) range, within which a number can be considered to be "equal" to 0. It uses the function Real.abs to check if the real number in question is within that threshold of zero, in either direction. In this way, we can approximate some test for equality, up to some degree of acceptable precision.
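
The same idea extends to comparing two arbitrary reals for approximate equality. A small sketch, reusing the equalThreshold above (the name approxEqual is ours):

(* approxEqual : real * real -> bool *)
fun approxEqual (x : real, y : real) : bool = Real.abs (x - y) < equalThreshold

val true = approxEqual (0.1 + 0.2, 0.3)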

Production

Real numbers similarly have access to some of the basic arithmetic operations as integers. In particular, they have:

(op +) : real * real -> real
(op -) : real * real -> real
(op *) : real * real -> real
(op /) : real * real -> real

Note that all but the last operator are also defined to work on int types. This may seem to violate type safety, however this is just an example of those functions being overloaded. There are two "copies" of, for instance, the + operator - one that has type int * int -> int and one with type real * real -> real. Notably, however, each operator works on either two ints or two reals - it is not defined on a mixture of the two. As such, SML can infer from its arguments whether it should use the int or the real variant, and similarly for - and *. div, however, is only defined for integers - / is its counterpart for division on the real numbers.

Combination

While not defined for equality, reals can still be compared to one another.

(op <)  : real * real -> bool
(op >)  : real * real -> bool
(op <=) : real * real -> bool
(op >=) : real * real -> bool

These operations are similarly overloaded, and will also work on integers.

From the Structure

The structure Real is bound as part of the SML Basis. It has access to a few useful functions, including:

Real.toString : real -> string
Real.compare  : real * real -> order
Real.abs      : real -> real

where Real.toString is the standard function that transforms a real number into its corresponding string representation, Real.compare on two reals returns LESS, EQUAL, or GREATER depending on their relative magnitudes, and Real.abs returns the absolute value of the real number in question.

String

By Brandon Wu, May 2020

string is the SML type of ordered collections of characters.

Values

Any valid string literal is a value of type string. This means that examples such as "functional", "15-150", and "\n" are all valid strings, forming their own constant constructors that can thus be pattern matched upon.

fun courseToNum ("15-150" : string) : int = 15150
  | courseToNum ("15-151" : string) : int = 15151
  | courseToNum ("15-122" : string) : int = 15122

Production

Numerous types have their own toString functions that allow them to be easily converted to their string representations, including:

Bool.toString : bool -> string
Int.toString  : int -> string
Real.toString : real -> string

Combination

Strings can be combined by means of the ^ operator, or "concatenation". ^ takes two strings and joins them together, inserting no spaces or other separators; the two strings are attached directly. Specifically, the result of an operation such as "functional" ^ "programming" will be "functionalprogramming".

(op ^) : string * string -> string

From the Structure

The structure String is bound as part of the SML Basis. It contains several useful functions for dealing with strings, such as:

String.explode : string -> char list
String.implode : char list -> string

String.explode takes a string and converts it to a list of its constituent characters, in order as they appear in the string. String.implode is the opposite, taking in a list of characters and joining them to form a string. This means that:

val [#"1", #"5", #"1", #"5", #"0"] = String.explode "15150"
val "15150" = String.implode [#"1", #"5", #"1", #"5", #"0"]

Note that the use of # is to denote that each element of the list is a char type, as opposed to a string of length 1.
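
Combining the two functions gives concise string manipulations. For instance, this sketch reverses a string by exploding it, reversing the resulting char list with the built-in rev, and imploding the result (the name reverseString is ours):

(* reverseString : string -> string *)
fun reverseString (s : string) : string = String.implode (rev (String.explode s))

val "051-51" = reverseString "15-150"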

Functions

By Brandon Wu, May 2020

Functions are a familiar concept in programming. In most languages, functions seem to capture a notion of a list of instructions to be carried out, with each invocation of the function resulting in another round of executing its instructions. In this class, however, we will take another perspective on functions - one that identifies the function more with the values that it outputs than the instructions that it executes.

What is a Function?

What is a function? To most seasoned programmers, the definition given in the above section seems to be the most obvious. A function (or subroutine) is simply identified with the instructions that it executes, which have some effect on the state of the program as a whole, such as incrementing some variable, or setting some flag.

Before most programmers were programmers, however, they had a different notion of a function. To a mathematician, a function is something else entirely. Instead of being an algorithmic sequence of instructions, a function is simply an entity that maps inputs to outputs - for example, f(x) = x + 1. Something rather notable is that mathematical functions are pure - given the same input, they always return the same output. So while it is a valid question in programming to ask how a function's behavior changes over time, this is a nonsensical question in terms of mathematical functions.

To be more concrete, let us consider a Python program.

x = 0
def f(y):
    global x  # without this line, Python would raise UnboundLocalError when f is called
    x += 1
    return x + y

This program instantiates a variable x outside the scope of the function f (which takes a single argument y), and the behavior of f is to increment the value of x, then return the sum of x and y. What we would find is that the first time that we run f(0), for instance, we obtain 1. The second time that we run f(0), it will return 2, and so on and so forth. We cannot even say that f(0) = f(0)! The output behavior of this function changes every time that it is run. This makes it difficult to reason about the function - in order to do so, we must know the number of times that it has been called before, at a given step in the program. While this is a fairly tame example, this problem only compounds with more complicated functions.

Clearly, this function is impure. Can we do better?

Function Types and Function Application

So far we have seen basic types such as int and string, among others. Functions allow us to compose types in new ways.

In SML, we denote the type of a function that has input type t1 and output type t2 (for some arbitrary, fixed t1 and t2) to be t1 -> t2. By SML's strict typing rules, functions of type t1 -> t2 can only take in inputs of type t1 and return outputs of type t2, for any types t1 and t2. Additionally, we write e1 e2 for the expression consisting of the function e1 being given as input the expression e2 (so we may write the mathematical function f(x) instead as f x).

[APP] An expression e1 e2 has type t2 if and only if e1 : t1 -> t2 and e2 : t1.

We call the above rule [APP] since it concerns the types of expressions during function application, or the process of applying a function to an argument.

Note that a function must always have type t1 -> t2 (for some types t1 and t2, though t1 and t2 may be complicated types in their own right). As such, all functions in SML can only take in one input - though the input type t1 may be one that "contains" multiple values. For instance, a function may have type int * int -> bool. For such a function, it takes in only one argument (a tuple containing two integers).

Functions in SML

We can declare a function with the fun keyword.

fun fact (0 : int) : int = 1
  | fact (n : int) : int = n * fact (n - 1)

The above example serves to initialize a function that computes the factorial function, and then bind it to the identifier fact. Function declarations create a closure which includes all bound variables in the scope of the function when it was declared, so the behavior of fact will always be as if it was in the same environment as when it was first declared. As such, we can also declare functions such as:

val x = 1
fun addX (n : int) : int = n + x
val x = 2

When addX is bound, it is bound in a closure that includes the binding of the value 1 to the identifier x (we may denote this as [1/x]). As such, even though the body of addX refers to the identifier x, it is not affected by the later re-binding of x, since it only matters what the value of x was when addX was first bound. Seen in this way, reasoning about functions which use bound variables is very intuitive - you simply look upward in the code for the most recent time that variable was bound.
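
So, continuing the example above, we can check this behavior directly:

val 11 = addX 10  (* uses the binding x = 1 captured in the closure, not the later x = 2 *)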

We also can use anonymous lambda expressions to bind functions. These are denoted by the fn keyword, and are called lambda expressions for historical reasons having to do with a model of computation called the lambda calculus. For instance, we can declare:

val addOne : int -> int = fn x => x + 1

Lambda expressions can also be multi-clausal, by pattern matching to multiple different clauses. For instance, we can define the following function, which simply returns true when given 0 and 1, and false otherwise.

val isBinary : int -> bool = fn 0 => true | 1 => true | _ => false

Note that the right-hand side of such a declaration is an expression in its own right, and can be used independently of being bound. The addOne binding above simply binds an anonymous lambda expression (one that increments an integer) to the identifier addOne. We could also do the following binding:

val two : int = (fn x => x + 1) 1

where we bind the result of evaluating the expression (fn x => x + 1) 1 to the identifier two. Clearly, this expression evaluates to 2, as 1 is substituted in for the local variable x, and then simply summed with 1.

NOTE: fun and fn differ in that functions declared with fun can be recursive, whereas val bindings using fn cannot. As such, while we can define the function fact as we did above, using fun, the following code does not work:

(* DOES NOT WORK *)
val fact : int -> int = fn 0 => 1 | n => n * fact (n - 1)

We will explore later on in the course what lambda expressions are useful for. In the meantime, usage of fun is sufficient to declare any functions that you may need.

It is also important to note that SML is an eager language, or call-by-value. This means that functions evaluate their arguments before stepping into their function bodies. This is explored more in the article on evaluation.

List

By Brandon Wu, May 2020

Lists are the SML type of ordered collections of objects. Notably, you can create lists of any kind of object, so int list, string list, and bool list are all valid types. list on its own is a type constructor (meaning that it makes new types out of old types), so it is not a valid type by itself, however. Lists are not, however, the same as arrays - they do not have constant-time access to any given index of the list. Indeed, they are best thought of as not analogous to arrays in other languages at all. Instead, you only have access to the elements located at the very beginning of the list, the head. Additionally, lists are more restricted than data structures in some other languages - a given list has a fixed type for its elements. All of the elements in a list must be of the same type. For the purposes of this document, we will discuss only int lists.

Values

We write lists as a sequence of integers, separated by commas, all enclosed in square brackets. So [1, 2, 3], [1, 5, 1, 5, 0], and [] are all valid int lists, with the last denoting the empty list. We also refer to the empty list as nil (in addition, you can type nil instead of [] in code).

The other essential component to lists is what is known as the :: operator, referred to as "cons". Cons can be used as a constructor for any fixed type of list, so we say that for any given type t:

(op ::) : t * t list -> t list

Cons takes in a value v of type t and a t list, and prepends v to the front of the given list. For ints in particular, we have that 1 :: [2, 3] steps to [1, 2, 3]. Additionally, cons is right-associative, meaning that in a continuous stream of applications of cons, they are evaluated from right to left. So 1 :: 2 :: 3 :: [] implicitly denotes 1 :: (2 :: (3 :: [])), as the calls to cons associate to the right. Thus 1 :: [2, 3], [1, 2, 3], and 1 :: 2 :: 3 :: [] are exactly equivalent, and denote the same list.

Cons is also a constructor, meaning that it can be used to pattern match and deconstruct lists. As such, we can write basic functions to compute the length of a list as follows:

(* length : int list -> int *)
(* REQUIRES: true *)
(* ENSURES: length L ==> the length of L *)
fun length ([] : int list) : int = 0
  | length (x::xs : int list) : int = 1 + length xs

Given a non-empty list, this function simply binds the first element of the list to the identifier x, discards it, and then recursively calls length to find the length of the remaining list xs, adding 1 to the result.

The definition of an int list thus corresponds to:

datatype int list = [] | :: of int * int list

where an int list can either be the empty list [], or it can be :: of a first element and the rest of the list (where :: is an infix operator, so instead of being written as ::(x, xs), we have x :: xs). Note that this is not actually valid syntax, but you can think of the definition of int lists in this way.

Motivation

Compared to other data structures, lists seem to have numerous disadvantages. As mentioned previously, they do not possess constant-time indexing like arrays in other languages - there is no way to instantly get the ith element of a list easily. Instead, you must "cons off" all the elements in front of that item in order to retrieve it - if you want to remove that item from the list, then you have to put the preceding elements back as well (and in the right order!). Lists are also inherently sequential - you cannot access multiple elements at one time.

The reason why we choose lists is that they have very nice mathematical properties. These "disadvantages" in the previous paragraph become strengths, when viewed in a certain manner. Lists are powerful for their simple, inductive definition (as shown in the previous section), which is sufficiently powerful to characterize many important principles in this class. Additionally, they are persistent, meaning that they cannot be mutated - any change to a list simply creates a new one instead, which is a very desirable property to have in functional programs. Though the interfacing behavior with these lists is limited, we will write programs where this limitation matters less. Our hope is that, throughout this course, you can begin to see that there is an elegance in simplicity.

Case Study: Sorted Lists

Sorting is an important principle in computer science. Whether it's binary search trees or cataloguing data, sorting is a very prevalent concept when it comes to making algorithms more efficient. It's not always the most easy to reason about, however - how would you be able to formally prove that a sorting algorithm works? In this regard, lists turn out to have some very nice properties.

Definition : Sorted Int Lists

  1. [] : int list is sorted.
  2. Any singleton int list is sorted.
  3. If L : int list is sorted, then if x : int is less than or equal to the first element of L, then x :: L is sorted.

This definition is naturally inductive, and follows very easily from the definition of sorting. In addition, it goes hand-in-hand with how we define lists - building them up from smaller parts one-by-one. Seen in this way, reasoning about and proving whether a list is sorted becomes very easy.
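
This definition translates almost directly into code. Here is a sketch of a function that checks whether an int list is sorted (the name isSorted is ours):

(* isSorted : int list -> bool *)
fun isSorted ([] : int list) : bool = true                     (* rule 1 *)
  | isSorted [_] = true                                        (* rule 2 *)
  | isSorted (x :: (L as y :: _)) = x <= y andalso isSorted L  (* rule 3 *)

val true = isSorted [1,2,3]
val false = isSorted [2,1,3]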

Combination

We have seen that cons is essential for constructing lists, and for deconstructing the constituent elements that comprise a given list. What about when dealing with multiple lists? We might want to combine the elements from several lists at once. A standard function for doing so is called the @ operator, or "append".

infix @
fun ([] : int list) @ (R : int list) : int list = R
  | (l::ls : int list) @ (R : int list) : int list = l :: (ls @ R)

(Note that this is valid syntax for declaring an infix function @; it just looks a little different than what we've seen thus far. In this case, we put the function name between the arguments.)

This function essentially takes all the elements off of the left list, then adds them back onto the front of the right list. It is also infix, which means that the result of [1, 2] @ [3, 4] is [1, 2, 3, 4], and in the function's code, ls @ R means the list that results from appending R after ls.

Questions to Consider

  1. In the code above, why does @ not reverse the left list?

  2. How might you inductively define a list whose elements all satisfy some property P?

  3. Write an SML function that reverses a list.

Option

By Jacob Neumann, June 2021

option is an SML datatype for handling potentially undefined values. Formally, the option type is parametrized by a single polymorphic type variable, so for every SML type t, there is a type t option.

The type t option represents the construction of "either a value of t, or nothing". For instance, a value of type bool option is either true, false, or neither. A value of type int option is either some integer, or none. A function can return t option to represent the possibility that there is no acceptable value of type t to return, while still having a value (NONE) with which to signal this circumstance. See the "Options-as-partiality" HOFs section below for some functions elaborating on this understanding of options.

Options also function as a kind of "container": a value x : t option either "contains" a value of type t (x = SOME(z)) or is an "empty container" with no values (x = NONE). We can therefore view t option as a degenerate version of t list, where the "list" is constrained to have length at most one. The notion of a "container" is made precise in functional programming with the idea of a "monad". In Haskell (which makes much more explicit use of monads), options are known as the "Maybe monad".

Values

The type t option has exactly one more value than the type t itself. Formally,

datatype 'a option = NONE | SOME of 'a

So for each value v : t, SOME(v) : t option. And NONE is a value of type t option for each type t, as required by the surrounding context. Options can be nested, e.g. SOME(SOME(3)) : int option option.

SOME and NONE are the two constructors of the t option type, so, in addition to the constructs generally available in pattern matching (e.g. wildcards and identifier binding), we can pattern match against SOME and NONE.

fun defaultToThree (NONE : int option):int = 3
  | defaultToThree (SOME x) = x

val 2 = defaultToThree(SOME 2)
val 3 = defaultToThree(NONE)

fun searchForEven [] = NONE
  | searchForEven (x::xs) = if (x mod 2)=0 then SOME(x) else searchForEven xs

val (SOME _) = searchForEven [1,2,3,4]
val NONE = searchForEven [1,3,5]

If t is pretty-printed by the smlnj REPL (like int, bool, string list, etc.), so too is t option. This is demonstrated by the following smlnj REPL snippet.

- val k = SOME(SOME 5);
val k = SOME (SOME 5) : int option option

Production

There are some basic SML functions which produce options:

    Int.fromString : string -> int option
    Bool.fromString : string -> bool option

Both of these functions are partial inverses to their respective toString functions (e.g. Int.fromString(Int.toString(7)) == SOME 7), but return an option so they can return NONE on strings which do not encode an int or bool, respectively.
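
For example:

val SOME 150 = Int.fromString "150"
val NONE = Int.fromString "one fifty"
val SOME true = Bool.fromString "true"
val NONE = Bool.fromString "yes"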

Elimination

Another option for casing on options is the function (provided in the Option structure - see below)

    Option.getOpt : 'a option * 'a -> 'a

which behaves as follows: Option.getOpt(SOME x,y) will evaluate to x, and Option.getOpt(NONE,y) will evaluate to y. When writing functions operating on options, it is still generally preferable that you use clausal pattern matching to break into the "SOME case" and "NONE case", but there are situations where Option.getOpt is an elegant solution.
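
For example:

val 5 = Option.getOpt (SOME 5, 2)
val 2 = Option.getOpt (NONE, 2)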

The Option structure also provides the "join" function

    Option.join : 'a option option -> 'a option

which sends SOME(X) to X and NONE to NONE.

From the Structure

The Option structure (part of the SMLNJ basis) provides a number of useful utilities for working with options.

In addition to the datatype option itself being available at top-level, the exception

    exception Option

is available at top-level as well.

Basic Functions

The Option structure provides boolean-valued functions for detecting whether a given option value is NONE or SOME, and for extracting values from SOME.

    Option.isSome : 'a option -> bool
    Option.isNone : 'a option -> bool
    Option.valOf : 'a option -> 'a

Option.valOf NONE raises the exception Option. NOTE: Do not use these functions in place of pattern-matching on an option value. Expressions like if Option.isSome(X) then Option.valOf(X) else e2 are bad style.

"Options-as-containers" HOFs

Options are an instance of a more general structure in functional programming known as a monad. Accordingly, there are a number of higher-order functions which we can define on options. In particular, option comes equipped with a map operation:

    Option.map : ('a -> 'b) -> 'a option -> 'b option

which does what you might expect, given its type: Option.map f (SOME x) evaluates to SOME(f(x)), and Option.map f NONE produces NONE. As usual with "map" functions, we generally require f to be total.
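
For example:

val SOME 4 = Option.map (fn x => x + 1) (SOME 3)
val NONE = Option.map (fn x => x + 1) NONE
val SOME "150" = Option.map Int.toString (SOME 150)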

Options, as a "container" of data, also admit a filtering operation:

    Option.filter : ('a -> bool) -> 'a option -> 'a option

Which is implemented as

fun filter p = fn NONE => NONE | (SOME x) => if p(x) then SOME x else NONE

i.e. it "filters" out the value x if x does not "satisfy" p (p(x) == false). As usual with "filter" functions, we generally require p to be total.

The Option structure also provides a utility for "folding" an option:

    Option.fold : ('a * 'b -> 'b) -> 'b -> 'a option -> 'b

which we might implement as

fun fold g z NONE = z
  | fold g z (SOME x) = g(x,z)

We often assume that g is total. In certain situations, we also want the two arguments of g to be the same type, and might assume that g is "associative" (in the sense that g(x,g(y,z)) == g(g(x,y),z) for all x,y,z) or that z is a "unit" for g (i.e. g(x,z) == x == g(z,x) for all x).
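
For instance, using the implementation sketched above:

val 7 = fold (op +) 3 (SOME 4)  (* g (4, 3) ==> 7 *)
val 3 = fold (op +) 3 NONE      (* the base value z is returned *)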

"Options-as-partiality" HOFs

As mentioned above, options provide a way to represent partial functions: a function f : t1 -> t2 option can be thought of as a function which is defined on some inputs (f(x) == SOME(z)) and not on others (f(y) == NONE). The Option structure provides several utilities consistent with this interpretation. First is composition of these "partial functions".

    Option.compose : ('a -> 'b) * ('c -> 'a option) -> 'c -> 'b option
    Option.composePartial : ('a -> 'b option) * ('c -> 'a option) -> 'c -> 'b option

So (Option.compose (g,f))(x) will return NONE if f(x) == NONE, and will return SOME(g(y)) if f(x) == SOME(y). Option.composePartial will behave similarly, except it will return g(y) in the f(x) == SOME(y) case, since g returns an option.
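
For instance, consider a hypothetical partial function half, defined only on even numbers (the names half and double are ours, for illustration):

(* half : int -> int option *)
fun half n = if n mod 2 = 0 then SOME (n div 2) else NONE

(* double : int -> int *)
fun double x = x * 2

val SOME 6 = Option.compose (double, half) 6        (* half 6 == SOME 3, then double 3 == 6 *)
val NONE   = Option.compose (double, half) 3        (* half 3 == NONE *)
val SOME 3 = Option.composePartial (half, half) 12  (* SOME 6, then SOME 3 *)
val NONE   = Option.composePartial (half, half) 6   (* SOME 3, then half 3 == NONE *)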

It also provides a "partial" version of its map function:

    Option.mapPartial : ('a -> 'b option) -> 'a option -> 'b option

Signatures

SMLNJ Basis Documentation

By Jacob Neumann, June 2021

This page collects some documentation of the SML Basis Library produced as part of the development of the Auxiliary Library. The documents linked below provide documentation of some (but definitely not all) of the SML Basis modules. Unless otherwise stated, the documentation is of the version of the basis implemented in SMLNJ v110.99.

This documentation is meant as a more accessible alternative to the more official documentation. In places, it also includes explicit mathematical definitions, proofs by induction, asymptotic analyses, and evaluation traces.

Fn

The Fn module provides basic combinators, including some of the fundamental operations in lambda calculus. This includes the composition operator, o, which is available at top-level.

Documentation

Auxiliary Library Documentation

By Jacob Neumann, June 2021

This page collects the documentation of the Auxiliary Library. The purpose of this library & its documentation is to serve as sample code, proofs, and worked examples for students learning functional programming and/or SML for the first time. Accordingly, this documentation also includes explicit mathematical definitions, proofs by induction, asymptotic analyses, and evaluation traces, as appropriate.

Unless otherwise stated, the documentation assumes SMLNJ v110.99. Some features (e.g. modifications of the pretty printer) might not work in other versions of SMLNJ.

All the documents & code here should be considered "work-in-progress". If you spot an error in either, you can report it here.

Trees

Provides polymorphic binary trees in a structure Tree, with a couple basic methods for working with them.

In particular, this includes the functions inord and foldr, which are used in the OrdTreeSet functor (see Sets below) and critical to the associated representation independence result.

Code

Timing

The Timing module includes types for encoding years, months, days, times, time zones, etc., as well as numerous utilities for working with them. This module primarily serves as an extended example of how to use custom SML datatypes to encode data, and how to take advantage of pattern matching to write elegant code. This module includes some imperative features (achieved utilizing basis modules unique to SMLNJ), such as stopwatches, countdown timers, and functions which obtain the current time.

Code

Permute

The Permute module contains utilities for permuting and sorting lists. The functions in this module are polymorphic, and some of the sorting functions furthermore serve as examples of currying.

Documentation -- Code

CPS Iterate

The CPSIterate module allows for imperative-programming-esque loops, but defined entirely functionally and entirely in continuation passing style.

Documentation -- Code

Language

The Language module provides combinators for working with "languages": lists of values of some equality type Sigma. Connects to some of the classic theory of computation, as well as providing sufficient combinators to capture a fragment of the logic of regular expressions. Good showcase of Higher-Order Functions & Combinators.

Documentation -- Code

Regular Expressions

The Regexp module implements regular expressions in Standard ML. Parametrizes over an "alphabet" (equality) type Sigma, and implements a type ''Sigma regexp with a CPS/Exn-control-flow function match which performs regular expression matching. Includes a method for obtaining the language of a regular expression, implemented using the Language module (above).

Requires: Language.sml

Documentation -- Code

Sets

An implementation of sets in Standard ML. Includes EQ & ORD typeclasses, the SET signature, and three implementations: ListSet (sets are unordered, duplicate free lists), OrdListSet (sorted, duplicate-free lists), and OrdTreeSet (sorted, duplicate-free trees). The latter two are equivalent (as proven by a representation independence proof), but the superior time bounds of OrdTreeSet are not realized unless we can maintain a balance invariant -- providing motivation for red-black trees.

Requires: Tree.sml

sig -- struct

Debugging

Common Errors

By Eunice Chen, January 2021. Revised February 2021

In this section, we will explore some common mistakes and errors that may occur when programming in Standard ML.

Casing Issues

Match Nonexhaustive

A match nonexhaustive warning means that the patterns in a function or case expression do not cover every possible value of the type being matched. Though the code will still compile and run with this warning, it often indicates that the logic in your code does not cover all cases. It is always a good idea to check whether you are missing some cases in your thinking, or whether the warning comes from a scenario that can never happen according to the function specification. Even for a case that cannot happen according to the specification, it can be useful to raise an exception in the "impossible" case to get rid of this warning: if your code somehow violates the specification and ends up in that case, you can then catch the error and debug it.
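
For example, if a function's specification promises that it is only ever called on non-empty lists, you might write something like the following sketch (the exception name Impossible and the function maxList are ours):

exception Impossible

(* maxList : int list -> int
   REQUIRES: the input list is non-empty *)
fun maxList [x] = x
  | maxList (x::xs) = Int.max (x, maxList xs)
  | maxList [] = raise Impossible  (* unreachable if the spec is respected *)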

Nested Cases

When you nest case expressions within case expressions, it's good to wrap the inner case expressions with parentheses. SML will continue to look for patterns to case on, so using parentheses will let it know when to "stop".

For example, the following code will compile without warnings:

case 0 of
     0 => (case "x" of _ => 3)
   | _ => 5

but this code will have a match redundant error and a match nonexhaustive error:

case 0 of
     0 => case "x" of _ => 3
   | _ => 5

Obviously, this example is very contrived, but adding parens may help in nested case expressions.

Associativity Issues

Since function application is left associative, making sure you have correct parenthesization is very important. It is almost always a good idea to double check your parenthesization in your code, since it can cause very confusing bugs, but can be fixed with a simple check.

For example, the following code will not compile:

fun f [] = 0
  | f x::xs = 1

This fails to compile because function application is left-associative, so the SML compiler thinks that x is the only argument to the function f, and does not know how to parse the extra ::xs. A fix for this issue would just be to put parens around the x::xs, like so:

fun f [] = 0
  | f (x::xs) = 1

Similarly, if some expression is an argument to another function, it is usually good to put parens around that expression. For example, suppose we had some functions f and g of type int -> int, and we would like to apply the function g to the value f 1. Then, if we wrote g f 1, this would not typecheck, since function application is left-associative (writing g f 1 would be the same thing as writing ((g f) 1)). To fix this, we would write this as g (f 1) to fix the associativity issues.

Equality Type Warnings (i.e. ''a vs 'a)

Sometimes, code will fail to typecheck because it expects something of type 'a but instead gets something of type ''a. A plain type variable like 'a can be substituted with any type, but a type variable like ''a (with two apostrophes in front) can only be substituted with an equality type. An equality type is a type whose values can be compared with operators like = and <> (int, string, and bool are good examples of equality types). Thus, if your code has an ''a where you expected an 'a, it is likely that you are using = or <> to compare values.

For example, you might expect the following function to have type 'a * 'a list -> bool:

fun contains (x, []) = false
  | contains (x, y::ys) = x = y orelse contains (x, ys)

Because x and y are compared by = in the function, any inputs to the contains function must consist of equality types, so the type is actually ''a * ''a list -> bool.

However, we might want a function that takes in an 'a list instead because it is more general. We still need some way of comparing whether two elements are equal or not, so we will pass in an extra parameter to compare two elements. We can rewrite the function as follows:

fun contains (cmp, x, []) = false
  | contains (cmp, x, y::ys) =
    case cmp (x, y) of
      EQUAL => true
    | _ => contains (cmp, x, ys)

The type of the function will now be ('a * 'a -> order) * 'a * 'a list -> bool.
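For instance, we can use Int.compare from the basis library, writing the tests in the val-binding style:

val true = contains (Int.compare, 2, [1, 2, 3])
val false = contains (Int.compare, 5, [1, 2, 3])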

In general, it's preferable to use case expressions instead of if-then-else expressions with = in the condition, since comparisons with = are a common source of accidental equality-type constraints.
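For instance, the second clause of contains could also have been written with if-then-else (a sketch; both versions behave the same), but the case form above avoids the extra = comparison on the order result:

fun contains' (cmp, x, []) = false
  | contains' (cmp, x, y::ys) =
    if cmp (x, y) = EQUAL then true
    else contains' (cmp, x, ys)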

?.t Type Errors

errors.sml:9.7-10.18 Error: right-hand-side of clause does not agree with function result type [tycon mismatch]
  expression:  ?.t
  result type:  ?.t

Clearly, this is not a very helpful error message. The code that induces this error is shown below.

signature T =
sig
  type t val x : t
end

functor Foo (structure A : T
             structure B : T) =
struct
  fun bar (0 : int) = A.x
    | bar n = B.x
end

The reason for this error is that the compiler does not distinguish (in terms of name) structures that are given as arguments to the functor Foo. Both are referred to by ?. In other words, it is saying that, on the second line of bar, it expected a value of type A.t to be returned, but received one of type B.t, namely B.x. Since A and B do not really have names, however, it just refers to either structure as ?, causing the confusing error.

If you receive this error, check your structures to make sure that you are not conflating types from different structures.
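If the two types are in fact meant to be the same, one possible fix (a sketch) is to add a sharing constraint to the functor's arguments, telling the compiler that A.t and B.t are equal:

functor Foo (structure A : T
             structure B : T
             sharing type A.t = B.t) =
struct
  fun bar (0 : int) = A.x
    | bar n = B.x
end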

Re-declaring Datatypes

Some datatypes are already present in the SML Basis Library, meaning that you do not have to declare them, as they are already present at the top level. Re-declaring datatypes anyways, however, can cause type issues that are somewhat difficult to debug. Consider the following code:

fun opt_wrap x = SOME x

datatype 'a option = NONE | SOME of 'a

fun wrap_again (x : int) : int option = opt_wrap x

Although it looks innocuous, we will run into the following type error:

errors.sml:5.5-5.51 Error: right-hand-side of clause does not agree with function result type [tycon mismatch]
  expression:  int ?.Assembly.option
  result type:  int option

The reason for this is that the declaration of 'a option on the third line creates a new type 'a option that "shadows" the basis' definition. This means that on line 1, SOME is a constructor of the original option type, whereas the function wrap_again is type-annotated to return a value of the new option type.

In general, re-declaring datatypes is a bad idea that will cause conflicts down the road, as it makes your code incompatible with any other code expecting the original datatype, such as autograders. Be sure not to re-declare datatypes that already exist, to avoid this issue.

Forgetting a Bar

Recall that a bar | is required to delimit different clauses in a function definition. For instance, take the following (incorrect) implementation of the fact function:

fun fact (0 : int) : int = 1
    fact n = n * fact (n - 1)

Compiling this will result in the error:

errors.sml:2.24 Error: unbound variable or constructor: n
errors.sml:2.14 Error: unbound variable or constructor: n
errors.sml:2.10 Error: unbound variable or constructor: n
errors.sml:1.29-2.30 Error: operator is not a function [overload - bad instantiation]
  operator: 'Z[INT]
  in expression:
    1 fact
errors.sml:1.6-2.30 Error: right-hand-side of clause does not agree with function result type [tycon mismatch]
  expression:  bool
  result type:  int
  in declaration:
    fact =
      (fn 0 : int =>
            (1 fact) <errorvar> = <errorvar> * fact (<errorvar> - 1): int)

If you see unbound variable or constructor errors where they don't make sense - where the variables should not be unbound - it may be the case that you have a deeper syntax issue!

Let in without an end

Consider the following code snippet:

fun foo x =
  let
    val y =
      let
        val z = 3
      in
  in
    4
  end

Compiling this will result in the error:

errors.sml:6.7-6.9 Error: syntax error: replacing  IN with  SEMICOLON
errors.sml:11.1 Error: syntax error found at EOF

uncaught exception Compile [Compile: "syntax error"]
  raised at: ../compiler/Parse/main/smlfile.sml:19.24-19.46
             ../compiler/TopLevel/interact/evalloop.sml:45.54
             ../compiler/TopLevel/interact/evalloop.sml:306.20-306.23

due to the missing end at the end of the inner let. In general, an error that says "replacing" is signaling that you have probably put an unexpected form of syntax where it shouldn't be - look at precisely what is missing to determine what the error is.
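For reference, here is a corrected version of the snippet. Note that besides the missing end, the inner let also needs an expression between in and end (we use z here, though the original intent is unspecified):

fun foo x =
  let
    val y =
      let
        val z = 3
      in
        z
      end
  in
    4
  end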

Debugging Hints and Strategies

By Eunice Chen, January 2021

Debugging is often one of the hardest and most time-consuming parts of coding. Don't be frustrated if it takes longer than the actual coding at times; that is all part of the process! Luckily, SML is a type-safe language, so most errors are compilation errors (code does not typecheck) or errors in the logic of the code itself.

Debugging Compilation Errors

1. Get a syntax highlighter

Syntax highlighting is tremendously helpful from both a debugging and readability standpoint. Also, finding typos and syntax errors is much easier with syntax highlighting, simply because the code is much easier to read. Many code editors have syntax highlighting plugins for SML. (For example, in VSCode, there is an extension providing SML language support).

2. Anatomy of an error message

Fig 1. Sample error message with a brief explanation of each component of the message.

For more specific error messages, it may help to consult the errors page of this website.

3. Explicitly Type Annotate

If things do not quite typecheck, one thing that often helps is explicitly type-annotating each value used within the function. Sometimes, the type error looks like it occurs on one line, but the real issue occurred earlier in the code. Explicit type annotation often helps pinpoint the source of the error, since it will fail on whichever line the expected (annotated) type does not match the actual type. However, make sure that, in explicitly type-annotating, you do not accidentally restrict the generality of the types (e.g. putting an int where you should have an 'a). Such an annotation may work in the short term, because int is more specific than 'a, but can cause more errors later down the line, when something of type 'a is needed.
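As a small sketch of the technique (the function here is hypothetical), every value and the result are annotated, so a mismatch will surface on the exact line where it occurs rather than at a distant use site:

fun addLen (x : int, l : int list) : int =
  let
    (* annotating y makes an error here fail on this line *)
    val y : int = List.length l
  in
    x + y
  end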

Debugging Code Logic

1. Start with small test cases

First, you will want to write simple, small test cases, to test the most basic parts of your function. Identify the simplest combinations of inputs for your function, and test the behavior of the function on these small inputs. If you are working with lists, for example, you may want to write test cases involving [] and the singleton list, to test the [] and x::xs patterns, respectively. If you are working with ints, often the simplest inputs to your function may be 0 and 1.

The basic structure of a test is as follows:

Suppose we are trying to debug function f : int -> int, and we expect f 1 to evaluate to 0. Then, we would write our test case as

val 0 = f 1

If f 1 does not evaluate to 0, then the exception Bind will be raised when this code is run, since SML will be unable to match the result of f 1 against the pattern 0.

2. Write more comprehensive tests

Once you have tested some of the most basic functions, you should write more comprehensive tests. Typically, this involves writing a test for each clause in the code. However, sometimes writing a test for each clause in the code is not enough, since the function may still evaluate to an incorrect value on more complicated outputs. A good rule of thumb is that you should, at minimum, write tests for each function clause, but you will probably have to write more tests depending on the situation.
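For instance, for a hypothetical len : int list -> int with one clause for [] and one for x::xs, a minimal suite would test each clause, plus at least one longer input:

val 0 = len []
val 1 = len [7]
val 3 = len [1, 2, 3]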

3. Debug "Bottom Up"

Often, your code will either call helper functions, or embed smaller functions within larger ones using let and local keywords. When debugging, first ensure these smaller, lower-level functions work before debugging top-level functions, as ensuring the correctness of these lower-level functions allows you to rely on their correctness when debugging top-level functions.

4. Step through code

Often, simply walking through each step of the code to ensure it evaluates in the way you expect is useful. Sometimes, it may help to print out values after a certain step in your code. However, print statements should not be your first choice in debugging, as printing has some limitations. Because the SML print function has the type string -> unit, you can only print out strings. (Note: the print function causes a side effect, i.e. the printing of its input in the REPL, which is why it returns a unit. However, if you haven't learned what side effects are, don't worry - just know that the print function returns ().) In order to print out values that are not strings, you have to convert the value to a string, either by using an SML library function (like Int.toString), or by writing your own toString function.

Suppose we want to see what some function f : int -> int evaluates to on the input 1. Then, we would write the following code:

val () = print (Int.toString (f 1))

Then, once the code is compiled and run, the result of f 1 will be printed out in the REPL.

If you decide to use print statements while stepping through code, you may have to add an extra let statement so that you can add the val declaration for printing.
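For example, here is a sketch of such a wrapper (f is assumed to be some function of type int -> int already in scope):

fun g (x : int) : int =
  let
    val y = f x  (* f : int -> int is assumed to be in scope *)
    (* side effect: show the intermediate value in the REPL *)
    val () = print ("f x = " ^ Int.toString y ^ "\n")
  in
    y * 2
  end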

In general, print statements in SML can be unwieldy and difficult to use, so they are not generally suggested as a way to debug. However, stepping through each line and clause of code is generally very helpful and highly encouraged.

Debugging Strategies

Rubber Duck Method

One useful method of debugging is called the "rubber duck method," in which the programmer explains their code, line by line, to a rubber duck (though any object, inanimate or animate, will work equally as well). The premise of this method is that, by explaining what the code is supposed to do and seeing what it actually does, any incongruity between the two becomes apparent and the error is found. Also, this method is helpful because, through explaining, you gain a much better understanding of the concept/code being explained.

Concepts

Basics

Standard ML is a functional programming language, meaning that we eschew the use of side effects and state changes to obtain programs that are easy to reason about, analogous to reasoning about mathematical expressions. Similarly to mathematics, we perform operations and view computation as a process of simplification (or reduction, as we will more commonly name it). Seen in this way, computation becomes an elaborate series of expression evaluations, and it is this concept that will permeate the course. In these notes, we will go further in detail about the basic concepts of SML.

Evaluation

By Brandon Wu, May 2020

Evaluation is a commonplace idea. No matter what programming language you are in, it is a customary concept to invoke subroutines in order to obtain some kind of final result, which can be further used in order to achieve some later goal. To obtain such a result, however, programs must perform certain computations and carry out certain steps - in other words, they must evaluate. Ultimately, programs are complicated constructs whose main goal is to compute some value or achieve some effect - we will focus mainly on the first case here.

Expressions and Values

Expressions in Standard ML are akin to mathematical expressions. They are built up from applications of certain operations, being subject to certain simplification rules that can be used to obtain a final answer. For instance, we would consider 2 + 2 an expression, similarly to other examples such as 1 div 0 and Int.toString 2.

The most fundamental building blocks in Standard ML are values. Values are the primordial units of a given type, being irreducible to any further simplified form. When trying to answer some computational problem, it is usually the case that we are looking for some kind of "answer". As such, values are important to obtain, as we are usually looking for some kind of answer in "simplest terms". Values in SML encompass examples such as 2, true, "foo", [1, 2, 3] and fn x => x + 1.

[Value] A value is an expression e such that, for all e' such that e ==> e', e' = e. In other words, a value is an expression that only reduces to itself - there is no pending computation left to be done.

NOTE: The meaning of ==> in the above definition is reduction, which is further explained below.

A noteworthy distinction to make is that certain language constructs, such as an if-then-else expression, let-in-end expression, or case expression, are in fact expressions. This means that they can be passed around and evaluated just like any other expression. So for instance, the following code is a valid expression:

(let
    val x = 5
in 
    x
end) + 2

and has the value of 7. Similarly, the following code is also an expression:

(if true then 15 else 150) * 2

and has a value of 30.

Reduction

We now define a notion of reduction, which corresponds to our notion of simplification. We write that e ==> e' if the expression e reduces to the expression e', which means that e produces e' from zero or more applications of some simplifying rule. For instance, we may say that 2 + 2 ==> 4, since by applying the function (op +), we obtain 4. Furthermore, we may say that if true then "good" else "bad" ==> "good" by evaluation of the if-then-else expression, since the predicate (in this case true) is true.

NOTE: ==> is not valid SML code, it is simply our shorthand for the idea of reduction.

[Valuable] An expression e is valuable if there exists a value v such that e ==> v. Note that all values are by definition valuable.

So valuable expressions include 2 + 2, 4, and if true then 4 else 2 (and in fact all reduce to the same value!). An example of a non-valuable expression is 1 div 0, which raises an exception Div when evaluated (since division by zero is undefined). Additionally, if we consider the following code fragment:

fun loop (x : int) : int = loop x

This defines a function loop : int -> int that loops forever, since it calls itself endlessly. Thus, loop x for any x : int is also a non-valuable expression, since it never reduces down to a value.

In fact, what we will see is that this behavior is sufficient to characterize all well-typed expressions. We summarize it in the following:

[Behavior of Well-Typed Expressions] For any well-typed expression e, it either:

  1. Reduces to a value

  2. Loops forever

  3. Raises an exception

Eager Evaluation

SML is an eagerly evaluated[1] language. This stands opposed to other paradigms such as lazy evaluation, which is exhibited in languages such as Haskell. In an eagerly evaluated language, we evaluate arguments of functions even if we may not need them. While this arguably may be "wasteful" in some cases, we will find that this greatly simplifies work/span analysis, among other benefits.

[Eager Evaluation] In an eagerly evaluated language, arguments of functions are evaluated before stepping into the body of a function. For a function f and valuable expression e, when evaluating the expression f e, first e is evaluated to obtain the value v such that e ==> v, then f v is evaluated.

As an example of this, consider the function fn x => x + 1. If we were to try and evaluate (fn x => x + 1) (2 * 3), first we would need to evaluate the function's argument, that being 2 * 3. As such, this entire expression would reduce to (fn x => x + 1) 6, which is 7.

In an example like the previous one, it doesn't particularly matter when we evaluated 2 * 3 - we would have gotten the same result either way. This is not always the case. Consider the expression (fn x => 2) (1 div 0). By eager evaluation, we should evaluate the argument first, which means that this entire expression raises an exception. Raising an exception thus happens before we even look at the body of the function. For all intents and purposes, the body of the function does not exist to us until we actually enter it - which necessitates that the argument to the function is valuable. It is a black box that is "locked" behind the argument.

Footnotes

[1]: In other languages, we may instead say call-by-value, which is a separate but closely related concept.

Extensional Equivalence

By Brandon Wu, June 2020

Up to this point, we have been using terms such as "equal" and "equivalent" rather casually. Such a notion of equality seems to be rather intuitive, at first glance, not necessarily in need of more exposition. After all, it seems to be our intuition that if two things are equal, we will "know it when we see it". When working in the realm of mathematical expressions, we have some nice assumptions that lend credence to such a hypothesis - if we are evaluating the cosine of a number, we don't expect that the cosine function might wait forever before returning an answer to us. When working with SML code, this becomes a valid concern, and requires that we have a more particular definition of equivalence.

Motivation

It is at this point that we will try to make the reasons for defining an idea of extensional equivalence apparent. It has everything to do with the extensional behavior of our programs - the outputs that they return given certain inputs.

Extensional equivalence comes into play when reasoning about the correctness of programs. A large motive for why we write functional code is linked to an idea of a formal proof - when writing code without side effects, it is far easier to reason about the behavior of a program. We would like to be able to provide some rigorous argument as to why a program is correct.

In imperative settings, it is quite difficult to go about doing so - when problems of state are added in, a programmer now not only needs to think about what step a particular program is in at a given moment, but also what the state of the program is as well. By state, we usually mean the particular arrangement of mutable data in the program's environment, such as the state of memory allocation, variable contents, and other transient factors. The possible configurations of such a state are generally infinite, which makes reasoning rather more difficult. While one can generally be reasonably sure what the state of a program should be, it does not always work out that way, as any programmer who has written a bug can attest. In functional programs, we eschew side effects for ease in reasoning about what a program truly outputs. Functions have well-defined results.

Imperative programmers create functional programs. This is, at first, perhaps a surprising claim, but we will soon delineate what precisely we mean. In making this claim, we are only considering functions of a certain basic kind - that is, functions that compute, rather than create or modify. Consider, for instance, a program that generates the Fibonacci numbers, computes the most efficient path through a given graph, or encodes some kind of data. We do not consider programs that, for instance, output to a file or print the contents of the directory that the program file lives in. In addition, we speak only of programs that do not return random results, for instance random number generators or other random selectors.

These programs have the characteristic that they are (in the general case) deterministic. Given the same inputs, they should return the same output - regardless of whether the language that they were implemented in is imperative in nature. So for these varieties of programs (which are very common computational problems), imperative programmers make use of state to achieve a program that is functional in nature - that is, having no overall visible effect on the machine's state. Functional programming simply uses functional tools to achieve the same result, in the end.

This is an important idea to cognize because it forms the basis for our concept of correctness. We identify a program's correctness by its ability to return the expected results when given some input. For instance, we know a factorial function that returns 5 upon being given 3 is no good. Having functional components helps us easily reason and build up functional results - that is, an end program that returns the correct results.

As we will see, extensional equivalence will be a powerful tool when reasoning about the correctness of our code. It lets us prove in a mathematical sense whether functions are correct, as well as abstract between specific implementations. We will go further into detail on this later in the chapter.

The Definition

We will declare this definition of extensional equivalence for non-function expressions first, and then explore the function case later.

[Extensional Equivalence (Non-Functions)] We say that two expressions e : t and e' : t for some type t are extensionally equivalent if they exhibit any of these three behaviors:

  1. Reduce to the same value

  2. Loop forever

  3. Raise the same exception

We will write e == e' to denote this.

NOTE: == is not valid SML code.

As such, clearly expressions such as 2 + 2 and 3 + 1 should be extensionally equivalent, since they reduce to the same value, that being 4. It is important to note that reduction is a stronger relation than extensional equivalence - if one expression reduces to another, then they must by definition be extensionally equivalent. However, extensional equivalence does not imply reduction. For instance, 4 does not reduce to 2 + 2, since clearly 4 is already in its most simplified form. This corresponds to our intuition about traditional mathematical expressions, as we would expect that we could say that \( -1 \) and \( \cos(\pi) \) are equal, since they have the same value.

With the second point, however, we depart from our normal mathematical reasoning. It is generally the case that, given a mathematical expression, we expect to be able to generate a result from it. That result may be undefined, and it may take some time, but in many cases we don't expect to have to account for a nonterminating computation. This is not the case in Standard ML. Now, any given function call may loop forever, which plainly is problematic for equivalence.

As such, we will add a stipulation that two expressions (of the same type) that loop forever are extensionally equivalent. Note that even though two expressions of dissimilar types that loop forever have the same "behavior", they cannot be extensionally equivalent by definition, as only expressions of the same type can be extensionally equivalent.

The third case also arises from our usage of Standard ML, as expressions can potentially raise exceptions in cases where computation is not feasible. Some basic kinds of exceptions include Div, Bind, and Match. The precise mechanism of exceptions is not important, since we will cover that in more detail later on, but in order to maintain some notion of equivalence, we will simply require that two same-typed expressions raise the same exception. As such, expressions like 1 div 0 and 2 div 0 are extensionally equivalent, whereas 1 div 0 and let val [] = [1] in () end are not.

Note: It is not important to know specifically what the latter expression in the last example is doing - just know that it raises the exception Bind.

In writing these definitions we have made a step towards cognizing extensional equivalence, but in a sense we have taken a step back as well. We have committed the same mistake by saying extensional equivalence holds when expressions "reduce to the same value". We required this more stringent definition of extensional equivalence specifically because of callously using terms like "equal" and "equivalent", and now we've so unthinkingly used the term "same". But what does it mean for two values to be the "same"?

For many cases, our intuition will suffice. It is clear to say that 2 is the same as 2, and 2 is not the same as "two" (for any self-respecting programming language, in any case). For one, the types must be the same, and definitely one integer value should not be "equal" to any other separate integer value. We might then claim that two values are only "equivalent" if they denote the same literal value.

NOTE: The perceptive reader may notice that again, we have used the words "same" and "separate". Unfortunately, there is only so far that we can dig this rabbit hole - at this point, we will have to rely on our intuitions to tell us that 2 is indeed the same value as 2, and not the same value as 3 or 4, and call it a day.

This definition will suffice for most types. For functions, however, we will have to take a different approach.

[Extensional Equivalence (Functions)] We say two expressions e : t1 -> t2 and e' : t1 -> t2 for some types t1 and t2 are extensionally equivalent if for all values x : t1, e x \( \cong \) e' x.

We see that this definition simply takes our previous definition and moves it one step up. There is an interesting aspect of this rule that depends on a concept that we have yet to learn, but we will cover that when we get there. Seen in this way, we can say that two function values that are not the same literal value may be extensionally equivalent, as their extensional behavior (that is, their behavior when interacting with other objects) may be the same.

More concretely, let us consider the example of fn x => x + x and fn x => 2 * x. Recall from the previous chapter that fn x => x + x in particular does not simplify to fn x => 2 * x, and is in fact itself a value - meaning that it is in its terminal form, being irreducible to anything other than itself. The right hand side of any lambda expression is, in a sense, frozen until the function is given an argument. So fn x => x + x and fn x => 2 * x are different values; however, it is easy to see that, upon being given an input, they evaluate to extensionally equivalent values (specifically, the same integer).

As discussed before, our definition of "equivalence" identifies functions with the values that they output. It should feel intuitively clear - two functions are exactly equivalent if they return the same output for the same inputs. This seems to be the definition of computing the "same" function. Notice that we omit any mention of complexity or how the function goes about computing what it computes - no matter what path is taken, all that is important is what values are ultimately outputted.

Referential Transparency

In this section we will introduce a powerful idea called referential transparency, which follows as a direct consequence of our definition of extensional equivalence.

[Referential Transparency] Consider an expression e that contains the expression e1 as a sub-expression. For any expression e2 \( \cong \) e1, we can produce the expression e' as the same expression as e, but with each sub-expression e1 replaced with e2, and we will have e \( \cong \) e'. In words, for an expression e that contains e1, we can swap out e1 for an extensionally equivalent e2 to obtain an expression extensionally equivalent to e.

NOTE: The notion of a "sub-expression" here is not very well defined - we will use our intuition here, similarly with what it means to be the "same expression". Gaining an intuition through examples will suffice.

To illustrate this, we might say that the expression 4 * (2 + 2) has the sub-expression 2 + 2, and that let val x = f (1 div 0) in x end has (1 div 0) as a sub-expression. In the former case, we can use referential transparency to say that 4 * (2 + 2) \( \cong \) 4 * 4 and let val x = f (1 div 0) in x end \( \cong \) let val x = f (2 div 0) in x end, by replacing the aforementioned sub-expressions with 4 and 2 div 0, respectively.

Referential transparency will let us abstract away from the specific makeup of a certain implementation or expression, instead replacing it as we see fit with something known to be extensionally equivalent, while still allowing us to maintain extensional equivalence. This comes in handy when proving that an implementation of a particular function is correct, as we can simply prove that it is extensionally equivalent to an existing, simpler implementation that is already known to be correct. This has consequences for simplifying and optimizing implementations.

Later on, we will explore specifically how we may go about proving extensional equivalence. This will primarily take the form of mathematical or structural induction.

Stepping with Valuable Expressions

Recall from the previous article that SML uses eager evaluation. In other words, the parameters of a function are evaluated before stepping into the function.

When proving extensional equivalence theorems about SML code, though, we often end up in a situation where we want to step into a function, but the function is being applied to an expression that isn't a value.

We will look at an example of this situation, involving the functions @ (append) and rev. The definition of @ is as follows:

fun [] @ L = L
  | (x::xs) @ L = x::(xs@L)

Now, suppose we want to show the following theorem:

(x::xs) @ (rev A) \( \cong \) x::(xs @ (rev A))

where x : int, xs : int list, and A : int list are all values. First, make sure this "feels right" -- the left side of our theorem matches the second clause of @ with rev A bound to L.

However, notice that rev A is not a value. Since SML is eager, we cannot step into the function @ until we evaluate the expression rev A. In other words,

\[ (x::xs) @ (rev A) \not\Longrightarrow x::(xs @ (rev A)) \]

So, we're stuck! D:

... or are we?

Let's assume that rev A is valuable, i.e. it evaluates to a value, and let's give that value a name -- say, v : int list: \[ rev A \Longrightarrow v \] With this value in hand, we can do what we wanted to do!

\[ (x::xs) @ (rev A) \Longrightarrow (x::xs) @ v \Longrightarrow x::(xs @ v) \]

Notice that this complies with SML's eager evaluation, since we are fully evaluating the parameters of @ before stepping into the function.

And here's the kicker: we can also use v to evaluate the right hand side of our theorem!

\[ x::(xs @ (rev A)) \Longrightarrow x::(xs @ v) \]

Again, this complies with SML's eager evaluation -- in this case, we never even step into the definition of @. But, we've actually proven our theorem! We showed that the LHS and the RHS both evaluate to the same expression, so by rule 1 of the definition of \( \cong \), the LHS and RHS must be extensionally equivalent. We are done!

In this example, we got around SML's eager evaluation by assuming that our parameter rev A is valuable, and as it turns out, this concept holds in the general case. If we know the parameter of a function is valuable, then we can step into the function without first evaluating that parameter. This principle, which we will call (for lack of a better term) "stepping with valuable expressions," is one reason why valuable expressions are so important.

[Caution!] When stepping with values, we can use the reduction relation \( \Longrightarrow \). When stepping with valuable expressions, this is not always true (it certainly is not true in the example above). Stepping with valuable expressions only preserves extensional equivalence \( \cong \).

Totality

As stated previously, we will frequently write proofs demonstrating the extensional equivalence of two expressions. In order to do so, we often have to expand definitions, stepping through function bodies and applying lemmas. In doing so, we will frequently need to make a totality citation to justify that such steps are truly valid. While the name may seem unfamiliar, it is ultimately built on a concept that you already know - valuability.

[Total] We say that a function f : t1 -> t2 for some types t1 and t2 is total if for all valuable expressions x : t1, f x is valuable. In other words, for all valuable inputs to f, we get a valuable output.

Examples of total functions include fn x => x + 1, fn x => "foo", and the length function for lists. Notably, however, the factorial function is not total:

fun fact (0 : int) : int = 1
  | fact (n : int) : int = n * fact (n - 1)

This function is not total because, while fact n is valuable for all non-negative n, fact n for a negative n loops forever, decrementing infinitely and never reaching the base case. Sometimes, if a function produces valuable outputs only on certain inputs, we will say that the function is "total over" some domain, even if it is not total in general. We may thus say that fact is total over the non-negative integers, though this is not a common practice.
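If we did want a version of fact that is total on all of int, one option (a sketch - choosing, somewhat arbitrarily, to send every non-positive input to 1) is to collapse the base case into a comparison:

fun fact' (n : int) : int =
  if n <= 0 then 1
  else n * fact' (n - 1)

Now every input either hits the base case immediately or decrements down to it, so fact' is total.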

We will now digress slightly to consider an example. What can we say of the behavior of a function fn x => 15150? Well, we can characterize its behavior in words - it returns 15150 on all inputs. However, we must be careful when linking our intuition regarding this function to the actual evaluational semantics of SML. Consider the expression (fn x => 15150) (1 div 0). If we went along with our previous conclusion, we might say that this is 15150. This contradicts what we learned in the previous section, however. Since SML is eagerly evaluated, this expression should raise an exception and never reach the function body at all.

This is an easy mistake to catch when considering an explicit, definite input like 1 div 0, but oftentimes we will be considering inputs that might be "unspecified" in some sense. They may be the result of computations that we are not fully convinced of, which could return an expression with any kind of behavior. Suppose we were wondering about the behavior of (fn x => 15150) (f y), for some function f and value y. Now, we aren't sure at all if this expression does what we think it might do, which is return 15150 - that depends entirely on the definition of f and the value of y.

This is where totality comes in. Totality is oftentimes like a sledgehammer, being far more brutish and exhaustive than its use cases necessitate. It is undeniably useful, however. With totality, we do not have to reason about the behavior of a function on any specific inputs - we can just handwave them all and say that no matter what, it must return a valuable output, which is what we really care about. If we revisit the expression (fn x => 15150) (f y) with the totality of f in hand, now reasoning about it is very simple. We know that y is a value, and that f is total, so by definition f y is valuable. This means that, regardless of what value f y actually evaluates to, we can step into the body of (fn x => 15150), and thus conclude that the expression reduces to 15150.

More generally, suppose we have the following definition:

fun f (x : int) : int = e

where e denotes some expression that we will leave unspecified for the moment. Thus, f is a function that takes in some input of type int, produces a val binding to bind it to the identifier x, and then evaluates the expression e in the scope of that binding. Note that such a binding is truly a val binding - since we have eager evaluation, the input must first be evaluated to a value, and then bound to the identifier x.

Then, consider the expression f (g y), where g is some other function and y is some value. Oftentimes, we would like to just step through the definition of f in some proof, and say that f (g y) reduces to the expression e in the scope of the binding [(g y)/x] (recall that this is our notation for "binding the value of g y to the identifier x"). This is not always possible in general, however. Recall that a well-typed expression either reduces to a value, loops forever, or raises an exception. If g y were to loop forever, would we be able to enter the function body of f, and evaluate e?

Of course we would not - this is just another consequence of eager evaluation. Evaluation of f (g y) would get "stopped at the door", so to speak. We would not be able to enter f because g y does not reduce to a value. Since the input to g is arbitrary, in this case, to be able to claim that f (g y) enters the function body of f requires that we know the totality of g. We thus use totality as a tool to get at what is really important - the valuability of the arguments to f, which in this case is g y. As such, while we may refer to such citations as "totality citations", and name this idea of totality, do not forget that this is all just a consequence of eager evaluation. We are really looking for valuability of arguments.
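Concretely, the proof step we have been discussing might be cited as follows (a sketch of the style, not a fixed notation):

f (g y) == e [(g y)/x]    (citation: g is total and y is a value, so g y is valuable)

Note that, per the caution in the previous article, stepping in with a valuable (rather than value) argument only preserves extensional equivalence ==, not reduction.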

Pattern Matching

By Kaz Zhou, September 2020. Revised May 2021

Patterns take on many appearances, such as:

  • Constants: 150
  • Variables: x
  • Wildcard: _
  • Tuples: (true, _)
  • Constructors (which may contain other patterns):
    • Lists: x::xs
    • Other datatypes: Node(L,x,R)

Patterns can be matched against values to form bindings. Consider the following declaration:

val (x,y) = (1,2)

The result is that 1 gets bound to x, and 2 gets bound to y.

Pattern matching may fail. For example, the following raises exception Bind.

val 10 = 9

Besides val declarations, pattern matching is also used in function declarations, lambda expressions, and case expressions.

Function declarations:

fun fact 0 = 1
  | fact n = n * fact(n-1)

The function fact is given an int as input. If the input successfully matches the pattern 0, then the function returns 1. Otherwise, the input is matched with the pattern n, binding the input to n. For example, if we evaluate fact 5, then 5 is bound to n, so the expression becomes 5 * fact(4).

Each clause of the function declaration tells fact what it should do, depending on the input. The bar, |, acts as a separator between the two clauses.

Note that it's important for your patterns to be exhaustive. The above function is fine, because all values of type int can match with either 0 or n. However, suppose we had the following function:

fun fiction 1 = 1
  | fiction 2 = 2
  | fiction 3 = 6

There are many inputs which do not match with either 1, 2, or 3. For example, fiction 4 would raise exception Match.

A more subtle bug is when patterns are redundant. The following example has the clauses of fact swapped.

fun factloop n = n * factloop(n-1)
  | factloop 0 = 1

The second clause of factloop can never be executed! When evaluating factloop 0, SML will try to match 0 to each pattern, in order, and 0 already matches the pattern n. (SML/NJ actually flags such unreachable clauses with a match redundant error, as mentioned above.) Supposing the clauses were accepted as written, factloop 0 would step to 0 * factloop(~1), because 0 can match to n. Convince yourself that factloop k would loop forever for any k of type int!

Lambda expressions:

(fn [] => false | x::xs => true) [1,2,3]

The lambda expression is similar to a function, as it turns an input into an output. In the example above, [1,2,3] is the input. It doesn't match with [], but it does match with x::xs. Namely, 1 gets bound to x, and the list [2,3] gets bound to xs. As a result of this successful pattern matching, the lambda expression returns true.

You should still make sure your patterns are exhaustive. For example, the following expression raises exception Match:

(fn [] => false) [1,2,3]

Case expressions:

fun fact' x =
    case x of
        0 => 1
      | n => n * fact' (n-1)

First, note that fact' does the same thing as fact. However, it uses an extra case expression.

Let's consider what happens when we evaluate fact' 5. First, 5 gets bound to x. Then, the case expression tries to match 5 to a pattern. In this scenario, 5 successfully pattern matches with n, so 5 gets bound to n. Therefore, fact' 5 evaluates to 5 * fact' 4.

As usual, the patterns in case expressions should be exhaustive.

Recursion and Induction

Inductive Proofs

Recursion

Tail Recursion

By Eunice Chen and Brandon Wu, December 2020

In programs, functions often make calls either to themselves (recursive calls) or to other functions. There are two types of function calls: non-tail calls and tail calls. A function call is a tail call if the caller does not modify or examine the result of the call.

If every recursive call made by a function is a tail call, that function is called tail recursive. Put another way, a function is tail recursive if the last operation performed by the function is the recursive call.

Suppose we have the following function to sum the elements of a list:

fun sum [] = 0
  | sum (x::xs) = x + (sum xs)

This function is not tail-recursive, because after the recursive call is evaluated, we add the result to x. Similarly, if we were to case on sum xs or perform any other operation on its result, then this function would not be tail-recursive.

In order to write the sum function in a tail-recursive manner, we know that the last computation we do must be the recursive call. But we also need some way to keep track of the sum of the list elements seen so far, since we can no longer add things after the recursive call. We will make an accumulator variable, acc, that keeps track of the running sum of list elements, and pass it down through the recursive calls. In our base case, we have seen every element in the list and there are no more elements to add, so our base case looks like this:

fun tsum ([], acc) = acc

In our recursive case, we want to use our accumulator to account for the top element of our list, then pass down that accumulator to the recursive calls. We can do that as follows:

fun tsum (x::xs, acc) = tsum (xs, x + acc)

Because SML evaluates the function arguments before evaluating the function call, x + acc is computed before tsum is called, and this updated accumulator value is passed down to the recursive tsum call on xs. Thus, the last operation performed is the recursive call, making this a tail-recursive function.

Because we now have the accumulator variable, we must pass in 0 as the initial accumulator for tsum to behave as expected. In addition, because tsum has a different type than the original sum, we can recover the original interface by using the tail-recursive version as a helper.

Putting these parts together, we have:

fun tsum ([], acc) = acc
  | tsum (x::xs, acc) = tsum (xs, x + acc)

fun sum L = tsum (L, 0)

Why do we care about tail recursion? One reason is that the tail-recursive version of a function uses less space on the call stack. (The call stack is what keeps track of function calls - in this case, the recursive calls and the work left to do after the recursive calls.)

Consider the following stack trace of the sum function, which is not tail-recursive:

sum [3, 2, 1]
=> 3 + (sum [2, 1])
=> 3 + (2 + (sum [1]))
=> 3 + (2 + (1 + (sum [])))
=> 3 + (2 + (1 + (0)))
=> 3 + (2 + (1))
=> 3 + (3)
=> 6

In this stack trace, we can see that the sum function takes linear space. (By stack space, we mean the space around the recursive call). Assuming an input list of length n, the stack will have n additions at its largest, giving us a stack of size n.

Now, consider a stack trace of the tail-recursive tsum function:

tsum ([3, 2, 1], 0)
=> tsum ([2, 1], 3)
=> tsum ([1], 5)
=> tsum ([], 6)
=> 6

Notice that the stack trace does not get any wider: we do not need any memory space to store "what is left to do," so this takes constant space.

Example

Let's try to turn the function to calculate the nth Fibonacci number into a tail-recursive function.

fun fib 0 = 1
  | fib 1 = 1
  | fib n = fib (n-1) + fib (n-2)

Note that there are two recursive calls that we add together. In order to be tail-recursive, we can only make one recursive call (if there are two recursive calls, then one must be evaluated before the other, making the first recursive call not a tail call).

What if, instead of computing the nth Fibonacci number, we calculate the nth and the (n-1)th Fibonacci number together? (When n is 0, we can just define the (n-1)th Fibonacci number to be 0).

fun fib 0 = (1, 0)
  | fib 1 = (1, 1)
  | fib n =
    let
      val (a, b) = fib (n-1)
    in
      (a + b, a)
    end

This is closer, but we still are doing computation after the recursive call: we add the results of the recursive call to each other, then return. Let's try to use the accumulator idea we had earlier in the sum function, but this time, since we calculate the nth and (n-1)th Fibonacci number, we will pass in two accumulators. Accumulator a will hold the nth Fibonacci number, and accumulator b will hold the (n-1)th Fibonacci number.

fun fib' (0, a, b) = a
  | fib' (n, a, b) = fib' (n-1, a + b, a)

And if we call fib' (n, 1, 0), observe that we will indeed get the correct result.
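As with sum, we can recover the original interface by wrapping the tail-recursive helper, and check it on a few small cases in the val-binding test style (recall the sequence here starts 1, 1, 2, 3):

fun fib n = fib' (n, 1, 0)

val 1 = fib 0
val 1 = fib 1
val 3 = fib 3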

Further Practice

For even further practice, try to write a tail-recursive version of the list-reversing function on your own.

fun rev [] = []
  | rev (x::xs) = (rev xs) @ [x]

Answers

If we want to do this tail-recursively, we add an accumulator variable and proceed as before. If we want to keep the same type as the original rev function, we can wrap our tail-recursive version, as we did in the sum example.

fun trev ([], acc) = acc
  | trev (x::xs, acc) = trev (xs, x::acc)

val rev = fn L => trev (L, [])

Asymptotic Analysis

By Brandon Wu, May 2020

We have now dedicated a significant amount of discussion to how to reason about the correctness of programs. Correctness, however, is not the end-all-be-all. In many cases, we would like to be able to think about the complexity of our programs - that is, the amount of resources it takes to run them on certain inputs. Such resource concerns may include time, space, and energy usage. For the purposes of this book, our principal resource of interest will be time.

The Reality

It is, however, hopefully clear that the question "how long does a program take to run?" is rather ill-founded as stated. For one thing, hardware limitations mean that running the same program on different machines may yield differing results, based on the performance ability of individual machines. Indeed, even running the same program on the same machine may yield different results, based on the computer's current workload. We want a metric that is somehow agnostic to these implementation details, and that can give us an idea of how efficient an algorithm is.

Additionally, we are usually not just interested in a program's runtime on a single input, but in its behavior across a wide range of inputs - and the possibility of infinitely many inputs makes empirical methods like taking an average rather infeasible. Programs also tend to have "worse" inputs than others: it is not necessarily fair to compare the time it takes to compute the millionth Fibonacci number with the time it takes to compute the second. We will have to do better.

The Solution

We will generally turn to asymptotic analysis to solve these issues. It provides a nice mathematical definition that conveniently takes care of many of the points previously mentioned.

[Big-O] We say that a function \( f : \mathbb{R}^+ \rightarrow \mathbb{R}^+ \) is in big-O of another function \( g : \mathbb{R}^+ \rightarrow \mathbb{R}^+ \) (write \( f(n) = O(g(n)) \) or \( f(n) \in O(g(n)) \)) if there exist constants \( c, n_0 > 0 \) such that for all \( n \geq n_0 \), \( f(n) \leq cg(n) \). In words, \( f(n) = O(g(n)) \) if there exists a point beyond which \( f(n) \) can be upper bounded by \( g(n) \), amplified by some constant factor.

In intuitive terms, we can think of a function \( f \) as being in big-O of another function \( g \) if it is "less than or equal to" that function, in terms of the complexity class that it belongs to. For instance, \( 2n \) is \( O(n^2) \), and also \( O(n) \) - the former because a quadratic function grows faster than a linear function by a factor of \( n \), and the latter because we effectively do not care about constant factors. Note that for the latter bound, we can choose \( c = 2 \), which makes \( cg(n) = 2n = f(n) \), so certainly \( f(n) \leq cg(n) \).
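As another worked instance of the definition: to show that \( f(n) = 3n + 5 \) is \( O(n) \), taking \( g(n) = n \), we may choose \( c = 4 \) and \( n_0 = 5 \), since

\[ n \geq 5 \implies 3n + 5 \leq 3n + n = 4n = cg(n) \]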

Asymptotic analysis allows us a convenient notion of what the runtime of a function really is in terms of the size of the input. We will usually define what metric this takes, but common measures include the length of a list, the number of nodes in a tree, or something similar. If we let \( T(n) \) denote the function that maps input sizes to worst-case "runtimes" (that is, \( T(x) \) is the maximum number of steps it takes to run on an input of size \( x \)), then we are usually interested in upper bounding \( T(n) \) - that is, determining what complexity class it falls into. Note, however, that we are simply finding upper bounds - the idea is that this \( T(n) \) function cannot be determined exactly, but its approximate asymptotic behavior can be upper bounded by a different, more defined function. We also care about achieving a tight upper bound - one that is not unnecessarily large. For instance, we could say that many functions are in \( O(2^{2^n}) \) - but this is not particularly useful information. You must be careful to perform analyses without being too liberal.

NOTE: By "number of steps", we usually mean some idealized notion of some "step of computation" that an algorithm takes, such as the number of comparisons that it takes to run a quicksort algorithm, or the number of times that we cons an element on or off a list. This lets us abstract away from how long it actually takes a computation to run, ignoring the physical machines used. We only care about the high-level "steps" that an algorithm takes, which is the same regardless of platform.

In a world with an incredible amount of data being processed and transmitted in our daily lives, asymptotic analysis forms a nice metric for the efficiency of algorithms. Most of the specific content that has been discussed so far is beyond the scope of this book, but it is good to have an intuitive understanding of asymptotic analysis nonetheless.

Asymptotic Analysis at a Glance

Oftentimes, one will have to "eyeball" the complexity of their function or program. This really just amounts to knowing what operations it executes, and how many times they are executed.

For instance, consider the following function:

fun foo (0 : int) : int = 0
  | foo (n : int) : int = n + foo (n - 1)

Clearly, this function simply computes the sum of the first \( n \) positive integers upon being given \( n \) as an input. What it does is not important, but if we were to try and quantify the complexity of foo, we might say that it is \( O(n) \) in \( n \), the value of the number given as input. This is because we can consider arithmetic operations to be constant-time (that is, running in \( O(1) \)), and we know that the function recurses \( n \) times.

But now let us consider how long it might take to run the following code fragment:

fun bar (0 : int) : int = 0
  | bar (n : int) : int = (foo n) + bar (n - 1)

Now, instead of adding n, the recursive step adds foo n, invoking the previous function.

This becomes slightly harder to eyeball. We can eyeball this as upper boundable by \( O(n^2) \), though we would desire some more justification than just what it "seems to be". We will need to turn to more sophisticated methods to analyze this more formally, which we will cover in the next chapter. The general idea of estimating complexity, however, is simply to look at programs in terms of their components - how many times instructions run, and what each instruction's individual cost is. This becomes a very powerful method of reasoning that we will explore more later when we discuss sequences, though we will introduce a way to do so in a slightly more rigorous manner.
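To preview the kind of reasoning the next chapter makes precise: bar n invokes foo n, foo (n-1), ..., foo 1, and each foo i takes \( O(i) \) steps, so the total is on the order of

\[ \sum_{i=1}^{n} i = \frac{n(n+1)}{2} \in O(n^2) \]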

Work and Span

By Aditi Gupta and Brandon Wu, May 2020. Revised September 2020

We will now turn towards more robust notions of work and span that let us analyze our conception of asymptotic runtime more effectively. These are still grounded in asymptotic analysis, but are more deliberate about how we go about generating the asymptotic bound for a function from the code itself. Additionally, we will not only analyze the approximate number of steps of the program (which corresponds to the runtime of the program, given sequential execution), but also the approximate longest chain of dependencies that exists in the program, assuming that computations can be run in parallel. We will elaborate more on this idea in this chapter.

Parallel Computing

It is intuitive to view things occurring sequentially. Whether it is reading a tutorial, writing a list of instructions, or making a plan for the future, sequential actions are very easy to think about. Even programs are written in a stepwise manner, with computations happening one after the other in a prescribed way. It seems to be in our nature to impose some kind of order on a list of actions.

Despite that, however, sequential evaluation is not always the most efficient. Sequential evaluation introduces dependencies where other subtasks cannot be started until we have finished the current subtask, which can induce wait times where none need exist. For instance, if your plan is to do the laundry and your homework, it might not be the most time-efficient to wait until the washer is done before getting started on your work. There is no dependency between laundry and homework - there is no logical reason why you should have to wait for one before the other - so you could do them both at the same time, or in parallel.

Parallel computing is a principle that is becoming more and more important as time goes on. Computers now more frequently have multiple cores in their processors, which means that tasks can be subdivided and assigned out to independent acting agents.

The benefits of doing so are clear. Suppose that we are stacking a shelf with merchandise. If the shelf is tall, this may take us a while - roughly linear in the height of the shelf, we can imagine (supposing we have an infinite stock of items, and that climbing a ladder somehow isn't a factor). If we had a person to dedicate to each shelf, however, then we could stock the shelves in "constant" time - independent of the number of shelves that there actually are. This will be a driving idea behind how we look at parallelism.

While we will not delve into the implementation details of parallel computing (which is more apt for a systems perspective), we will perform basic analysis of asymptotic complexity based on that premise. These take the form of work and span, fundamental ideas that will drive the next section.

Theory

First, let us consider what we will term a task dependency graph. This is not so important a concept to memorize, but it will help in conceptualizing work and span. A task dependency graph is a directed acyclic graph (that is, a graph whose edges are one-way, and which contains no cycles) that represents the dependencies among a set of tasks. Each node is a task, labelled with the time that it takes to execute (a singular unit, unable to be subdivided), along with edges that represent the dependencies in the graph. A task cannot be started until all of the tasks with edges directed towards it are finished - that is, all of a task's inbound edges denote its prerequisites.

With this knowledge, we will be able to define what we mean by work and span.

[Work] The work of a computation denotes the number of steps it takes to run, assuming access to only a single processor. Work thus represents the worst-case sequential evaluation time, and can be upper bounded with asymptotic analysis.

[Span] The span of a computation denotes the number of steps that it takes to run, assuming access to infinitely many processors that allow us to run tasks in parallel. Span thus represents the worst-case parallel evaluation time, and can be upper bounded with asymptotic analysis.

What we find is that work directly corresponds to our previous intuition of the complexity of a computation, since our previous analyses have always been sequential. Span, however, is not quite as easy to eyeball. Now, we are really looking for the longest chain of dependencies - that is, the longest sequence of tasks that must be run sequentially - since everything else can be run concurrently. The assumption of infinitely many processors, while obviously unrealistic, simplifies our analysis, and provides us a "target" for the efficiency of a fully parallel algorithm.

We illustrate these concepts with the following graph.

Process Graph
Fig 1. Task dependency graph illustrating dependencies between tasks and task durations

So in this example, the work of our graph would be \( 1+3+6+2+5+9+3+3+10 = 42 \), since with a single processor, the dependencies don't really matter to us. We have no choice but to complete every task, and the precise order doesn't matter. That isn't to say that we can execute the tasks in any order - that plainly isn't true - we simply mean that there is no order that can change what our runtime is.

On the other hand, for span we must consider the length of the longest path. The span of this graph would thus be \( 1 + 3 + 6 + 9 + 10 = 29 \), since that is the longest path. Even being able to execute everything else in a parallel way, we cannot avoid the fact that these nodes must follow one after the other. This path is thus the limiting factor in our runtime - ultimately it constrains the amount of time that we expend.

Task dependency graphs are a concept that we discuss purely as a theoretical aid for understanding work and span. We will look at examples in terms of actual SML code in the next section, which is primarily where we will do our work/span analysis.

Work/Span Analysis of Code

The previous example was rather contrived. For one thing, it is prespecified - we already knew all of the tasks that there were, along with their dependencies and durations. As such, we could compute a simple, numerical answer. This will usually not be the case. We are interested in work/span analysis of algorithms, which will yield us another function - one describing the runtime complexity of the algorithm as a function of some notion of input size.

For recursive functions, work/span analysis is very easy to do. We characterize it in terms of recurrence relations, which are themselves recursive functions describing the work or span of some code. Then, we simply solve for the closed form of the recurrence relation and estimate a Big-O bound to arrive at our desired complexity.

Consider the following example:

fun length ([] : int list) : int = 0
  | length (x::xs : int list) : int = 1 + length xs

The first step to determining the work and span of such a function is to write a recurrence relation. These steps are explicit - the code should determine the recurrence relation, and the recurrence relation should determine the Big-O bound. We will first analyze this function's work complexity, then move on to span.

First, we should fix some notion of input size. This will differ based on what our recurrence is recursing on, but in this example it seems to be the size of the input list. Note that this follows directly from the code - if this were the factorial function, we might say that the recurrence is in terms of the value of the input, and as we will later see, if the input were a tree, we might write the recurrence in terms of the number of nodes in the tree.

So we can write the following recurrence for work. We will explain soon what exactly it means, and how to arrive at it:

\( W_{length}(n) = c_0 + W_{length}(n-1) \)
\( W_{length}(0) = c_1 \)

This recurrence is made of two parts - the recursive case and the base case. The first equation for \( W_{length}(n) \) simply denotes what the work for an input size of \( n \) should be - defined recursively. The second equation for \( W_{length}(0) \) defines what the work for an input size of \( 0 \) should be. This directly corresponds to our code, which has two clauses for a list of length \( 0 \) (that being []), and for the general case. This is an important observation to make, that the recurrence follows directly from the code.

The recursive case says that, for an input of size \( n \), the work done is \( c_0 + W_{length}(n-1) \). Here, \( c_0 \) denotes some constant. This is supposed to correspond to the recursive case of the function, and if we look at it, we have a recursive call length xs, as well as some other work of adding one. Adding one, being an arithmetic operation, is a constant-time process, meaning that it takes a non-zero constant amount of time. This is what \( c_0 \) is supposed to represent - the constant amount of non-recursive work that must be done beyond the recursive call itself. It is not important what \( c_0 \) is, just that it is some unspecified amount of work that is not a function of \( n \).

Conversely, \( W_{length}(n-1) \) represents exactly the amount of work done by the recursive call, since it is literally defined to be the amount of work done by an input of size \( n-1 \), which is exactly what happens when we call length xs, where xs has length \( n-1 \).

NOTE: Even if we did not have the addition operation, we would still have \( c_0 \). This is because merely entering the function and figuring out which case to execute takes some non-zero amount of work - it is impossible to run the recursive call perfectly with no other time expense. As such, we would see exactly the same recurrence even if the recursive case was length (x::xs : int list) : int = length xs (which would also be a very bad length function).

For the base case, we have that \( W_{length}(0) = c_1 \), since in the base case we just return 0. This has a constant amount of work associated with it, as argued previously, so we use the constant \( c_1 \) to denote it, since this amount of work is likely not the same constant as that of the recursive case, which adds 1.

So this is how we arrive at the work recurrence for length. We will now turn to the span recurrence, which we obtain as:

\( S_{length}(n) = c_0 + S_{length}(n-1) \)
\( S_{length}(0) = c_1 \)

Note that the span recurrence is exactly the same as the work recurrence. This should make sense, because there is no opportunity for parallelism in the length function - we can only pop off elements one by one from the list. In the recursive case, we must wait for the result of the recursive call on xs, which means we unavoidably must expend the span of \( S_{length}(n-1) \) - we have a data dependency. We cannot execute the addition in 1 + length xs until we obtain the result for length xs, which means that we must sum the time it takes to compute length xs (that being \( S_{length}(n-1) \)) and the time it takes to carry out the addition operation (that being \( c_0 \)).

Now we will begin the task of actually solving the recurrence. They are the same recurrence, so without loss of generality we will solve just the work recurrence.

We know that it has the form of \( W_{length}(n) = c_0 + W_{length}(n-1) \), and eventually reaches a base case at \( W_{length}(0) = c_1 \). We can "unroll" the recurrence a few times to see if we can see a pattern, and then arrive at our answer.

So we start out with \( W_{length}(n) = c_0 + W_{length}(n-1) \), but if we invoke the definition of \( W_{length}(n-1) \), we can produce \( c_0 + c_0 + W_{length}(n-2) \), since \( W_{length}(n-1) = c_0 + W_{length}(n-2) \). By doing the same for \( W_{length}(n-2) \), we get \( c_0 + c_0 + c_0 + W_{length}(n-3) \). It seems we've hit upon a pattern - each time we "unroll" the definition of \( W_{length}(n) \), for progressively lower \( n \), we get another \( c_0 \) term back out. Then, we know that the recurrence should eventually solve to:

\( W_{length}(n) = (\sum_{i=1}^n c_0) + c_1 \)

The summation clearly evaluates to \( nc_0 \), so the closed form is \( nc_0 + c_1 \) (we will usually omit the \( c_1 \), since it does not matter asymptotically). We see that this closed-form solution is linear in \( n \) - so then we have that the work and span of this function are in \( O(n) \), which is consistent with what we would expect if we had "eyeballed" it.

Work/Span Analysis: Trees

First, we will discuss the definition of a binary tree in SML:

datatype tree = Empty 
              | Node of tree * int * tree

This denotes that a tree is either the constant constructor Empty, denoting the empty tree, or a Node that contains an integer value as well as two tree children, which can themselves be Nodes or Empty.

Tree
Fig 2. Sample binary tree

So for instance, we may represent the above tree with Node(Node(Node(Empty, 4, Empty), 3, Empty), 1, Node(Empty, 2, Empty)). Put more fancily:

Node(
    Node(
        Node(
            Empty,
            4,
            Empty
        ),
        3,
        Empty
    ),
    1,
    Node(
        Empty,
        2,
        Empty
    )
)

Now we will analyze the complexity of finding the size of a tree. Consider the following implementation for doing so:

fun size (Empty : tree) : int = 0
  | size (Node (L,x,R) : tree) : int = size L + 1 + size R

First, convince yourself that it actually works: it recursively finds the sizes of the left and right subtrees, then adds one for the node that it is currently at. In the empty case, we consider the empty tree to have a size of 0.
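
As a quick sanity check, here is size applied to the sample tree from Fig 2 (the bindings t and four are our own, purely for illustration):

val t = Node(Node(Node(Empty, 4, Empty), 3, Empty), 1, Node(Empty, 2, Empty))
val four = size t (* size t ==> 4, one for each node *)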

The major difference between this function and the previous length function is that length had one recursive call - size has two. We will need to reflect this change when we write our recurrences. Additionally, we need a new variable for our recurrence - we no longer have a list whose length we can induct on. A natural analogue is \( n \), the number of nodes in the tree, so we will take that as our recurrence variable. We will focus first on work.

We will obtain the following work recurrence:

\( W_{size}(n) = c_0 + W_{size}(n_l) + W_{size}(n_r) \)
\( W_{size}(0) = c_1 \)

where we define the number of nodes in the tree \( n = 1 + n_l + n_r \), and \( n_l \) and \( n_r \) denote the number of nodes in the left and right subtree, respectively. This follows similarly to our recurrence for length in the previous part, where \( c_0 \) is just some constant amount of work that we necessarily have to do, and the two \( W_{size} \) terms come from the two recursive calls we make on L and R.

Now, we don't know precisely how big \( n_l \) and \( n_r \) are with respect to \( n \). This makes our analysis a little more tricky, but essentially all we need to do is think of the worst case, as we are interested in the worst-case asymptotic complexity of this function. For work, however, there is no worst case - no matter how the tree is structured, we must visit every node once, doing a constant amount of work each time. So we should obtain, in the end, \( W_{size}(n) = nc_0 + (n+1)c_1 \) - one \( c_0 \) per node, and one \( c_1 \) per Empty subtree, of which there are \( n+1 \) - which we know is \( O(n) \). So in this case, we didn't have to think about the structure of the tree. In the next section, it will matter.

Work/Span Analysis: Balanced vs Unbalanced Trees

We will revisit the same example, except from the perspective of span.

The important point to note is that, now, we have two separate recursive calls happening in the recursive case of size. These recursive calls have no data dependency - neither depends on the result of the other. This means that they can be run in parallel, which means that the total span that we compute should just be the max over both. This is because we can imagine that both of them lead to different "paths" in our task dependency graph - we are only interested in the maximum-length path. So we will run both calls, and whichever one takes longer to return an answer to us is the "limiting reagent" of our computation.

So we will write the span recurrence as follows:

\( S_{size}(n) = c_0 + \max(S_{size}(n_l), S_{size}(n_r)) \)
\( S_{size}(0) = c_1 \)

Note that we are now taking the max over the two recursive calls. This time, we cannot handwave the structure of the tree like we did in the previous part - if one path is significantly longer than the other, then it will stall the computation for longer. We still must visit every node, but some of those visits can occur in parallel.

We will consider the first case - if we have an unbalanced tree. Suppose that the tree is heavily unbalanced - akin to a (diagonal) linked list. Without loss of generality, let it be "pointing" to the left. Then, \( n_l = n - 1 \), and \( n_r = 0 \). Then, the max over both recursive calls should clearly be that of \( S_{size}(n-1) \), since it has to compute the size of a larger tree.

So we can update our recurrence and obtain:

\( S_{size}(n) = c_0 + S_{size}(n-1) \)
\( S_{size}(0) = c_1 \)

This recurrence is exactly the same as that of length, so we know that we will get \( S_{size}(n) \in O(n) \). This should make sense intuitively, since the depth of the tree is \( n \), and there are dependencies between each level - we cannot go to the next level until we are done with the current one. So we cannot avoid having to visit every level sequentially, which results in \( O(n) \) span.

Now, what if we consider a balanced tree? The balanced case is when the numbers of nodes in the left and right subtrees are roughly equal - that is, \( n_l = n_r = \frac{n}{2} \). We will consider them exactly equal to simplify our analysis, but we would obtain the same asymptotic answer regardless. Then, we know that the maximum is just either one of them, since they have the same span.

So we can update our recurrence and obtain:

\( S_{size}(n) = c_0 + S_{size}(\frac{n}{2}) \)
\( S_{size}(0) = c_1 \)

This is slightly different from our length recurrence. We will try unrolling to make sense of it.

We have that \( S_{size}(n) = c_0 + S_{size}(\frac{n}{2}) \). Plugging in the recursive definition of \( S_{size}(\frac{n}{2}) \), we get that this expands to \( c_0 + c_0 + S_{size}(\frac{n}{4}) \), which by the same trick expands to \( c_0 + c_0 + c_0 + S_{size}(\frac{n}{8}) \), and so on and so forth. We note that we are dividing the number of nodes by 2 each time - and we know that we can divide \( n \) by two roughly \( \log_2(n) \) times. So in total, the recurrence solves to \( S_{size}(n) = (\sum_{i=1}^{\log_2(n)} c_0) + c_1 \).

So then this simplifies to \( S_{size}(n) = \log_2(n)c_0 + c_1 \). This is a logarithmic function of \( n \), so we get that the span of size is in \( O(\log n) \). Thus, we obtain a different span for balanced trees versus unbalanced trees - balanced trees are more efficient and parallelism-friendly.

Work/Span Analysis: Size-dependent Operations

In these past two examples, we have only seen examples that did a constant amount of non-recursive work. We will now analyze a function that does non-recursive work that is a function of the input size \( n \). This will result in different kinds of recurrences. First, however, we will digress briefly to motivate the example that we will analyze.

[Case Study: Tree Traversal]

When analyzing trees, it is often prudent to utilize tree traversal, a systematic way of enumerating the elements in a tree. There are multiple ways to do this, depending on what your intentions are - namely preorder, inorder, and postorder traversal, among others.

With these different methods of traversal, we can turn a tree into a different kind of ordered data structure, such as a list or sequence. This can come in handy when we desire future fast access to any arbitrarily ranked node in the tree, or if we want to convert it for purposes of printing, for instance.

Each traversal is characterized by a certain "strategy", depending on how it prioritizes the three possible directions that it can go - root, left, and right. Inorder traversal, for instance, is characterized by left-root-right prioritization - this means that it goes left first, then once it has exhausted the left subtree it visits the root node, and then it goes right. Note that this does not mean that it visits the root of the left subtree first - it simply reruns the same process on the entire left subtree. No matter what the traversal strategy is, a node is never actually visited until the "root" action is taken. Preorder traversal is root-left-right, and postorder is left-right-root. Examples of preorder and inorder traversals (the most common you will see in this book) are below.

traversal
Fig 3. An example of a preorder traversal (left) and inorder traversal (right) of a binary tree, with visited nodes labeled in ascending order

Tree traversals can also come in handy when generating different notations for mathematical expressions represented in the form of a binary expression tree, whose nodes consist of either a numeric constant with no children, a unary operation with a single child, or a binary operation with two children. For instance, a binary expression tree for the mathematical expression \( (4-1) * 2 \) is shown below.

optree
Fig 4. A binary expression tree for the expression \( (4-1) * 2 \)

With inorder traversal of this expression tree, we can generate the constants and symbols in exactly the same order as \( (4-1) * 2 \), which is how we would normally interpret it. Preorder and postorder traversal, however, result in an interesting interpretation - what is known as prefix (or Polish) and postfix (or Reverse Polish) notation.

In prefix notation, by using preorder traversal, we obtain the expression \( {*}\ {-}\ 4\ 1\ 2 \), which is how we would interpret the same expression if all of our operators appeared before their operands. Similarly, with postorder traversal, we obtain the expression \( 4\ 1\ {-}\ 2\ {*} \) in postfix notation. Prefix and postfix notation have significance in their lack of ambiguity - while infix notation is easy for humans to read, it sometimes requires parentheses to make operator precedence explicit. Prefix and postfix notation have no such flaw - they are unambiguous in how operations take place. In programming language interpreters, such notations are sometimes used to represent mathematical expressions.
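
To make this concrete, here is a minimal SML sketch of an expression tree and its prefix/postfix traversals. The datatype exptree and the functions prefix and postfix are our own illustrative names (they are not defined elsewhere in this book), and for brevity we restrict ourselves to integer constants and binary operations, representing operators as strings:

datatype exptree = Const of int
                 | BinOp of exptree * string * exptree

(* the tree from Fig 4, for (4 - 1) * 2 *)
val t = BinOp (BinOp (Const 4, "-", Const 1), "*", Const 2)

(* prefix : exptree -> string list, via preorder (root-left-right) *)
fun prefix (Const n) = [Int.toString n]
  | prefix (BinOp (L, oper, R)) = oper :: (prefix L @ prefix R)

(* postfix : exptree -> string list, via postorder (left-right-root) *)
fun postfix (Const n) = [Int.toString n]
  | postfix (BinOp (L, oper, R)) = postfix L @ postfix R @ [oper]

Here, prefix t ==> ["*", "-", "4", "1", "2"] and postfix t ==> ["4", "1", "-", "2", "*"], matching the prefix and postfix notations described above.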

Such a digression serves as motivation for the next function that we will analyze - which is writing the preorder traversal of a tree in SML. The code looks like this:

fun preord (Empty : tree) : int list = []
  | preord (Node(L, x, R) : tree) : int list = x :: (preord L @ preord R)

We can readily see that this follows the root-left-right order that we specified earlier for preorder traversal. Recall that @ is the function for list concatenation, and has complexity \( O(n_l) \), where \( n_l \) is the length of its left argument. Thus, as stated before, this function does a non-constant amount of non-recursive work at each recursive call - we must evaluate @ on the results of preord L and preord R, at a cost that is a function of \( n \), the number of nodes in the tree.

We will analyze only the balanced case for this function. We invite the reader to think about the unbalanced case on their own.

For the recursive case, we know that \( W_{preord}(n) \) will take the form of \( c_0 + W_@(n_l) + W_{preord}(n_l) + W_{preord}(n_r) \). By our balanced assumption, we know \( n_l = n_r = \frac{n}{2} \), so we can write our work recurrence as:

\( W_{preord}(n) = c_0 + \frac{n}{2}c_1 + 2W_{preord}(\frac{n}{2}) \)
\( W_{preord}(0) = c_2 \)

Note that the term \( W_@(n_l) \) is a recurrence in terms of \( n_l \), the size of the left list given as input to @. Since we know that the work complexity of @ is \( O(n_l) \), we can replace \( W_@(\frac{n}{2}) \) with \( \frac{n}{2}c_1 \), which is simply some constant \( c_1 \) scaled by a linear factor of the input \( \frac{n}{2} \). This is how we will generally deal with analyzing the complexity of functions that make use of helper functions.

We will make use of a new method to solve this recurrence - the Tree Method.

[Tree Method] The tree method is a method for solving certain recurrences - namely those whose levels sum to the same quantity, and which usually have multiple recursive calls. In essence, if each level has the same amount of computation, then the recurrence solves to (number of levels) * (amount at each level).

The below diagram illustrates the Tree Method.

Tree method
Fig 5. An illustration of the Tree Method for the recurrence of preord.

We will now explore exactly how we arrived at this conclusion.

First, note that this tree exists as a result of the recurrence. We used the code to specify the recurrence, and then the recurrence itself described this tree. It has a branching factor of 2 (that is, each non-leaf node has two children) since the recursive case of the recurrence has two recursive calls, and at each level the size of the input changes. Since the recursive calls are made on inputs of size \( \frac{n}{2} \), each level results in a halving of the input size.

Additionally, we know that the amount of work at each node (of input size \( n \)) is necessarily \( c_1 \frac{n}{2} \). There is technically also a \( c_0 \) term, but we will omit it since it is asymptotically dominated by \( c_1 \frac{n}{2} \). The precise non-recursive work done by each "node" is specified slightly down and to the left of each node. Individually, they don't look very nice to sum over - at level \( i \), it appears the work at each node is \( c_1 \frac{n}{2^{i+1}} \). However, level \( i \) also has \( 2^i \) nodes, by the branching nature of the recurrence tree. As such, the total amount of work done at level \( i \) is just \( c_1 \frac{n}{2^{i+1}} * 2^i = c_1 \frac{n}{2} \), which is not a function of the level \( i \).

As such, each level has the same amount of work - which is very convenient, as we can now just take that quantity and multiply it by the number of levels. So in total, when we solve out the recurrence, we should obtain that \( W_{preord}(n) = (\sum_{i=1}^{\log_2(n)} c_1\frac{n}{2}) + c_2n \), where the \( c_2n \) term is separately obtained from the base level, due to the \( n \) leaves that each have \( c_2 \) work.

The term \( \sum_{i=1}^{\log_2(n)} c_1\frac{n}{2} \) thus goes to \( \frac{c_1}{2} n\log_2(n) \), so in total we obtain that \( W_{preord}(n) = \frac{c_1}{2} n\log_2(n) + c_2n \), which is in \( O(n \log n) \). So we're done.

The tree method is really nothing more than a visual way to view the recurrence - it is possible to achieve the same effect by just writing a summation. It is sometimes more intuitive to visualize, however, and for recurrences where the levels sum to the same amount, the tree method is very effective. Not all recurrences exhibit such behavior, and it is hard to know a priori whether a given recurrence does - nevertheless, it is a powerful method and sufficient for many purposes.

We leave the span analysis of this function as an exercise for the reader.

Conclusions

Asymptotic analysis is a very important technique for categorizing the efficiency of programs. Moreover, it is not enough to simply find the asymptotic sequential complexity of a function - parallel computation is becoming increasingly important, and purely sequential analyses are not representative of real-world algorithms. Work and span, analyzed through recurrence relations, form a powerful framework for examining the complexity of recursive functions - one robust enough to classify many kinds of algorithms.

Datatypes

Parametric Polymorphism

By Brandon Wu, September 2020

Type safety is a very powerful concept, one that lets us pin down the space of allowable inputs to only a narrow space of values. In this way, we can ensure that a function is only applied to those arguments that we are interested in, making any other computations illegal. Sometimes, however, we are interested in making a function more general, with arguments that can somehow range over types. We will see how SML achieves this with parametric polymorphism, which is a separate concept from other forms of polymorphism that you may have seen before.

Motivation

Suppose that we wanted to write a length function for lists.

Well first, we might want to specify. What type of list? Let's say that we first have int lists.

Well, the implementation is easy:

fun lengthInt ([] : int list) : int = 0
  | lengthInt (x::xs : int list) : int = 1 + lengthInt xs

But suppose that we don't only want the length of int lists, but also the length of string and char lists. Well then, we'll need to define a few more functions:

fun lengthString ([] : string list) : int = 0
  | lengthString (x::xs : string list) : int = 1 + lengthString xs

fun lengthChar ([] : char list) : int = 0
  | lengthChar (x::xs : char list) : int = 1 + lengthChar xs

This becomes unnecessarily tedious. As we can see, other than the name of the function and the type annotations, these functions have exactly the same code. Nothing about the functions themselves makes use of the fact that the lists in question contain ints, strings, or chars - the elements are merely manipulated as arbitrary values. We would like to be able to write a length function that works on a list of any type.

The Idea

We've seen something similar to this before - we know that lists of all kinds exist. No matter whether it is an int list, a string list, or a char list, we know that :: is always valid to use when consing an element onto a list of the appropriate type. Indeed, we say that :: has type t * t list -> t list for all types t. Cons is thus polymorphic - it can be used on many different types (though never two at the same time; for instance, 1::["hello"] is still ill-typed).

This is something of a preview of what we will be discussing in this chapter - parametric polymorphism. Seen in this way, all polymorphic functions are parameterized by a type variable that ranges over types. We will go more in depth into this idea.

[Type variable] A type variable is a type that is quantified over types - that is, it can specifically take on the form of many types. They are enumerated as 'a, 'b, 'c, and so on, and they are referred to by Greek letters. For instance, 'a is "alpha", 'b is "beta", 'c is "gamma", and so on.

Type variables are themselves valid types. As such, we can rewrite length as:

fun length ([] : 'a list) : int = 0
  | length (x::xs : 'a list) : int = 1 + length xs

'a is quantified over all types, so no matter what type the input list to length is, it will work correctly, as SML will use type inference to determine what the proper type of the length function should be in any given context. Note that 'a is a variable, and as always, in proofs, we must quantify our variables. 'a and other type variables thus implicitly correspond to having a "for-all" quantifier in front of them - so we say that an expression of type 'a has type t, for all types t. We further note that this is really syntactic sugar for the notation:

fun 'a length ([] : 'a list) : int = 0
  | length (x::xs : 'a list) : int = 1 + length xs

This specifies that the length function itself is parameterized over 'a. Although 'a is a "parameter" of length, it is not truly an argument - we do not explicitly pass in the type to length. In an abstract sense, however, we do, as each "instantiation" of length has been "supplied" an argument for 'a to produce a "copy" of the function. You can think of it as if length has infinitely many different variations that can be selected, depending on what the given type is inferred to be. Note that the "type" that is chosen does not necessarily need to be a concrete type - it may itself be composed of type variables. For instance, consider the following code:

fun tupleLength [] = 0
  | tupleLength ((x, y)::xys) = 1 + length xys

Disregard the rather strange implementation, which is a rather arcane way of rewriting length with unnecessary overhead. In this example, we can see that the list xys contains tuples of two elements, the types of which are unknown to us. Without knowing a concrete type, we can only conclude that the type of tupleLength is ('a * 'b) list -> int. This is because the elements of the tuple do not necessarily have to be correlated with each other - 'a and 'b are thus independent, though they could be instantiated to the same type. As such, we have taken a type variable and replaced it with two more type variables, which gives us a little more information but still quite a bit of leeway. tupleLength is thus itself still polymorphic, with type ('a * 'b) list -> int.

NOTE: We have now reached a point where we will begin to omit explicit type annotations, as they tend to unnecessarily constrain the types of functions. It is also a good exercise to be able to understand conceptually how type inference is carried out, which is covered more in the next section.

An important principle to note, however, is that an 'a type does not magically "just work" with regards to type safety. For instance, let us consider the following code fragment, which is ill-typed:

fun inc (x : 'a) : int = 1 + x

Note that, in this case, type annotations have actually worked against us! Had we removed the type annotations, this code would compile. The reason this code is ill-typed is that, while we are allowed to use a type more generally than it actually is, we cannot use a polymorphic type more specifically. Think about it as if we explicitly passed in some type variable: in the scope of the function inc, 'a has been fixed to some arbitrary type. The function then attempts to add 1 to x, a value of that fixed, arbitrary type. However, we cannot add 1 to a value of any type - we can only safely add 1 to x if we know that x has type int. If 'a were instantiated to be string, it would violate type safety to allow this code to compile. As such, we cannot annotate x with type 'a, as it is used more specifically than that in the body of the function.
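
The fix is simply to annotate x at the type it is actually used at (or to omit the annotations entirely and let type inference determine them):

fun inc (x : int) : int = 1 + x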

Other Forms of Polymorphism

A similar idea of extending functions to work on many types is exhibited by equality types, a behind-the-scenes mechanism that you have already been exposed to. Consider the type of the function = - the equality operator. Clearly, it cannot have type 'a * 'a -> bool, since some types don't make sense to compare for equality. For instance, how would you compare whether one real is equal to another? Machine representations are finite, so asking the question is bound to introduce problems. Another difficulty lies in function types - determining whether two arbitrary functions are extensionally equivalent is an uncomputable problem (closely related to the Halting Problem). As such, we would like = to work on a wide spectrum of types, but not on another, also very wide spectrum of types.

To do this, SML has a concept of equality types. The type variable ''a (and ''b, ''c, and so on, as normal) is quantified not over all types, but merely over all equality types. These include int, bool, string, and any datatype built up from equality types, among others. Thus, the = operator actually has type ''a * ''a -> bool, so as to only work on compatible types.
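
For instance, here is a small function of our own (not a Basis Library function) whose most general type involves an equality type variable, since it compares elements with =:

(* contains : ''a * ''a list -> bool *)
fun contains (x, []) = false
  | contains (x, y::ys) = (x = y) orelse contains (x, ys)

So contains (2, [1,2,3]) ==> true and contains ("a", ["b"]) ==> false, but contains (1.0, [2.0]) is ill-typed, since real is not an equality type.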

NOTE: Note that the polymorphism demonstrated by length is of a different flavor than that of +, *, or other overloaded functions. The overloading of basic arithmetic functions to work on ints and reals is more in line with what might be referred to as "ad hoc" polymorphism, which is merely the compatibility of a single operator with several possibly heterogeneous implementations, with the precise implementation being chosen by context (such as type, in this case). It is thus important to note that while parametric polymorphism identifies a single, general implementation with many types, ad hoc polymorphism identifies several different implementations with different use cases. Ad hoc polymorphism is noticeably more inelegant than parametric polymorphism, but it is useful in the cases that you only have a small subset of types to extend an operator to.

Type Inference

During compilation, SML will often need to determine exactly the type of the expression that we are looking at. This is not so different a problem from type-checking, however. On a high level, SML simply assigns everything a very general type, and then begins looking at clues from the context so as to narrow down what the "most general type" is. By "most general type", we mean the most general type an expression can have, meaning that we do not use specific instances of polymorphic expressions, but just the polymorphic type itself. It is valid to say that the length function has type int list -> int, but its most general type is 'a list -> int.

[Type Inference] A procedure employed by type systems to infer the type of expressions and functions, even when not explicitly given those types.

This is how we can, for instance, determine that the function fn x => x + 1 must have type int -> int. x is not deliberately stated to have any type, but we know that it wouldn't make sense for x to be any type other than int - it would otherwise not type-check. Note that x cannot be real, since + cannot have an input type of real * int.

We might also think of it like this - consider the expression fn x => x + 1. Most generally, we know that its type should be an instance of 'a. Moreover, we see that it is a lambda expression, so it must then be an instance of 'a -> 'b. The input is x, so we assign x to be an instance of type 'a. We then attempt to apply the + operator to x, so for this to typecheck, x must have type int. Applying + to an int and an int yields an int, and that's the entire function body, so the whole expression has type int -> int.
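
When nothing in the body constrains the inputs, the same procedure leaves us with a polymorphic most general type. For instance, in this small sketch of our own:

fun swap (x, y) = (y, x) (* most general type: 'a * 'b -> 'b * 'a *)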

Conclusions

Parametric polymorphism offers us a clean and elegant way to extend general implementations to work across a spectrum of types. In this way, we can still preserve type safety while allowing us to avoid writing out the same implementations for many different types. Type inference also conveniently lets us omit manually determining the type of our code, instead being able to determine it from context. Ultimately, parametric polymorphism is a simple idea that offers us a great deal of versatility.

Higher Order Functions

At this point, we've explored the concept of computation as evaluation, passing around values and reducing expressions to values as well. We have seen how we are allowed a great deal of versatility while maintaining type safety in SML's type system, and how we can construct arbitrary datatypes whose values are passed around as first-class citizens - that is, able to be manipulated the same as any other value. We will now discuss what is considered one of the most powerful tools available in functional programming languages - the exploitation of functions themselves as values, with which we can further parameterize our functions and results.

Currying and Staging

By Brandon Wu, June 2020

Suppose that we are interested in writing a function that adds two numbers. This is fairly simple - this is not a new concept to us.

fun add (x, y) = x + y

Then this function should clearly have type int * int -> int. We also know that this notation is really just syntactic sugar for the following:

val add = fn (x, y) => x + y

It binds the lambda expression that takes in a tuple of two integers and adds them together to the identifier of add. Yet with our knowledge of expressions and values, it is not too outlandish to write the following instead:

fun addCurry x = fn y => x + y

What might be the type of this function? Well, we know that addCurry takes in a value x, which should be an int, since it is summed with y. It then returns a lambda expression that takes in that same y, and returns the sum, which is an int. It seems to us that the type should be int -> (int -> int). This is an example of a curried function, and one of the first examples we will see of a higher-order function.

[Higher-Order Functions] A higher-order function is a function that takes in other functions as input, or returns a function as output.

[Currying] Named after mathematician/logician Haskell Brooks Curry, currying is the act of transforming a function that takes multiple arguments into a function that returns a function that takes the remaining arguments. Intuitively, it separates the arguments into a series of function applications, instead of all at once. We may refer to a general higher-order function that returns a function as a curried function.

An important note is that in the type int -> (int -> int), the parentheses are unnecessary. This is because type arrows are right-associative, in the same way that :: is. This means that there is already an implicit parenthesization in the type int -> int -> int - the function simply takes in an integer and returns a function from int -> int. This is thus a separate type from (int -> int) -> int, which is a function that takes in a function from int -> int and returns an integer.
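
To make the distinction concrete, here is a function of our own invention with the other shape:

(* takesFn : (int -> int) -> int - takes in a function, returns an int *)
fun takesFn (f : int -> int) : int = f 0

Note that takesFn (addCurry 1) ==> 1, since addCurry 1 : int -> int is itself a valid argument to takesFn.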

Note that in the curried example of addCurry we wrote above, this is really syntactic sugar for the following:

val addCurry = fn x => fn y => x + y

Not being ones to skimp on the syntactic sugar (which perhaps spells danger for our syntactic blood sugar levels), we can take this one step further. We can write the exact same declaration in a more concise way, using the form:

fun addCurry x y = x + y

This thus will curry our arguments for us, when we separate them by a space.

Note that our implementations of addCurry and add are not extensionally equivalent - for the simple reason that they do not even have the same type! They do, however, correspond in a sense, in that they seem to do the same thing - that is, add numbers together. The manner in which they do so, however, is entirely disparate.

It is important to note that currying our functions gives us a notion of partial application, where we can give a curried function only some of its "arguments". This lends itself to further specialization and modularity based on the use case and the amount of information available. This is discussed more in the coming section.
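
As a small sketch of partial application (the names add3 and seven are ours):

val add3 = addCurry 3 (* add3 : int -> int *)
val seven = add3 4 (* add3 4 ==> 7 *)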

Revisiting Extensional Equivalence

In previous chapters, we have explored the idea of extensional equivalence, and constructed a definition for extensional equivalence that covered two cases - the function case and the non-function case.

We seemed to agree on a meaning that said that, for non-function values, two values are extensionally equivalent if they are the same value - a perhaps ill-justified definition that may leave a bad taste in one's mouth, but one that ultimately boils down to our intuitive notion that, yes, some values are just the same, and we can't really do much more than question that. For instance, we can see that (1, 2, "hi") and (1, 2, "hi") are the "same", and neither is the same as (2, 1, "hello"). For functions, however, we decided on a slightly more appeasing definition that defined a function by its input-output behavior. We restate the definition below:

[Extensional Equivalence (Functions)] We say two expressions e : t1 -> t2 and e' : t1 -> t2 for some types t1 and t2 are extensionally equivalent if for all values x : t1, e x \( \cong \) e' x.

It is our hope that we are now in a place to properly appreciate this definition. For functions that have type int -> int, this is a fairly straightforward definition - these functions are extensionally equivalent if, for every integer that we give them, they return the same int. However, what about curried functions? Our definition, in light of this new concept, is that curried functions are extensionally equivalent if the functions that they return are extensionally equivalent.

No matter how deeply nested this currying takes place, this definition will suffice. Types must be finite, so we must eventually reach a point where we can say that values are "the same" (excluding the existence of non-equality types, perhaps). We can see now that this idea of extensional equivalence is elegantly compatible with the existence of curried functions, being a recursive definition much in the same way that the datatypes that we declare or the functions that we write are.

Staging

With curried functions, we can have much more deliberate control over when a function does evaluation, particularly with respect to the inputs that it receives.

Consider an analogy. Suppose you have math homework to do, but you left your calculator at home. A more lazy-minded student might procrastinate, thinking that they could only start once they got home, but a more pragmatically-minded individual might simply start on the problems that don't require a calculator. The idea of staging is that we can reap efficiency benefits for certain problems, when facing a scarcity of information, by first doing the computations that require only the arguments at hand. We thus make a distinction between functions that must have all of their arguments to do useful work, and those that do not.

For instance, take the addition function.

fun add (x : int, y : int) : int = x + y

This is not a function that we would categorize as being able to do "useful work" with a single one of its arguments. Even if we were to curry add, with only one int it can't really do anything but simply wait for the second argument. In this case, the efficiency benefits are marginal at best. The computations are somewhat "dependent" - we need access to both arguments in order to do anything meaningful.

Consider a more contrived example:

(* contrived : int -> int -> int *)
fun contrived x y =
    let
        val z = horrible_computation x (* this takes 3 years *)
    in
        y + z
    end

The function contrived takes in two arguments x and y. It then performs some transformation on x using the function horrible_computation (which takes 3 years to run, unfortunately), and then adds y to the result of that transformation z.

Suppose that we are interested in computing the results of contrived 4 2 and contrived 4 5, for no reason other than because we can. Then, clearly evaluation of those two expressions will take 3 years each (per horrible_computation's horrible computational nature) - totalling up to six years! This is far too long, and we want to do better.

One thing that we note is that almost all of the work that we expended in computing contrived 4 2 and contrived 4 5 was in evaluating horrible_computation 4. This computation took us 3 years, but we repeated it twice! In both of the queries we made, we had to compute the exact same thing, which led to major inefficiencies. The rest of the work of contrived was negligible compared to the overhead of horrible_computation. It seems that we should be able to achieve better results.

Consider the following definition instead:

(* contrivedStaged : int -> int -> int *)
fun contrivedStaged x =
    let
        val z = horrible_computation x (* this still takes 3 years *)
    in
        fn y => y + z
    end

Now, we have staged contrived. contrivedStaged still has the same type as contrived, but it behaves slightly differently. It is not too difficult to see that contrived is extensionally equivalent to contrivedStaged, but we have made a change that is more optimal with regards to SML's evaluation semantics.

Now, instead of waiting for the second argument y to begin executing horrible_computation x, contrivedStaged does so immediately after receiving x. This is clearly better - there was no point to wait for y in the first place, since horrible_computation does not depend on it. So now we can execute the following code fragment:

val intermediate = contrivedStaged 4
val ans1 = intermediate 2
val ans2 = intermediate 5
(* takes 3 years in total *)

We can do this because contrivedStaged 4 computes the result of horrible_computation 4, and then stores the result in the closure of the function that it returns. This means that, in the scope of intermediate, it contains the answer that was common to both of the expressions we wrote earlier. We can then execute the step of y + z (which is nearly instantaneous), cutting our runtime in half. Now, we can obtain the results of contrived 4 2 and contrived 4 5 in only 3 years (though to evaluate both of those expressions themselves would still take 6 years!).

It is important to note that currying is not the same thing as staging. contrived was curried, but not staged optimally - we made a change that, even though it had the same type, let us structure our queries in a more optimal manner. This is an important idea with many applications, such as when building up a data structure to make queries to, or in graphics when a lot of work must first be done to preprocess the given data.

Common HOFs and Partial Evaluation

By Brandon Wu, June 2020

In this section, we will explore a number of common higher-order functions that we will make use of. These higher-order functions embody design patterns that are common in programming, and represent a generalized notion of a large space of potential functions that can be made to solve a myriad of problems. This section focuses on understanding the importance that higher-order functions offer us through increasing abstraction, as well as modularity in the structure of our code.

Designing Higher-Order Functions

Suppose that we are interested in incrementing all the numbers in a list by one.

fun incList [] = []
  | incList (x::xs) = (x + 1) :: incList xs

This is not too bad to do - we simply increment each individual element, and then cons it back on recursively. Suppose further that, as the arithmetically-minded scholars that we are, we are also interested in negating the sign of each element in a list.

fun negateList [] = []
  | negateList (x::xs) = (~x) :: negateList xs

These two look quite similar, which we begin to find unnerving. Hoping to get away from what is surely a wild coincidence, we try and write a function to take a list of strings and append the string "\n" to them (which denotes the newline character).

fun newLineList [] = []
  | newLineList (x::xs) = (x ^ "\n") :: newLineList xs

We can't seem to escape!

Map

We have seen that these functions we have written all look very similar - they perform a function on each element of a list, then cons the result back on to the recursive call so as to perform that operation on every element of the list. This is a common design pattern, or template for a function that we want to do a particular thing. For each specific operation we want to perform on a list, it seems we would have to write the same sort of function each time, only changing the operation that is performed at the recursive call. To avoid all that work, we will make use of higher-order functions.

(* map : ('a -> 'b) -> 'a list -> 'b list *)
fun map f [] = []
  | map f (x::xs) = (f x) :: map f xs

map thus both takes in a function (in this instance, the function f) and is curried, meaning that it itself returns a function. map takes in the function that you wish to apply to the entire list, and then the list, and then returns the result of applying that function to every element in the list. In other words, map f [x_1, ..., x_n] ==> [f x_1, ..., f x_n].

We also note that map is polymorphic, as we learned earlier, and the function it takes in has type 'a -> 'b, which is the most general function type. This means that the function we pass in can describe any kind of transformation on some element, which grants us a great deal of versatility.

map describes a pattern, or a family of functions that all follow the same sort of structure. In the same way that 'a is a variable ranging over all types, where we can describe each type as an instance of 'a, we can describe functions like incList, negateList, and newLineList as being sort of "instances" of map. Specifically, we have that incList is extensionally equivalent to map (fn x => x + 1), negateList is extensionally equivalent to map ~, and newLineList is extensionally equivalent to map (fn x => x ^ "\n").
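
Written out, these equivalences look like the following (the primed names are ours, to avoid clashing with the earlier definitions):

val incList' = map (fn x => x + 1) (* incList' [1,2,3] ==> [2,3,4] *)
val negateList' = map ~ (* negateList' [1,2,3] ==> [~1,~2,~3] *)
val newLineList' = map (fn x => x ^ "\n")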

We will now explore some more types of common patterns.

Filter

Quite often, we have a collection of objects and are interested in only some of them - we want to select those that possess a desired property, and get rid of those that do not. In other words, we want to filter the list down to only those elements. The property that we desire, however, could be anything. All we need to be able to do is say "yes" or "no" to any given item.

This is embodied in the implementation of filter, which abstracts away the specific property in question.

(* filter : ('a -> bool) -> 'a list -> 'a list *)
fun filter p [] = []
  | filter p (x::xs) = if p x then x :: filter p xs
                              else filter p xs

We describe the function p (of type 'a -> bool) as a predicate function, or alternatively an indicator function, which simply returns true in the "yes" cases and false in the "no" cases. Seen in this way, filter does something very similar to map, in that it takes in the function it needs to apply to the elements of the list. In the case where the predicate holds, the element is kept; otherwise the element is discarded.

We could, for instance, obtain all the even integers in a list L : int list by writing the expression filter (fn x => x mod 2 = 0) L.

Fold

Map is very useful for performing some kind of transformation on a bulk group of data; however, it retains the "structure" of the list. It maintains the elements in the same order as they were inputted, and simply transforms each one piecewise to produce a final answer. Sometimes, we are interested in simply achieving a final result from a collection of data - not another collection of data itself. This describes a very common pattern known as folding.

(* foldl : ('a * 'b -> 'b) -> 'b -> 'a list -> 'b *)
fun foldl g z [] = z
  | foldl g z (x::xs) = foldl g (g(x, z)) xs

(* foldr : ('a * 'b -> 'b) -> 'b -> 'a list -> 'b *)
fun foldr g z [] = z
  | foldr g z (x::xs) = g(x, foldr g z xs)

More specifically, foldl and foldr both describe two ways of combining the elements in a list, given a function g. The role of z is that of a "base case" for our accumulated value, so that we have an initial point to start from when using the function g. The result of foldl g z [x_1, ..., x_n] is to evaluate to g(x_n, ..., g(x_2, g(x_1, z))...), and the result of foldr g z [x_1, ..., x_n] is to evaluate to g(x_1, ..., g(x_{n-1}, g(x_n, z))...). We are thus choosing whether we want to fold from the left of the list or the right.

NOTE: One way to remember which way each respective fold goes is to identify the corresponding side (left or right) as the side whose element is most deeply nested in the result. As such, since x_1 is the leftmost element, foldl has g(x_1, z) as its innermost component, whereas since x_n is the rightmost element, foldr has g(x_n, z).

Use cases for fold include turning a list of strings into a single string, which you could accomplish with foldr (op^) "", or summing the elements of a list, which could be done with foldl (op+) 0. Note that in the case of summing a list, foldr would work too - this is because + is associative and commutative, meaning that it does not matter in which order the elements are combined (in proper terms, \( f(x, f(y, z)) \cong f(f(x, y), z) \) and \( f(x, y) \cong f(y, x) \), for all \( x, y, z \)).

For many purposes, it will be the case that your z will be some identity value, such as 0 for summing a list, or the empty string for concatenating all the strings in a list. This does not always have to be the case. One of the strengths of the implementation is that we can specify what our z is, and tailor that to our needs. For instance, if we wanted to append a "\n" to our string concatenation, we could use foldr (op^) "\n". Fold offers us a great deal of flexibility with choosing how we want to reduce a list.

It is somewhat important to note the type of the function g here. It has type 'a * 'b -> 'b, where 'a is the type of the elements in the list that we are folding, and 'b is the type of the initial accumulator and output. It is useful to think of this 'b type as the type of the fold function's "accumulator", or the information that it stores as it proceeds along its computation. In the case of foldl, this accumulator at a given point along the list is simply the folded value of all of the elements to the left - and in foldr, it is the folded value of all of the elements to the right. The polymorphic nature of this accumulator becomes a major strength, as we can "carry along" any kind of information that we like, so long as we define how it changes when we receive a new 'a-typed element from the list that we are folding.

So, for instance, the accumulator in foldl (op+) 0 is simply the sum of all the elements to the left of any given point. The accumulator of foldr (op^) "" is the concatenation of all of the strings to the right of a given point (which I hope makes apparent why foldr is the right fold for the task, as opposed to foldl!).
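
Written as code, the two examples above might look like this (the names sum and concat are ours):

val sum = foldl (op+) 0 (* sum [1,2,3] ==> 6 *)
val concat = foldr (op^) "" (* concat ["foo","bar"] ==> "foobar" *)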

Compose

One of the major examples that we used to motivate totality was that of function composition, the classic example being \( f(g(x)) \), for some functions \( f \) and \( g \). This is a very common idea, where we have some form of data that we would like to put through a series of transformations. If our transformations are inherently disparate (such as being bound to identifiers of different functions), we may have to write code that looks like f1 (f2 (f3 (f4 (f5 x)))). However, this can only happen if we already have access to the element x. So then, what happens if we want to give a name to the process of applying f5, then f4, then f3, then f2, and then f1?

We could, of course, write the lambda expression fn x => f1 (f2 (f3 (f4 (f5 x)))), but that can still be rather ugly. There is an entire style of programming (named point-free, or tacit, programming) that tries to eliminate the deliberate identification of the arguments to functions, instead making use of combinators. In a similar flavor, we would like to eliminate the explicit need to construct the lambda expression that takes in the input x. We might then call back to another common mathematical operator - that of function composition, or o.

infix o
(* o : ('b -> 'c) -> ('a -> 'b) -> ('a -> 'c) *)
fun f o g = fn x => f (g x)

Note that the types are constrained to permit the "compatibility" of the functions f and g. The input type of f can be general, as well as the output type of g, but the output type of g must match the input type of f. In this way, we can "string along" a series of functions in order to produce a single function that performs the "pipeline" of transformations that we desire.

So, for instance, we could write the function that, given a tree of integers, sums all of the elements in the tree by simply writing sum o inord. We could, of course, simply write treeSum, however this idea is generalizable to more complicated sequences of transformations.
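
As another small sketch, using the length and filter functions from this chapter (countEvens is our own name):

(* countEvens : int list -> int *)
val countEvens = length o filter (fn x => x mod 2 = 0)
val two = countEvens [1,2,3,4] (* countEvens [1,2,3,4] ==> 2 *)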

Partial Evaluation and Modularity

At this point, we have seen several examples of common higher-order functions, as well as potential use cases. These use cases often look nothing alike, but they all share a fundamental similarity in their structure, which is specified by the given higher-order function.

A key strength of higher-order functions lies in partial evaluation, where we can use higher-order functions to further derive other functions (possibly higher-order functions themselves). It is fine, in the case of summing a single list L, to simply evaluate foldl (op+) 0 L, but in the general case it is a strength that we can bind the function foldl (op+) 0 to the name sum. This comes in handy especially if we want to sum over many lists, so that we don't continuously have to write out foldl (op+) 0 (though doing so has negligible computational cost, admittedly).

Seen in this way, it is as if higher-order functions are at the root of a large tree of potential functions, where each node in the tree is an increasingly-specialized function, until we arrive at some specific use case. This makes higher-order functions modular, as we can simply "mix-and-match" the arguments to HOFs such as map or fold until we arrive at the specific tools that we need.

This is only an example of the way that abstraction and modularity grant us strength in programming. Through abstracting away even the specific operations that programs carry out, we can "capture" a large amount of potential functions that we may write.

Control Flow

Programs inherently are composed of a series of steps. Depending on the context, syntax, and semantics of the programming language in general, we arrive at a certain prescribed order in which instructions are executed (at least for sequential programs). It is often the case that, according to small differences in state or environment, control may be affected, causing different instructions to be executed. In this chapter, we will discuss two control flow constructs, namely continuations and exceptions, that can allow us to more easily and cleanly write functions that need to have complex control flow behavior, or in other words, complicated decision-making.

Continuation Passing Style

By Brandon Wu, June 2020

We have seen how we can write functions that are tail recursive, in that they only make recursive calls as tail calls, where the recursive call is the last thing that the function does (i.e. there is no deferred work). This was commonly realized by implementing the tail-recursive function with an accumulator, which simply stored the intermediate values that were computed. In this section, we will explore a concept known as continuation-passing style, which sees the use of functions as accumulators, letting us use more explicit logic when encoding the control flow of our programs, as well as the intermediate results of our computations.

Continuation Passing Style: The Idea

Consider the computation of (2 + 3) * 4.

Clearly, there is an ordering to how we should evaluate this. We should sum 2 and 3 first, then take the result of that and multiply it by 4. There is kind of a catch here that is quickly glossed over by our human brains - we refer to "the result of that" rather casually. We haven't explicitly named it, but we nonetheless make an appeal to intuition to get our point across.

How else might we represent this computation? Well, we could use lambda expressions, and then use the power of function application to compute the result. Then, we might obtain that this is akin to evaluating (fn res => res * 4) (2 + 3). This would be a more direct translation of the idea of "add 2 and 3, then multiply the result of that by 4". We note that, in the process, we have explicitly made clear what we mean by "the result of that" - it is now bound to a name, that being res.

We can take this one step further. If we think about it a bit more, we might want to consider starting at a "single" value, so that we don't have to consider the operation of 2 + 3 as one step. Then, we might instead write "take 2, add the previous result to 3, and then multiply the previous result by 4". Clearly, we have now made it deliberate that we are passing around a single value that we are performing operations on at each step. How would we write this as a lambda expression, however?

We might write (fn res => (fn res2 => res2 * 4) (res + 3)) 2 to encode the previous instructions. This essentially makes res the first "previous result", and then the result of 2 + 3 is res2, the second "previous result". Make sure you understand what is happening here - we are binding 2 to the identifier res, then binding the result of res + 3 to the identifier res2.

However, it is still on us to provide the value of 2. We want to somehow encode the notion of the computation of (2 + 3) * 4, without necessarily evaluating the expression ourselves. We know that placing an expression within the confines of a lambda expression will "freeze" the computation, causing it to only resume once the lambda expression is given an input, so what we can do is simply do the same with a trivial input. Our trivial input here will be (), or unit, since we always have access to it, and there is only one value of type unit.

So somehow, we can encode the idea of this expression with (fn () => (fn res => (fn res2 => res2 * 4) (res + 3)) 2). Seen in this way, we have somehow encoded the desired expression not by actually executing it, but by forming some large function a priori that essentially does the same thing. We thus form a correspondence between evaluating an expression by carrying out each step in real time and writing out the steps to be done at a later time. We hope to show that the two are really equivalent - this will be important to understand for later.
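
To make this concrete, here is the same idea as runnable code (a minimal sketch; the name frozen is ours):

(* frozen : unit -> int, the "instructions" for computing (2 + 3) * 4 *)
val frozen = fn () => (fn res => (fn res2 => res2 * 4) (res + 3)) 2

val twenty = frozen ()   (* nothing is computed until frozen is applied to () *)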

Continuation Passing Style: A Case Study

We will now discuss continuation passing style more explicitly.

It is hopefully clear that the previous example of (fn () => (fn res => (fn res2 => res2 * 4) (res + 3)) 2) encodes, in some form, the idea of the computation of the expression (2 + 3) * 4. Somewhat key, however, is that computation does not execute until we actually feed a unit to it. It is important to note the distinction between doing something and writing the instructions to do something. A continuation can be thought of as a blueprint, or a contingency plan. We will expand more on what we mean by this shortly, but a continuation essentially represents what instructions need to be executed. This is a very powerful idea, because programs can continuously alter continuations by adding more instructions or even executing different continuations in order to determine what computations ultimately need to be executed.

We will first consider a simple example before going into the definition.

(* factCPS : int -> (int -> 'a) -> 'a *)
(* REQUIRES: true *)
(* ENSURES: factCPS n k ~= k (n!) *)
fun factCPS 0 k = k 1
  | factCPS n k = factCPS (n-1) (fn res => k (n * res))

Here, factCPS takes in two arguments - one is the counter for the factorial function (as normal), and the other is the continuation. In this case, the continuation is a function of type int -> 'a. The idea here is that the continuation should represent what work is left to do. It is left polymorphic, however, in order to give the user control over what they want the function to do. In order to compute the factorial of n directly, one could just evaluate factCPS n Fn.id, but we grant more versatility to the user in that they are not just constrained to computing the factorial. If one wanted a textual representation of the factorial of n, they could evaluate factCPS n Int.toString, for instance. This way, we get additional versatility out of our implementation.
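
As a brief usage sketch (assuming factCPS as defined above; fn x => x can stand in for Fn.id if the SML/NJ library is not available):

val fact5 = factCPS 5 (fn x => x)       (* 120 *)
val fact5str = factCPS 5 Int.toString   (* "120" *)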

We will now consider a trace of factCPS 3 Int.toString to fully understand the mechanism by which it works.

CPS Trace
Fig 1. Code trace of the evaluation of `factCPS 3 Int.toString`

As we can see, this code trace does correctly result in Int.toString 6, which is our desired result. Of particular interest to our analysis is the continuation of the function, which seems to grow with every line through factCPS's recursion, until ultimately being reduced down step-wise until it yields our final result.

The colors in the image denote the difference between the current value of k and the new, constructed k. For instance, k is originally Int.toString (which is the blue-colored text), however it is eventually wrapped in (fn res => ... (3 * res)), which is the RHS of the call to factCPS 3 Int.toString. Thus, the "inner" k in the third line is in orange, to signify that it is in fact the same as the entire continuation from the previous line (also in orange), which is the "previous" k. Seen in this way, all that each recursive call seems to be doing is appending a layer to the continuation, while the inside remains the same.

NOTE: The definition of factCPS says that the input to each lambda expression should be named res. In order to make understanding clearer and avoid namespace collisions, we have opted to name it res, res2, and res3, on each recursive call to factCPS. Note that this renaming does not affect the evaluation of factCPS 3 Int.toString and does keep it exactly equivalent to how it is actually evaluated [1] .

So hopefully now we are convinced of factCPS's correctness [2] . What might not be evident, however, is why.

Recall our previous metaphor with regards to writing down instructions. It is hopefully not too difficult to see that this large lambda expression that we are constructing is akin to writing down instructions - we specify the operations that should occur when it is ultimately "collapsed" (by being given an input value), but nothing actually occurs until then. It merely encodes that information in the meantime. The next diagram will attempt to more specifically show this relationship between the "instructions" and how it arises from the definition of factCPS:

Instructions
Fig 2. Relationship between an arbitrary recursive call of `factCPS` and the "instructions" that it writes down.

The middle of this image is supposed to denote the "instructions" that you might write if you were to codify the algorithm to determine the factorial of \( n \). Note that these instructions are actually read from bottom to top - thus, we start with 1 and then work our way up multiplying until we reach \( n \), which presumably should give us the actual factorial.

Now consider if we were at some arbitrary point of execution of the expression factCPS n initK, for some arbitrary initK : int -> 'a. That is, suppose that factCPS n initK has reduced down to factCPS i k', for some continuation k' (which is the result of modifying the continuation throughout the recursive calls of factCPS until now). Then, we should see that the form of k' should look something like (fn res => (fn res2 => ... (fn resn => initK (n * resn)) ... ((i + 2) * res2)) ((i + 1) * res)). That is, it exactly captures the idea of the instructions in orange - it covers the multiplication of all of the terms from \( i+1 \) to \( n \).

What is the action of the recursive call at factCPS i k'? Well, clearly it should reduce to factCPS (i-1) (fn res => k' (i * res)) - that is, it wraps k' in the lambda expression (fn res => k' (i * res)), which is just the "instruction" to multiply the result by i, exactly corresponding to the instruction in blue.

How do we compute the rest of the factorial? Everything seems correct so far, but that is all that we will have in our accumulation of k'. The rest of the instructions correspond exactly to the recursive call - to factCPS (i-1) itself, which will continue to go and compute the factorial all the way down to 0. Thus, even though we have not written them down yet, we can use the "recursive leap of faith" to assume that factCPS (i-1) (fn res => k' (i * res)) will behave properly and write down the instructions for multiplying i-1 through 1, which will result in the final, correct result.

NOTE: An equivalent, but also very important way to view CPS functions is that recursive calls pass their result to their continuation. For instance, as we have defined, factCPS n k should be extensionally equivalent to k (fact n), or in other words, factCPS n k will be the same as passing the actual factorial of n to k. This means that when we are writing our function, we can make that assumption - factCPS (n-1) (fn res => k (n * res)) should pass the result of \( (n-1)! \) to (fn res => k (n * res)). This is equivalent to, in our "instructions" analogy, saying that factCPS i k should faithfully execute all the instructions of multiplying from 1 to i - that is, factCPS i k should pass the result of \( i! \) to k.

So now, we can sort of inductively see how factCPS writes down the entire page of instructions, which should mean that it is correct when we execute it. This is not an inductive proof in itself, but this should give the intuition for why it does work.

An additional way to think about CPS functions is that they behave similarly to a stack. This is because, as we have seen, we are continuously wrapping lambda expressions with other lambda expressions - we can only "pop" off a lambda expression by evaluating it with an argument, or "push" on more instructions by wrapping our continuation in more lambda expressions. We cannot access the ones inside. As such, we could visualize the evaluation of factCPS 3 k as the following diagram:

CPS Stack
Fig 3. A visualization of the "stack" of instructions created by evaluating `factCPS 3 k`.

We would build this stack up from the bottom by first putting our k on (as our "original" k), then wrapping it in more lambda expressions as we go along the recursive calls (so, for instance, the orange brick is added by factCPS 3 k', and the red brick is added by factCPS 2 k'', for their own respective continuations k' and k''). Then, once we are at the end, our "instruction stack" looks like the one currently pictured. At that point, we have nothing left to do but execute the stack of instructions with an initial value of 1, which will cause the bricks to start being popped off one by one, and then ultimately result in the value of 6 being applied to k.

Another, equivalent way to view continuations is as donuts.

CPS Donut
Fig 4. An artist's rendition of the "CPS Donut" of instructions created by evaluating `factCPS 3 k` (colorized, 2020).

As you can see, we cannot access the inner layers of the CPS Donut without first biting through the outer layers (corresponding to our evaluation of the outer layers first). One can only imagine what it would taste like in real life.

NOTE: It is not important to be able to draw these examples, or parrot them verbatim. They are merely here to try and provide some intuition as to what is happening with CPS. It is very important to be able to understand why it is that CPS functions work, which may be rather daunting and hard-to-grasp at first.

Footnotes

[1]: In fact, it is alpha equivalent!

[2]: Though perhaps we should not be, until we write a full inductive proof of correctness!

Continuation Passing Style: The Definition

We are now ready to attempt a definition of continuation passing style.

[Continuation] A continuation is a function that specifies what is supposed to be done with the result of a computation.

[Continuation Passing Style] A function is said to be written in continuation passing style if it satisfies the following conditions:

  1. It takes in and uses continuation(s).

  2. It calls functions with continuations (including itself) as tail calls.

  3. It can only call its continuations in tail calls.

The key characteristic of CPS functions is that they are generally written with the goal of passing their results to their continuations. These continuations specify what computations should occur next.

First, take a moment to assure yourself that the implementation of factCPS that we have so deeply studied is, in fact, in continuation passing style [3] .

fun factCPS 0 k = k 1
  | factCPS n k = factCPS (n-1) (fn res => k (n * res))

As we have seen, clearly k is factCPS's continuation, and it does call it in tail calls (as well as itself). This seems consistent with our definition.

An important corollary of this definition is that a CPS function cannot case on a recursive call to itself. So, for instance, making a recursive call to a CPS function to see if it succeeds, then taking some action based on that premise is illegal. You may not fully understand what that means at the present, but we will explore this idea more in the future.

A somewhat interesting note is that every direct-style function (direct style being how we will term the kinds of functions that we have written until now) admits a continuation passing style implementation. This means that continuation passing style is nothing arcane, but merely a different way of viewing computation. Functions written in continuation passing style can have significant performance benefits when compared to their direct style counterparts, so there are reasons to use continuation passing style other than just for the sake of it.
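
For instance, here is a hedged sketch of the correspondence, using a simple list-summing function of our own invention:

(* direct style *)
fun sumList [] = 0
  | sumList (x::xs) = x + sumList xs

(* the same function in continuation passing style *)
fun sumListCPS [] k = k 0
  | sumListCPS (x::xs) k = sumListCPS xs (fn res => k (x + res))

val six = sumListCPS [1, 2, 3] (fn x => x)   (* 6, the same as sumList [1, 2, 3] *)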

Footnotes

[3]: Otherwise, you should be quite concerned about the competency of the author, and you would probably be better off reading a different help site.

Continuation Passing Style: A Case Study v2.0

We will now explore an example of a CPS function that makes use of some concept of limited backtracking, or checkpointing. Somewhat key to this example is that it writes down not just one instruction at each recursive call, but multiple.

(* treeSumCPS : tree -> (int -> 'a) -> 'a *)
fun treeSumCPS Empty k = k 0
  | treeSumCPS (Node (L, x, R)) k =
      treeSumCPS L (fn leftSum => treeSumCPS R (fn rightSum => k (leftSum + x + rightSum)))

This function computes the sum of the elements in an int tree. The recursive case has a slightly more intimidating-looking continuation, however we can view it as simply a case of the continuation being wrapped twice.

Consider how we would normally want to approach this problem. In a typical treeSum, we might compute both the left and right sums, and then simply add the results to x. This suffices, and follows rather intuitively, but in CPS we must make our control flow explicit. In this manner, we must be very specific about what we should do and in which order. To that end, we fix a direction to visit first (in this case, left, though it does not matter), and then compute the sum of that direction, with the promise that we will eventually use the result of the left side to compute the final result.
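
For comparison, the direct-style version might look like the following (a sketch, assuming the usual int tree datatype):

datatype tree = Empty | Node of tree * int * tree

(* treeSum : tree -> int *)
fun treeSum Empty = 0
  | treeSum (Node (L, x, R)) = treeSum L + x + treeSum R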

To visualize what is happening, consider the following tree.

Tree to be summed
Fig 5. A tree of ints to be summed, whose node contents are conveniently enumerated according to visit order.

We will run through a mock simulation of the evaluation of treeSumCPS T k, where T is the tree pictured, and k is some arbitrary continuation. First, note that we will take the convention that "Tn" for some number n will denote the subtree rooted at the vertex n, and "Sn" will denote the sum of that subtree.

Firstly, we know that from treeSumCPS T k we should obtain treeSumCPS T2 (fn S2 => treeSumCPS T4 (fn S4 => k (S2 + 1 + S4))).

Phase 1
Fig 6. Phase 1 of evaluation of the tree T.

We can think of the individual calls to treeSumCPS on T2 and T4 as leaving "flags" on each arm of the edges coming from vertex 1 - denoting which node that we should visit next. Clearly, we visit T2 first, so the red brick corresponding to T2 is on top.

Our next move is to pop it off first, which will cause our expression to now have three bricks - two corresponding to the children of T3, and one corresponding to T4. We can visualize the next phase as the following:

Phase 2
Fig 7. Phase 2 of evaluation of the tree T. Note that two extra "bricks" have been added due to evaluation of treeSumCPS T2.

where the brick labelled treeSumCPS E corresponds to the empty child of T2. Note that we have retained the blue flag at T4 and left its brick alone, since for all intents and purposes we have not yet "touched" it - we have not evaluated our stack to that point. Note that due to corresponding to the Empty case, the purple brick will not add any new bricks, but merely defer to the next brick on the stack - the orange brick. Thus, the next step looks like the following:

Phase 3
Fig 8. Phase 3 of evaluation of the tree T. Note that this follows from evaluating both of the top bricks from the previous step.

In this step, the green and light blue bricks again correspond to empty trees, so they simply dissolve without generating more instructions. As such, we reduce to the following diagram:

Phase 4
Fig 9. Phase 4 of evaluation of the tree T. Note that, after finishing our evaluation of all of the bricks of the left side, we return to our long-neglected "checkpoint" on the right side of the tree.

We see in this step that, as the figure caption says, we have returned to the right hand side after finishing all evaluation on the left hand side of the tree. Thus, our construction was correct - we placed the blue brick onto the stack at the very beginning, then proceeded to forget about it until now. This works in our favor, however, as we only return to it once we have finished with everything on the left. One more step of evaluation yields:

Phase 5
Fig 10. Phase 5 of evaluation of the tree T.

Thus, all we have left are empty bricks, so we are very close to termination.

This demonstration was far from a rigorous treatment, and also omitted any mention of the actual values being passed into each continuation - these can be inductively assumed to be the proper sums at any given brick. We invite the reader to conduct a more detailed analysis on their own, similar to the treatment of factCPS, in order to truly grasp how the different subtree sums are computed and passed around.

The main point of this example, however, was to demonstrate how even a slightly more complicated function can result in elegant, powerful control-flow behavior. By simply wrapping our continuation twice, we ensured that we could set up a "checkpointing" system, where we could set "flags" to jump back to once finishing a certain computation. This is an important idea to internalize about continuation passing style.

In the next example, we will explore how we can set up more complicated arrangements of control flow through usage of two continuations.

Continuation Passing Style: A Case Study in Success and Failure

Duality seems to permeate computational problems. When trying to write programs, we often come to cases where we need to make a choice between two extremes - on or off, keep or don't, left or right, 1 or 0. With the capacity to make choices, however, comes the possibility of making the wrong choice. In such a case, we might want the ability to remember the choice we made, and take the other one, or backtrack.

[Backtracking] Backtracking is a strategy employed by algorithms when trying to find all the possible "options" or "possibilities", so as to locate the "best" solution, or alternatively just one feasible solution. Problems that admit backtracking solutions include the N-queens puzzle, constraint satisfaction problems, and Sudoku.

At first, backtracking might seem like it is akin to what was done in the previous example, with our idea of "checkpointing", except that we never really "reset" our state. When we set checkpoints to go back to in the treeSumCPS example, we always did so while retaining the sum of the left side, so we never lost information. In a backtracking problem, we may need to throw away everything that we've done in the meantime and go to a different solution entirely.

In this example, we will analyze a classic optimization problem, and how it admits a CPS solution.

[Knapsack Problem - Decision Version] The knapsack problem is a resource allocation problem which traditionally concerns a person with a knapsack that can only contain a certain total weight of items. The person has to choose between a collection of items (all of differing weight and value) so as to maximize the amount of value collected, while being below the maximum weight limit of the knapsack. Note that items can be taken more than once.

In this chapter, we will be concerned specifically with the decision problem version of the knapsack problem, which, instead of asking for a maximizing assignment, instead asks if there exists an assignment that can achieve a certain threshold of value.

We will now write a continuation passing style function that solves the knapsack problem.

type weight = int
type value = int

(* knapsackCPS : 
 *            (value * weight) list -> 
 *            value -> 
 *            weight -> 
 *            ((value * weight) list -> 'a) -> 
 *            (unit -> 'a) -> 
 *            'a 
 * REQUIRES: The weights and values of the elements in L are strictly positive.
 * ENSURES: knapsackCPS L minVal maxWeight sc fc ~= sc L' for some L' that only
 * contains elements of L, such that the total value of L' >= minVal and the 
 * total weight of L' <= maxWeight, if such an L' exists. If no such L' exists, 
 * then it should be equivalent to fc ().
 *)

fun knapsackCPS 
  (L : (value * weight) list) 
  (minVal : value) 
  (maxWeight : weight) 
  (sc : (value * weight) list -> 'a) 
  (fc : unit -> 'a)
  : 'a =
  case L of
    [] => if minVal <= 0 andalso maxWeight >= 0 then sc []
                                                else fc ()
  | (v, w)::xs => if maxWeight < 0 then fc ()
                                   else
      knapsackCPS ((v, w)::xs) (minVal - v) (maxWeight - w)
        (fn L' => sc ((v, w)::L'))
        (fn () => knapsackCPS xs minVal maxWeight sc fc)

We see that knapsackCPS takes in a list of items, each represented as a value * weight tuple, where value and weight are both really just aliases for int. It also takes in a minimum threshold for the knapsack's value, minVal, and a maximum weight of items to be taken maxWeight. Of particular interest, however, are the parameters sc and fc - denoting what we call the success and failure continuations. The goal of our function is to ultimately find a list L' that contains elements that are also in L (with duplicates allowed). This corresponds to choosing how many of each item in the allowed collection to pick.

If such a list exists, then we should return sc L'. Otherwise, if there is no such list, we should return fc ().
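
As a usage sketch (the item list is made up for illustration; wrapping results in an option lets us observe success or failure at the top level):

(* Succeeds: taking both items gives value 14 >= 14 at weight 9 <= 9 *)
val success = knapsackCPS [(10, 6), (4, 3)] 14 9 SOME (fn () => NONE)
(* success = SOME [(10, 6), (4, 3)] *)

(* Fails: no multiset of copies of (10, 6) reaches value 11 within weight 6 *)
val failure = knapsackCPS [(10, 6)] 11 6 SOME (fn () => NONE)
(* failure = NONE *)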

Rather immediately, this should seem like a different kind of problem. We don't know of a fundamentally better algorithm than brute-forcing the possibilities, but if we try a possibility and it turns out to be wrong, we want to have the option to backtrack and try something else entirely. We will demonstrate how this is realized in the algorithm in the next part.

case L of
    [] => if minVal <= 0 andalso maxWeight >= 0 then sc []
                                                else fc ()

For the base case, however, we know that if we are given no items whatsoever, then we cannot place anything in our knapsack. Thus, we have no choice but to have a value and weight of 0. As such, if we know that our minimum value is at most 0 and our maximum weight is at least 0, then a valid list value for L' is just the empty list, so we can call our success continuation in sc []. Otherwise, we must call the failure continuation, as the problem is not solvable.

Note that, since we plan to write this function recursively, if we were given an initial list that was non-empty, we may eventually recurse down to the empty case. It may then seem like a concern that we are calling sc [], as we might actually call sc on a list that contains elements. Note that this is not a concern - by the structure of CPS functions, if we were to recurse down to the base case, our "promise" is that the success continuation sc that we enter the base case with is not the "original" sc that we were passed - by this point, it should have accumulated a great deal of "information" and thus become more advanced than simply returning sc [], for the original sc. For the base case, it is sufficient to simply solve it from the perspective of having been originally given [].

In the recursive case, we now have to think about how we might solve the knapsack problem given that we have a non-empty collection of items to take. Right off the bat, we can say that if our maxWeight < 0, then this problem is unsolvable, since we cannot possibly have a negative total weight (our weights in this problem are strictly positive), so in that case we would simply call our failure continuation.

Otherwise, however, we now need to actually think about what to do. Oftentimes, it is very helpful to reduce the problem to making a binary choice, and then simply explore the consequences of doing so from there. In this case, we can imagine our choice to be whether to put the first element of the list into the knapsack, or not.

| (v, w)::xs => if maxWeight < 0 then fc ()
                                 else
    knapsackCPS ((v, w)::xs) (minVal - v) (maxWeight - w)
      (fn L' => sc ((v, w)::L'))
      (fn () => knapsackCPS xs minVal maxWeight sc fc)

What does our recursive call for putting the first element (v, w) into the knapsack look like? It takes the form of knapsackCPS ((v, w)::xs) (minVal - v) (maxWeight - w) sc' fc', for some sc' and fc' that we will determine later. We keep the list the same, to account for the fact that we can have duplicates - in the next step, we want to reconsider whether we want to put (v, w) on again. If we commit to putting (v, w) in the knapsack, however, we need to somehow encode the fact that our value has gone up by v, and our knapsack's weight has gone up by w. This is achieved by setting the new minimum value to be minVal - v, and the new max weight to be maxWeight - w - we simply lower our thresholds, which is functionally the same.

What should our continuations be? Remember that by the recursive leap of faith, we can assume a kind of "promise" of this recursive call - that is, we can assume that knapsackCPS ((v, w)::xs) (minVal - v) (maxWeight - w) sc' fc' obeys the same ENSURES clause as was written above, for its own sc' and fc'. We now should define what to do in the case that either is called.

We are, at this point, not actually at the stage where we know if sc' or fc' is going to be called, or on what, if at all. The power in CPS comes from the fact that this does not matter, and we can write our algorithm despite that. Earlier, we discussed the concept of continuations as contingency plans. In this function, we are going to be using that strategy to full effect.

Think of sc' and fc' as contingency plans for the success and failure case, respectively. If we were to succeed on this recursive call, what should we do? If we were to fail, what should we do? Even though we have no idea what the input to the continuation is, the fact that it is a function allows us (inside the scope of the body of sc') to assume we have access to it.

As such, the first thing we may write for sc' is fn L' => .... Now, we must fill in the ellipses. If sc' is called, we know that it must be called with an appropriate L' that satisfies the ENSURES conditions. Recall that this recursive call emerged in the first place from our choice - to place (v, w) inside of the knapsack. Thus, if the call to knapsackCPS succeeds, we know that placing (v, w) inside the knapsack must be feasible, by extension. So then we know it makes sense to write fn L' => sc ((v, w)::L') for sc', which intuitively means "take the answer (knapsack) to the recursive call and then put (v, w) in it, then call success on it". Thus, we have written down a contingency plan for what we should do if, at some point in the future, our success continuation is called.

What about fc'? What should we do if we fail? Well, if our recursive call to knapsackCPS does not succeed, that must mean that our choice to place (v, w) inside of the knapsack was wrong. In that case, we want to make the other choice - to not put (v, w) inside the knapsack. Thus, we write fn () => knapsackCPS xs minVal maxWeight sc fc, which has the exact same parameters as what we were initially passed, except we have gotten rid of (v, w) (since we know that putting it in the knapsack is a mistake). Our sc and fc remain the same, since on a success or failure of this call, we just do whatever we would normally do on a success or failure.

Thus, in only a few lines (and generous whitespace), we have written an algorithm for a backtracking knapsack problem solver. While it looks intimidating at first, the overall algorithm reduces down to only a few simple steps, written in an elegant and explicit manner due to continuation passing style.

Conclusions

In this section, we explored the idea of continuation passing style, which lets us phrase the control flow of our functions in a more explicit manner, granting us more versatility in our implementation. We saw how CPS functions can be viewed as simply a more advanced form of accumulation, as well as how having multiple continuations (corresponding to success and failure cases) allows us to explore expansive trees of binary choices to find solutions.

Exceptions

By Brandon Wu, June 2020

So far, we have seen how we can manipulate the constructs of SML to create unique control flow behavior in the form of continuation passing style. In this section, we will discuss exceptions, which are themselves a built-in feature of the language. With exceptions, we can cover cases where continuing to evaluate an expression does not make sense or is ill-defined at run-time.

Built-In Exceptions

We have seen how SML interprets computation as evaluation, and how we can reduce our entire notion of program execution to application of simple reduction rules. Sometimes, however, we run into cases where attempting to apply some rule results in an error, or in some output that we cannot actually express. In such cases, it is necessary to actually halt evaluation with some manner of effect - this is the behavior that exceptions will introduce.

While we would like to be able to push as many of these errors as possible to compile time, it is not always the case that this is possible - this is usually when dealing with cases where computation is made infeasible by the values that are bound to identifiers, which cannot be determined a priori at compile time. As such, we have no way of telling beforehand if such errors will occur - forcing us to define some notion of a run-time error.

You have likely already encountered exceptions - the canonical example is of 1 div 0, which, when entered in the REPL, will result in an uncaught exception Div. This is because we cannot sensibly evaluate 1 div 0 to some value - what integer could we possibly return? The other option, then, is to simply quit, and not even try to evaluate further. This is why division by zero results in an exception.

Another example is of the declaration val x::xs = []. This signifies an attempt at pattern matching, where we attempt to pattern match x::xs to []. Plainly, however, we cannot possibly carry this out - what values could we bind to x and xs to make this valid? Thus, we will also need to have some exception for an incorrect binding - which is appropriately named as the Bind exception.

Finally, in general we are interested in writing functions that are exhaustive, in that they are defined on all values of their input type. Even if a function is only meant to be called on values within a certain domain, it is often a good idea to be safe and cover all of the cases anyways. We call such a function that is not defined on certain inputs nonexhaustive, and defining them will generally result in a non-exhaustive match warning, though it will still be permitted.

Consider the function fn true => 1, which is plainly nonexhaustive, not covering the false case. Nonetheless, it is a function of type bool -> int that can be bound to an identifier. How, then, should we handle the case of attempting to evaluate (fn true => 1) false? This is a well-typed expression, causing it to fly under the radar of our compile-time tests. At run-time, then, we cannot evaluate this expression through function application - the function does not specify what it should return in this case! As such, we will simply raise a Match exception, signifying that the function's input was not able to match to any particular clause of the function. Note how this differs from Bind, which occurs as a result of attempting to produce a binding, not when attempting to apply a function.

Defining Exceptions: Basic

We have now seen the built-in exceptions that are automatically raised for certain prescribed use cases. Oftentimes, however, we are interested in our own specified use cases, meaning that we likely do not want to use the exceptions Div, Match, and Bind, which may be unrelated. In this case, we want to define our own exceptions.

The syntax for defining exceptions is as follows:

exception Error

This introduces the constructor named Error, which corresponds to the identically named exception Error. Exception constructors are first class, meaning that they are themselves values. The type of exception constructors is exn, so this line really introduces the value Error : exn. The type exn stands for "exception name", but it is also useful to think of it as standing for "extensible", since the type exn is extensible. This means that we can extend the values that populate exn with new constructors, like we did with Error.

The value Error is not, by itself, an exception, however we can use it to raise exceptions with the raise keyword. We can think of the raise keyword as being "like" a function of type exn -> 'a, in that it "takes in" a value of type exn, and the resulting expression has type 'a. It is important to remember that raise is not really a function, though - it merely has similar behavior when used to raise exceptions, but it is not first class.

The polymorphic "return type" of the raise keyword is so that raising exceptions can be compatible with our type system. Suppose we want to write a factorial function that, instead of looping forever on negative inputs, raises an exception.

exception Negative

fun fact' 0 = 1
  | fact' n = 
      if n < 0 then raise Negative
               else n * fact' (n-1)

This code fragment should carry out our desired behavior. Consider the type of raise Negative - we would like to raise an exception, but we know that the expressions in both branches of an if then else expression must be the same type. In the positive case, this has type int, corresponding to just calculating the actual factorial. Therefore the negative case must also be int, though we also want to raise an exception. To be compatible with this, raise Negative must have type int.

We would not like raise Negative to have type int in general, however - this depends on our use case! We want raising exceptions to be able to just work, type-wise, since we know that it never returns a well-defined value anyways. As such, we define raising exceptions to have a polymorphic return type, so that it fits within our type system correctly, no matter the use case. This is also the reason why we can write raise Fail "Unimplemented" as the output of not-yet defined functions and still pass typechecking, no matter how complicated the function.
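
For example, here is a small sketch using the built-in Fail exception:

(* head : 'a list -> 'a *)
fun head [] = raise Fail "empty list"   (* the raise takes on type 'a here *)
  | head (x::_) = x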

Exceptional Control Flow

At this point, we have seen how exceptions let us implement a very limited form of "control flow", in that we can stop the flow of control entirely - upon encountering a raised exception, computation ceases. This is rather rudimentary in terms of expressiveness - we can only create programs that forcibly terminate! In this section, however, we will explore the usage of handle, a keyword that allows us to have more sophisticated behavior with respect to how programs deal with raised exceptions.

[handle] For expressions e : t, e1 : t ... en : t, and different values Exn1 : exn ... ExnN : exn, if the expression e raises the exception ExnI, then the expression

e handle Exn1 => e1
       | Exn2 => e2
       ...
       | ExnN => en

reduces to eI.

In other words, the handle keyword lets us case on the exception raised by an expression.

It is important to note that all the expressions e, e1, ... en have to be of the same type. Consider what would happen if they were not:

e handle Div => "Oh no!"

Let e be an expression of type int. Suppose that, in this case, e raises Div, so ostensibly this expression should reduce to "Oh no!". However, what would happen if Div was not raised? Then, we would have e, which is of type int.

We've violated type safety here. We cannot "sometimes" have an expression be one type and another time have it be another. We must have static type guarantees. As such, all the arms of a handle must agree, and additionally they must agree with the type of the expression being handled.

We say that an exception that is raised can either propagate up to the top level (in which case the program or expression simply results in an uncaught exception), or to the nearest handler. To clarify the meaning of "nearest", take the evaluation of the expression (1 + (1 div 0)) * 3 handle Div => 5, for example. We see that 1 div 0 raises the exception Div, so the inner expression is extensionally equivalent to (1 + raise Div) * 3. Then, applying this logic one more time, 1 + raise Div clearly should also raise Div, so we get that it is extensionally equivalent to (raise Div) * 3, which is then extensionally equivalent to raise Div. What we see is that this raised exception "propagates up" as it subsumes more and more expressions, until eventually it reaches a handler.
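
To see this concretely (a sketch of what one might enter in the REPL):

val five = (1 + (1 div 0)) * 3 handle Div => 5   (* the raised Div reaches the handler, so five = 5 *)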

While we now see how we can handle different kinds of exceptions, we might want to make a more educated choice about what our next action should be. It might be the case that we raise an exception in some failed branch of the program, but we want to have more information about exactly what happened, or what the program was doing at the time. We will now discuss information-carrying exceptions, which are nothing other than an extension of our declarations of exceptions to being more akin to how we declare datatypes.

In a similar vein to how we can declare datatype Foo = Bar of int to denote that a value of type Foo is the constructor Bar wrapping a value of type int, we can declare values of type exn to also be constructors wrapping values. This takes the form:

exception Crash of int

which denotes that Crash 1 and Crash 5, among others, are values of type exn, and can thus be raised. Note that Crash thus has type int -> exn.

Concretely, we can "pattern match" on the data contained by the raised exception within the handler, by doing something like the following:

exception Return of int

fun factException 0 = raise Return 1
  | factException n = factException (n - 1) handle (Return res) => raise Return (n * res)

NOTE: It is not clear why anyone would want to define fact this way.

This example makes use of an exception, Return : int -> exn, which wraps the return value of the factorial. factException, at each step, simply raises an exception containing its return value, which (in a future recursive call) is handled, the value unwrapped (bound to res), then multiplied by the current value of n to generate the next value, which is simply raised again. This is very similar to a case expression - we simply pattern match on the raised exception's constructor using the handler (you can pattern match on exception constructors with case as well, though not on raised exceptions). Thus, the behavior of factException n is to be extensionally equivalent to raise Return (fact n).
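
To actually recover the result at the top level, one might write (a sketch, assuming the definitions above):

val fact5 = factException 5 handle Return res => res   (* 120 *)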

For an abstract idea of a potential use case, consider some recursive function f that carries out some sequence of calculations, with a potential for error. We might be interested in how many recursive calls such a function makes when it ultimately fails - however, if we were to return the number of recursive calls, we would constrain the return type to be int, or barring that, some datatype that could be either a valid result (say, Valid res) or a signal for failure, with a line number (say, Fail line). We might desire that on a fail, execution actually stops, however. We could then simply raise the exception Crash line, which, as a raised exception, has a polymorphic type. As such, exceptions allow us to propagate back information without altering types, which can be convenient for our purposes.

For a concrete example of using such exceptions, see the next section.

Exception Handling Style

In the previous section, we discussed how continuation passing style could be used to devise complicated control flow schemas, in some instances based around the idea of a success and failure continuation, which could each potentially execute disjoint sets of instructions. Continuations are related to another common construct in programming languages, the goto - with a goto, we abandon whatever we are currently in the process of doing in favor of something else. Seen in this light, continuations and exceptions share a similar characteristic: both can simply "stop" execution in favor of some other route.

Consider the knapsack example from the previous section. We will now implement a solution to the knapsack problem using exception handling style.

exception Failure

type weight = int
type value = int

(* knapsackEHS : 
 *            (value * weight) list -> 
 *            value -> 
 *            weight -> 
 *            ((value * weight) list -> 'a) -> 
 *            'a 
 * REQUIRES: The weights and values of the elements in L are strictly positive.
 * ENSURES: knapsackEHS L minVal maxWeight sc ~= sc L' for some L' that only
 * contains elements of L, such that the total value of L' >= minVal and the 
 * total weight of L' <= maxWeight, if such an L' exists. If no such L' exists, 
 * then it should be equivalent to raise Failure.
 *)

fun knapsackEHS 
  (L : (value * weight) list) 
  (minVal : value) 
  (maxWeight : weight) 
  (sc : (value * weight) list -> 'a) 
  : 'a =
  case L of
    [] => if minVal <= 0 andalso maxWeight >= 0 then sc [] 
                                                else raise Failure
  | (v, w)::xs => if maxWeight < 0 then raise Failure
                                   else
    knapsackEHS ((v, w)::xs) (minVal - v) (maxWeight - w)  (fn L' => sc ((v, w)::L')) 
    handle Failure => knapsackEHS xs minVal maxWeight sc

It should be apparent that this function shares very close similarities to knapsackCPS, with the exception [1] of omitting the failure continuation in favor of raising the Failure exception. In fact, we can claim that knapsackCPS L minVal maxWeight sc fc ~= knapsackEHS L minVal maxWeight sc handle Failure => fc (), for all relevant values. Take a moment to assure yourself that this is the case. The code does not look very different, with the largest change being the recursive case, where the failure continuation has instead been offloaded to a handler.
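
In code, the claimed correspondence might be realized as follows (a sketch, assuming both definitions above):

(* knapsackCPS' should behave like knapsackCPS, built from knapsackEHS *)
fun knapsackCPS' L minVal maxWeight sc fc =
  knapsackEHS L minVal maxWeight sc handle Failure => fc ()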

Recall that we can think of the recursive call in the knapsack problem as a "choice" to "keep" or "not keep" the item at the head of the list. We said previously that, arbitrarily, we could commit to the choice of "keep", with a provision in the failure continuation to instead "not keep", should that failure continuation ever be executed. When evaluating the expression knapsackEHS ((v, w)::xs) (minVal - v) (maxWeight - w) (fn L' => sc ((v, w)::L')) [2] , we know that one of two things can happen - it can either succeed or fail. Now, however, our definition of failure is different - instead of calling its failure continuation, an instance of knapsackEHS which fails should instead raise Failure. Thus, it is exactly the right thing to do to do what we would ordinarily do upon a failure, should our call to knapsackEHS raise Failure.

Note, however, that this implementation puts slightly more burden on the user, since the ill-defined behavior of this function now results in a raised Failure, instead of just invoking fc (), for some pre-defined fc that we input. It also offers an advantage: in knapsackCPS, the return types of sc and fc must be the same, so if we want knapsackCPS to return some indicative value (without using an option type), we might not have an appropriate return value for the failure case. Thus, knapsackEHS might have the behavior we're looking for, since the type of raise Failure allows us to "unconstrain" the type of our success. In the general case, however, we will not make heavy usage of exception handling style, in favor of continuation passing style, which can be cleaner.

This is not the fullest commitment to exception handling style that we could have made when converting knapsackEHS - we could have also represented the success continuation with a raised exception, say an exception Success of (value * weight) list. We will not cover such an implementation in this chapter, but we invite the reader to try it out.

Footnotes

[1]: We're funny. [2]: Say that five times fast.

Conclusions

In this chapter, we explored exceptions, which allow us to have quick transfers of control flow, albeit in a less "controlled" fashion than ways that we have seen in the past. The success of so-called exception handling style is heavily contingent on intelligent placement and consideration of handlers, which decide where control is transferred to. We also have seen that we have a way of passing information back through the raised exception, which allows us to have a more powerful manner of communication than just an indicator of failure. Exceptions ultimately allow us a robust and type-safe way to deal with run-time errors in our programs.

Modules

By Eunice Chen and Brandon Wu, December 2020

Sometimes, we are interested in developing large projects. Large codebases are oftentimes convoluted and impenetrable to outside scrutiny, being solely understandable by the original author. It is thus in our best interest to develop good coding practices - consisting of clean, commented code that communicates its cause clearly, as well as a modular structure that allows understanding what each part of the codebase should do. For instance, we would not want to simply dump all of our code into a single file with thousands of lines - instead, what we oftentimes may do is decompose it into constituent parts, each part of which has a certain purpose, such as parsing, maintaining certain data structures, or other purposes, depending on the use case.

In this chapter, we will discuss SML's module system, which allows for safe, powerful abstraction of code. We will discuss the power that data abstraction grants us as code designers, as well as the clean ways that we can compose them in order to encapsulate common design patterns.

Motivation

Oftentimes as programmers, we are tasked with writing code that matches a specification. This specification can vary in rigor and mathematical formality, but the general idea is that oftentimes it exists. When we are asked to write a function to compute the first \( n \) Fibonacci numbers, it does not necessarily matter how we go about implementing this function, so long as it exhibits the proper input-output behavior. In other words, we would like it so that all functions implementing a given specification have extensionally equivalent behavior to what we should expect them to do.

NOTE: Sometimes, there are constraints beyond simply being the same "mathematical function" - that is, defining the same outputs on the same inputs. Sometimes we are asked for an implementation of a function running in less than \( O(n^2) \) asymptotic time, for instance. Even in cases like these, however, there are always still superficial differences that our implementation allows for - for instance, when writing the Fibonacci function, it is unlikely that it should matter whether we calculate the value of \( f(n-1) \) or \( f(n-2) \) first.

This should not be an unfamiliar idea - this is exactly the concept of referential transparency, which says that we can swap "equals-for-equals" whenever discussing the equality of expressions. Under the eyes of the language, equal expressions are just that - equal, and there is no need to disambiguate them.

This idea, however, is rather limited in scope. What if we would like to deal with a function that depends upon another function? Then we cannot freely make the swap between functions, and substitute a function into a context where its dependencies do not exist. Indeed, we cannot perform such a substitution at all in the case where a function is in any way defined beyond its own function body itself.

We don't want to outlaw such practices, however - we simply need to make our abstraction boundaries more explicit. When we have a package or otherwise standalone bundle of software, we should be able to use all of its components that are mutually dependent on each other - not just make our distinctions at the function level. We will do so using SML's module system.

Modules: Basics

SML takes both the ideas of the specification and the implementation and provides a way of codifying both within the language itself. Of course, it is rather difficult to say "implement the list reversal function in \( O(n) \) asymptotic time in \( n \), the length of the list" in code[1], so SML will only deal with specifications at the type-level. That is, within SML itself, a specification for a function is simply a type for that function.

The term for a specification in SML is called a signature. Consider the following specification and implementation of a package for modular arithmetic. Note that the comments are optional.

signature MOD_ARITH =
sig
    (* ENSURES: mod_add n x y = (x + y) mod n *)
    val mod_add : int -> int -> int -> int

    (* ENSURES: mod_sub n x y = (x - y) mod n *)
    val mod_sub : int -> int -> int -> int

    (* ENSURES: mod_times n x y = (x * y) mod n *)
    val mod_times : int -> int -> int -> int
end

This signature defines three functions, all of which have type int -> int -> int -> int, which takes in the modulus of the operation, as well as the two operands to the operation to be performed. While we would like an implementation of this signature to also display the behavior outlined in the comments, realistically there is no way to verify that this is true (why?), so the best that we can do is ensure that an implementation provides functions with the correct type.

Note also that this syntax is slightly different from what you have seen before:

  • we use val as if we were going to produce a val binding (with an associated type annotation), however we do not actually give it any definition. In truth, you cannot define a value in the signature itself, since it's impossible in general to check if two values are the same. You can, however, define a type within a signature (or leave it definitionless, as we will see later in the chapter).

An implementation of MOD_ARITH (called a structure, or module) might look as follows:

structure ModArith : MOD_ARITH =
struct
    fun mod_add n x y = (x + y) mod n

    fun mod_sub n x y = mod_add n x (~y)

    fun mod_times n x y =
        case Int.compare (x, 0) of
            LESS => mod_add n (mod_times n (x + 1) y) (~y)
          | EQUAL => 0
          | GREATER => mod_add n (mod_times n (x - 1) y) y
end

NOTE: By convention, structure names are usually capitalized, and signature names are usually all-capitals. Additionally, it is totally fine to declare a structure's values with fun instead of val, even if it says val in the signature, as fun is just shorthand for producing a function value binding anyways.

We will go over the precise meaning of the usage of the colon symbol in the above structure later in the chapter. For now, the meaning of this code snippet is to produce a structure ascribing to the MOD_ARITH signature, or in other words, implementing the MOD_ARITH specification. We see that it contains three declarations, those being the functions declared in the signature itself. A key note is that ascribing to a signature is akin to a contract - any structure ascribing a signature must implement all the requisite components described by the signature, or it will fail to ascribe, and result in a compile-time error (similar to an ill-typed program).

It is not, however, the case that a structure cannot provide more information than is strictly necessitated by the signature. Additional helper functions and value bindings can be freely instantiated within a structure without affecting ascription. Thus, honoring a contract only entails satisfying the terms agreed to in the signature, without comment on going over. We will explore this idea more later in the chapter when we discuss information hiding.

Using structures should be something that you are already familiar with - you do it every time that you invoke Int.compare. To use the fields of a structure, you access them using the name of the module, followed by a dot, followed by the name of whatever you are trying to access. Thus, calling Int.compare means to access the function named compare implemented within the structure named Int, which is provided as part of the Standard ML Basis Library. To use the structure that we have just implemented, we would similarly call ModArith.mod_add, for instance.
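
For instance (a usage sketch, assuming the ModArith structure above):

val two = ModArith.mod_add 5 3 4        (* (3 + 4) mod 5 = 2 *)
val alsoTwo = ModArith.mod_times 5 3 4  (* (3 * 4) mod 5 = 2 *)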

It is important to know that although a structure may contain values, a structure is not a value. Structures and signatures exist on another "level" of syntax outside that of expressions and values, and should not be mixed interchangeably with them. For instance, it would be nonsense to write val x = ModArith, as you cannot bind ModArith to a value identifier.

It is often useful when thinking of structures and signatures to think of them as "elevated types and values", as there is a neat correspondence between the two. A structure is really just a package of values (among other things, as we will see later) that must "type-check" in that it must successfully ascribe to its signature. As such, a structure can be regarded as a "bundle of values", and a signature as a "bundle of types". This only skims the surface at what can be done with modules, but it is a handy perspective to have.

Modules: A Study in Modularity

Ultimately, the main goal that we seek to achieve with modules is, in fact, modularity. As alluded to previously in this chapter, we have mentioned how we would like to be able to swap out certain implementations for others as necessary, to enforce clean boundaries of code dependency and produce more maintainable code. As it stands, we have only seen modules used for conveniently wrapping our code, however in this section we are going to discuss how modules allow us to maintain multiple implementations of the same specification.

Consider the following signature for performing geometric operations on the Cartesian plane:

signature GEOMETRY =
sig
    type point

    val origin : point
    val rotate : point -> real -> point

    val get_x : point -> real
    val get_y : point -> real
    val get_dist : point -> real
end

Suppose that we are interested in performing operations on points within the Cartesian plane. Then we may want a structure ascribing to this signature. The question becomes - how should this signature be implemented? We have previously discussed swapping out code for more efficient implementations, but in that case one implementation was clearly, unilaterally superior to the other; here, the choice is less obvious.

In this signature, note that we have left the type point abstract, meaning that it is up to the structure to define what precisely the type point is - it is not decided by the signature. In this case, there are at least two distinct ways to represent points on the Cartesian plane - as rectangular or polar coordinates.

Consider the following structures ascribing to the GEOMETRY signature:

structure RectGeo : GEOMETRY =
struct
    (* pair of (x, y) coordinates *)
    type point = real * real

    val origin = (0.0, 0.0)
    fun rotate (x, y) r =
        (x * Math.cos r - y * Math.sin r, x * Math.sin r + y * Math.cos r)

    fun get_x (x, y) = x
    fun get_y (x, y) = y
    fun get_dist (x, y) = Math.sqrt (Math.pow (x, 2.0) + Math.pow (y, 2.0))
end

structure PolarGeo : GEOMETRY =
struct
    (* pair of (d, theta), where d is distance from origin and theta is angle from 0 degrees *)
    type point = real * real

    (* theoretically here theta could be anything *)
    val origin  = (0.0, 0.0)
    fun rotate (d, theta) r = (d, theta + r)

    fun get_x (d, theta) = d * (Math.cos theta)
    fun get_y (d, theta) = d * (Math.sin theta)
    fun get_dist (d, theta) = d
end

It should be relatively clear that these are two perfectly valid (albeit limited) implementations of coordinate geometry. Each has its own strengths and weaknesses: rectangular coordinates give more direct access to the x and y coordinates of a point, while polar coordinates make it easier to rotate points about the origin and to calculate a point's distance from the origin. As such, it comes down to context to determine which should be used in any given circumstance.

In this case, modules offer a convenient way of swapping out code without having to pore through the codebase for dependencies. If we are using the PolarGeo structure and find ourselves wishing we were using rectangular coordinates instead - no problem! We can simply load the other structure. In doing so, we have moved dependency management to a static, compile-time check, much in the same way that we moved type-checking to a static, compile-time check. This is a recurring theme: rather than deferring errors to runtime, we try to catch them earlier, in order to write cleaner and less error-prone code.
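
To make this concrete, here is a hypothetical client written purely against the GEOMETRY interface; the alias G is our own invention, and switching representations is then a one-line change:

structure G = RectGeo   (* swap in PolarGeo here; nothing below changes *)

val p = G.rotate G.origin (Math.pi / 4.0)
val d = G.get_dist p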

Modules: Transparent and Opaque Ascription

At this point, we have discussed how to use modules for enforcing "contracts" between parties interested in some kind of software, as well as how they can assist in code reuse and statically enforcing code dependencies. Another use of modules in the SML module system is information hiding.

It is sometimes said that one of the greatest ideas in computer science is abstraction. Whether it is focusing on developing an algorithm to solve a problem, a piece of software for a given client, or trying to interface with some existing computer system, it is often the case that it is only prudent to focus on those details that are relevant to the problem. It takes time and mental energy to internalize every last detail of a codebase, and it is a waste of time to try and do so every time that one wants to make a patch. As such, we rely on abstraction in order to keep the amount of relevant information low.

Consider the computer. It is a complicated, convoluted work of machinery and circuitry - at its most fundamental level being comprised of logic gates and incredible networks of interacting parts. Although an action as simple as opening up a notepad seems very intuitive to the user, under the hood it is anything but. The key, however, is that the user of a computer need not understand the underlying hardware and circuitry - indeed, they need not even know what circuitry is! In the general case, the user of a computer does not really know how it functions.

Nor, in most cases, is it important to know. A user of a computer does not need to know precisely how it works in order to type a query into Google. The only details relevant to the user are the devices that allow interfacing with the inner hardware (the mouse and keyboard, among other things), not the precise configuration of the microchips inside of the machine.

When writing code, we would like to maintain the same practice. We would like to write clear, concise, and maintainable code (for the unfortunate souls who have to go back and read it afterwards), so it is important to lessen the burden on the code-reader who comes after (which might be you!). With modules, we will be able to create enforced abstraction boundaries that allow programmers to only consider the interface of a given module, much in the same way that a computer-user needs only consider the physical interface of the device.

The first example of this phenomenon of information hiding is through the contents of a structure that are available for use. We have spoken of signatures as an interface, a sort of lens through which we view the contents of a structure. This metaphor is further strengthened by the fact that a signature literally does dictate what information the user of a structure is able to see. In any structure ascribing to a given signature, the only data able to be used by an outside client are those described within the signature.

Practically, this means that the use of "helper functions" in the implementation of a structure is hidden from the client. This means that in the following signature and structure:

signature REVERSE =
sig
    val reverse : 'a list -> 'a list
end

structure Reverse : REVERSE =
struct
    fun trev [] acc = acc
      | trev (x::xs) acc = trev xs (x::acc)

    fun reverse L = trev L []
end

only the value reverse is visible to the user of the Reverse structure. This is convenient for enforcing abstraction, because we may be writing a program whose helper functions have certain preconditions. By restricting access to those helper functions, the client is prevented from constructing inputs that violate those preconditions, which could otherwise result in undefined behavior.
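
A quick sketch of this hiding in action:

val ys = Reverse.reverse [1, 2, 3]   (* [3, 2, 1] *)
val zs = Reverse.trev [1, 2, 3] []   (* rejected at compile time: trev is not in REVERSE *)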

We have used the term "ascription" several times so far in this chapter, referring to how a structure "implements" a signature, similarly to how a value has a certain type. In reality, there are two kinds of ascription: transparent and opaque. To demonstrate the difference, we will consider the following implementation of 2D arrays.

signature ARRAY =
sig
    type 'a array
    val new : int * int -> 'a -> 'a array
    val get : 'a array -> int * int -> 'a
    val set : 'a array -> int * int -> 'a -> 'a array
end

structure RectArray : ARRAY =
struct
    type 'a array = 'a list list

    (* the Basis Library's List structure provides no update function,
       so we define the analogous operation ourselves *)
    fun update (xs, i, v) =
        List.take (xs, i) @ (v :: List.drop (xs, i + 1))

    fun row 0 v = []
      | row n v = v :: row (n - 1) v

    fun new (0, width) v = []
      | new (height, width) v =
        row width v :: new (height - 1, width) v

    fun get A (y, x) =
        List.nth (List.nth (A, y), x)

    fun set A (y, x) v =
        update (A, y, update (List.nth (A, y), x, v))
end

This implementation of arrays uses a 2D list in order to create a rectangular array, with separate access to each row. There is one "creator" function, new, which constructs a new array, as well as the "getter" and "setter" functions get and set, which allow manipulation of an existing array. The structure RectArray thus ascribes to the ARRAY signature, and because it uses the colon symbol, it transparently ascribes to the ARRAY signature. We are now ready to discuss precisely what this means.

[Transparent ascription]: Ascribing a structure to a signature where all abstract types are visible to the user of the structure.

We have previously discussed the existence of abstract types, which are type declarations in the signature that is being ascribed to. For instance, the type 'a array is an abstract type, because it is not concretely defined within the signature. However, if we had instead replaced that line in the ARRAY signature with type 'a array = 'a list list, it would no longer be abstract (and note that this would have the consequence that any structure ascribing to ARRAY must use the representation of a 2D list for its arrays).

What precisely does it mean for the type 'a array to be visible to the user of the structure? We said that the user of a structure should only be aware of what is in the signature, and nothing more - but under transparent ascription, this is not strictly true. With a transparently ascribed structure like RectArray, the user can treat the type 'a RectArray.array as synonymous with 'a list list, and is thus privy to the "internals" of the structure, even though they are not explicitly spelled out in the signature.
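
For example, the following client code is accepted only because RectArray is transparently ascribed, making int RectArray.array interchangeable with int list list (a sketch):

val A : int RectArray.array = [[1, 2], [3, 4]]   (* constructing the "internals" directly *)
val firstRow : int list = List.nth (A, 0)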

We will now compare this with what happens if we instead use opaque ascription. Consider the following two implementations of the ARRAY signature:

structure ListArray :> ARRAY =
struct
    type 'a array = 'a list * int

    (* the same update helper as before, since List.update is not in the Basis *)
    fun update (xs, i, v) =
        List.take (xs, i) @ (v :: List.drop (xs, i + 1))

    fun gen 0 v = []
      | gen n v = v :: gen (n - 1) v

    fun new (height, width) v =
        (gen (height * width) v, width)

    fun get (A, w) (y, x) = List.nth (A, w * y + x)

    fun set (A, w) (y, x) v = (update (A, w * y + x, v), w)
end

structure RectArray :> ARRAY =
struct
    type 'a array = 'a list list

    (* the Basis Library's List structure provides no update function,
       so we define the analogous operation ourselves *)
    fun update (xs, i, v) =
        List.take (xs, i) @ (v :: List.drop (xs, i + 1))

    fun row 0 v = []
      | row n v = v :: row (n - 1) v

    fun new (0, width) v = []
      | new (height, width) v =
        row width v :: new (height - 1, width) v

    fun get A (y, x) =
        List.nth (List.nth (A, y), x)

    fun set A (y, x) v =
        update (A, y, update (List.nth (A, y), x, v))
end

Note that the latter implementation is exactly the same as the previous RectArray structure, with the exception that it is now opaquely ascribed. This comes from using :> between the structure and signature names, which signals opaque rather than transparent ascription.

[Opaque ascription]: Ascribing a structure to a signature where all abstract types are hidden to the user of the structure.

By "hidden", we mean precisely in the same way that the abstract types are "visible" in transparent ascription. Abstract types are truly that, abstract, and thus the user of either the RectArray or ListArray structure are incapable of knowing precisely what either the 'a ListArray.array or 'a RectArray.array types are. The only way to construct a value of type 'a array, from either structure, is by using either respective new function, and similarly with manipulating an existing array.

Why would we want to have opaque ascription? Precisely for the reason that we have been motivating for the entire chapter: abstraction. A fundamental idea of having a powerful type system is to make illegal states unrepresentable. Instead of attaching a precondition to a given function, we would like the type system to enforce that precondition for us - similarly to how a function of type int -> int with precondition "TRUE" would, if written in a dynamically typed ("typeless") language, instead carry the precondition that its input is an integer. Opaque ascription is then just an extension of that idea to more general representation invariants.

By representation invariant, we typically mean some invariant of a data structure that qualifies it to truly be an "instance" of that data structure. For instance, in the RectArray structure, we use a 2D list to represent a given array, however it is not the case that all values of type 'a list list correspond to an array of 'a values. For instance, is [[1, 2, 3], [4, 5]] a valid int RectArray.array?

If RectArray were to be transparently ascribed, a client could construct an illegal instance of an 'a RectArray.array, and thus produce undefined behavior when interfacing with it using functions like get and set. By opaquely ascribing it to its signature, however, we ensure that the only way that you can produce and manipulate arrays is internally, through the structure itself, so as long as the structure's functions preserve representation invariants, there will be no way of producing an illegal state.

Another motivation is that, ultimately, it isn't important for the client to know how arrays are implemented - this is an example of information hiding. If a client wants some implementation of the ARRAY signature, it likely isn't important to them whether it is implemented as a one-dimensional or two-dimensional list, so long as it provides certain functionality (which, while limited in this example, could be extended). We can thus ensure that it is completely impossible to distinguish ListArray and RectArray, at least in terms of their input-output behavior. This benefits abstraction in that the client does not have to think about the fiddly details of the implementation, and can instead focus purely on the interface.

We thus see that transparent ascription is useful for cases where knowing a representation is totally fine (or when debugging), but opaque ascription is in general the best approach when constructing a library. To clients of a given software package, it is best to enforce abstraction.

Conclusions

In this chapter, we have explored the powerful benefits that are given by SML's module system, particularly in terms of information hiding, abstraction, and modularization. While modules are in themselves of significant theoretical interest, there are many practical applications to their use, and they offer a clean and concise way to structure large codebases. In the next chapter, we will explore functors, which act like higher-order structures and allow us a greater degree of freedom in structuring our code.

Functors

By Cooper Pierce and Brandon Wu, February 2021

We have so far discussed the usage of modules for explicitly structuring our code, in that we are afforded a degree of modularity in how the different components of some software fit together. We have used language like "swapping out" modules or "substituting" modules for one another, but this is very imprecise. What exactly do we mean when we say to swap one module for another? Certainly, it would be messy to have to go through our code and change every mention of StructureA to StructureB, and we would like to avoid a global change that requires a strenuous amount of effort on the client's part.

Going through and changing every mention of a particular structure in order to achieve "modularization" is akin to claiming that we achieve "parameterization" in a higher-order function by manually rewriting the function argument at each call site. In reality, this offers no more versatility. In this chapter, we will discuss functors, which are akin to higher-order modules, in that they can take other modules as arguments.

Functors: The Basics

Firstly, consider the signature of a stack.

signature STACK =
sig
    type 'a t

    val push : 'a t -> 'a -> 'a t
    val pop : 'a t -> 'a option * 'a t
    val size : 'a t -> int
    val empty : 'a t
end

Suppose that we would like to extend the definition of a stack such that it has a limit on how many elements it contains. However, we don't necessarily know which implementation of a stack to use - there could be several such existing implementations - so we would instead like to parameterize our BoundedStack structure over an existing structure ascribing to STACK.

We will then use a very similar signature to STACK called BOUNDED_STACK, which is exactly the same except for an additional exception Full, for use when a stack is given too many elements.

signature BOUNDED_STACK =
sig
    type 'a t
    exception Full

    val push : 'a t -> 'a -> 'a t
    val pop : 'a t -> 'a option * 'a t
    val size : 'a t -> int
    val empty : 'a t
end

We first consider the case where we have a hard limit of 10 items in a given stack.

functor BoundedStack (S : STACK) :> BOUNDED_STACK =
struct
    type 'a t = 'a S.t

    val limit = 10
    exception Full

    fun push S x =
        if S.size S >= limit
          then raise Full
          else S.push S x

    fun pop S = S.pop S

    fun size S = S.size S

    val empty = S.empty
end

We see that most of this code is delegation - we have written the majority of this BoundedStack in terms of S, the given implementation of a stack. The correctness of a BoundedStack is thus totally dependent on whether the given S is correct, but we achieve modularity in that we can freely swap out one structure ascribing to STACK for another when instantiating a given BoundedStack.

To put it concretely, suppose that we have two structures:

structure Stack1 :> STACK =
struct
    (* some code here *)
end

structure Stack2 :> STACK =
struct
    (* some code here *)
end

Then we can create instances of the BoundedStack functor as follows:

structure BoundedStack1 = BoundedStack(Stack1)
structure BoundedStack2 = BoundedStack(Stack2)

NOTE: SML functors are generative, meaning that applying the same functor to the same structure twice yields two unique structures. As such, 'a BoundedStack1.t and 'a BoundedStack2.t are recognized as being two distinct types, despite the fact that they are "constructed" in the same manner.

BoundedStack1 and BoundedStack2 both implement BOUNDED_STACK, so both expose the fields of the BOUNDED_STACK signature. Presumably, the only change in behavior from Stack1 and Stack2 is the raising of the exception Full when a stack is given more than ten elements.
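
A sketch of client usage, assuming Stack1 is a correct stack implementation:

val s = BoundedStack1.push BoundedStack1.empty 1
val n = BoundedStack1.size s            (* 1 *)
val (top, s') = BoundedStack1.pop s     (* top is SOME 1 *)
(* by generativity, s does not also have type int BoundedStack2.t *)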

Functors: Syntactic Sugar

SML offers some "syntactic sugar" for functor arguments, allowing us to (seemingly) parameterize them by terms other than structures. It is a little unsavory to have to hard code the limit of the BoundedStack within the functor itself, rather than having it be parameterized by the limit itself, so we can actually also write the following:

functor BoundedStack (structure S : STACK
                      val limit : int) :> BOUNDED_STACK =
struct
    type 'a t = 'a S.t

    exception Full

    fun push S x =
        if S.size S >= limit
          then raise Full
          else S.push S x

    fun pop S = S.pop S

    fun size S = S.size S

    val empty = S.empty
end

The only difference is that instead of taking in a single S : STACK, we specify "structure S : STACK val limit : int" within the parentheses of the functor's input. Note that there are no commas or delimiters other than spaces.

In reality, this is something of a lie. While this seems to give the impression that BoundedStack is taking in two things, a structure named S ascribing to STACK and a value of type int named limit, in reality functors can only take in other structures. This is thus syntactic sugar for the following code:

functor BoundedStack (UnnamedStructure :
                      sig
                        structure S : STACK
                        val limit : int
                      end) :> BOUNDED_STACK =
struct
    open UnnamedStructure
    (* same code as before *)
end

The open keyword specifies to open the namespace within a given module, effectively promoting all contents of it to the top level. Thus, if we were to open Stack1, as per our previous example, we could write push instead of Stack1.push, pop instead of Stack1.pop, and so on and so forth. Thus, what this syntactic sugar does is specify to take in a single structure ascribing to a signature that contains a structure ascribing to STACK and an int-typed value, named S and limit respectively.
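
To instantiate a functor written with this syntactic sugar, the argument is supplied in the same declaration style. A sketch, again assuming the Stack1 structure from before (BoundedStack5 is a name of our own choosing):

structure BoundedStack5 =
    BoundedStack (structure S = Stack1
                  val limit = 5)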

CAUTION: This is a very important point to cognize! This can be the source of many frustrated hours of debugging due to a simple syntax error.

The reason why we must open UnnamedStructure in order to be able to use the same code is because we cannot say S.push, for instance, as we did in the original implementation of BoundedStack. We would instead have to specify UnnamedStructure.S.push, which is not what our previous code says. However, if we open UnnamedStructure first, the S structure is promoted to the top level, and we can now access it without first having to go through UnnamedStructure.

The reason for naming the input structure UnnamedStructure should hopefully now be clear. Indeed, it is a phantom structure of a sort, since in the syntactic sugar case, we never give it a name, and indeed we never really acknowledge its existence at all. Yet it is important to realize what is really happening, that there really is a structure being taken in as input, and then immediately opened for its contents.

What issues can occur if we forget about the existence of this syntactic sugar? Consider the following code:

functor BoundedStack (structure S : STACK) :> BOUNDED_STACK =
struct
    type 'a t = 'a S.t

    val limit = 10
    exception Full

    fun push S x =
        if S.size S >= limit
          then raise Full
          else S.push S x

    fun pop S = S.pop S

    fun size S = S.size S

    val empty = S.empty

end

structure BoundedStack1 = BoundedStack(Stack1)
structure BoundedStack2 = BoundedStack(Stack2)

This code will not compile! Can you see why?

The issue is that we have added a prefix of structure to S : STACK within the inputs. Even though we have only specified one field, not including the limit as a parameter, this will be interpreted as the following:

functor BoundedStack (UnnamedStructure :
                      sig
                        structure S : STACK
                      end) :> BOUNDED_STACK =
struct
    type 'a t = 'a S.t

    val limit = 10
    exception Full

    fun push S x =
        if S.size S >= limit
          then raise Full
          else S.push S x

    fun pop S = S.pop S

    fun size S = S.size S

    val empty = S.empty

end

structure BoundedStack1 = BoundedStack(Stack1)
structure BoundedStack2 = BoundedStack(Stack2)

Thus, BoundedStack is no longer a functor taking in a structure ascribing to STACK, but a functor taking in a structure ascribing to sig structure S : STACK end - in other words, taking in a structure containing a structure ascribing to STACK! Thus, the line structure BoundedStack1 = BoundedStack(Stack1) will not type-check, as Stack1 does not ascribe to the signature that BoundedStack is expecting. This one simple syntax error can be the source of much pain and frustration, so we caution the reader to be particular with their syntax, and mindful of what is really happening under the hood.

Case study: Typeclasses

Certain types have some functionality or operation in common. Depending on the operation in question, we can say that these types fall into the same typeclass, which is a common interface consisting of a type and the desired operations. Note that typeclass membership is not a formally defined relationship, but instead a useful categorization that we use in order to classify types that we intend to parameterize some implementation over.

For a concrete example of a typeclass, consider the ORDERED typeclass.

signature ORDERED =
sig
    type t
    val compare : t * t -> order
end

The ORDERED typeclass consists of those types that admit a sensible ordering, which we will not (and perhaps cannot) define. Thus, we can witness int and string as valid instances of the ORDERED typeclass with the following structures:

structure IntOrder : ORDERED =
struct
    type t = int
    val compare = Int.compare
end

structure StringOrder : ORDERED =
struct
    type t = string
    val compare = String.compare
end

Note that it is useful, in this case, for our instances of the ORDERED typeclass to be transparently ascribed, since the whole point is that we know which type each typeclass instance is associated with.

NOTE: In actuality, we use transparent ascription as somewhat of a sledgehammer, to avoid having to talk about a different language construct, namely where clauses. A where clause modifies a signature containing an abstract type, and concretely specifies what that type should be. For instance, we could discuss the signature ORDERED where type t = int. A structure ascribing to this signature would "publish" the details of the type t (which is really an int), the same way that transparent ascription would. With where clauses, however, if there are multiple abstract types, we can be selective about which ones are made "transparent". For the purposes of this chapter, however, we will largely avoid where clauses.
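
As a brief sketch of a where clause in use (the structure name is our own invention):

structure IntOrder2 :> ORDERED where type t = int =
struct
    type t = int
    val compare = Int.compare
end

val _ = IntOrder2.compare (1, 2)   (* well-typed, since t is published to be int *)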

These definitions come very naturally from the fact that the String and Int libraries included with the Standard ML basis library already implement compare fields, however we can also define types such as int list to be instances of ORDERED:

structure IntListOrder : ORDERED =
struct
    type t = int list
    fun compare ([], []) = EQUAL
      | compare ([], ys) = LESS
      | compare (xs, []) = GREATER
      | compare (x::xs, y::ys) =
        case Int.compare (x, y) of
            EQUAL => compare (xs, ys)
          | LESS => LESS
          | GREATER => GREATER
end

This structure defines a lexicographic ordering on int lists, using the fact that values of type int are already ordered. It prioritizes the relative comparison of the corresponding elements of both lists first, and then the length (akin to how a dictionary is ordered).
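
For instance, a quick sketch:

val result = IntListOrder.compare ([1, 2, 3], [1, 2])   (* GREATER: equal prefix, but the first list is longer *)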

Indeed, we can take this one step further and see that lexicographic orderings form a functor, in that we can parameterize the ordering of some type of list, given that we can order the elements of the list. Like a higher-order function, this saves us from having to repeat the same code over and over to declare StringListOrder and CharListOrder structures, instead encapsulating the common pattern. We can implement the LexicListOrder functor as follows:

functor LexicListOrder (O : ORDERED) : ORDERED =
struct
    type t = O.t list
    fun compare ([], []) = EQUAL
      | compare ([], ys) = LESS
      | compare (xs, []) = GREATER
      | compare (x::xs, y::ys) =
        case O.compare (x, y) of
            EQUAL => compare (xs, ys)
          | LESS => LESS
          | GREATER => GREATER
end

Note that since an instantiation of the LexicListOrder functor is itself a structure ascribing to ORDERED, its result can be passed back into LexicListOrder, making any depth of nested list an instance of ORDERED, so long as the base element type is also an instance of ORDERED.
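
Concretely (a sketch, with structure names of our own choosing):

structure IntListOrd = LexicListOrder (IntOrder)
structure IntListListOrd = LexicListOrder (IntListOrd)

val result = IntListListOrd.compare ([[1], [2]], [[1], [3]])   (* LESS *)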

It is also useful to note that equality types in Standard ML are essentially a language-supported typeclass, akin to inbuilt support for the following signature:

signature EQ =
sig
    type t
    val equal : t * t -> bool
end

The equal operation for each "instance" of this typeclass is defined by Standard ML itself, rather than by the user. Thus, we can think of the equality operator = as simply invoking the T.equal function of the proper typeclass instance T, determined by the type of the values being compared for equality.
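
If we were to write such an instance by hand, an int instance might look like this sketch:

structure IntEq : EQ =
struct
    type t = int
    fun equal (x : int, y) = x = y
end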

In this next section, we will explore a concrete use for typeclasses when designing functors.

Case study: Red-black trees

Typeclasses can be important when we are attempting to place some greater constraint on the types that may instantiate some universal type. In certain cases, we do not want the types that we are considering to truly be any type, but any of a limited subset of types that share some common characteristic or implement some operation. We will study the use of red-black trees as the underlying data structure for dictionaries.

A dictionary is a simple data structure that maps keys to values. Consider its signature given below.

signature DICT =
sig
    type key
    type 'a dict
    val empty : 'a dict
    val insert : 'a dict -> key * 'a -> 'a dict
    val lookup : 'a dict -> key -> 'a option
end

It is a well-known fact that, utilizing a kind of balanced binary tree data structure, dictionaries can be implemented with an \( O(\log n) \) insert and lookup operation, as opposed to \( O(n) \) for other data structures such as lists. While there are many different implementations of balanced binary trees, we will consider a particular variant known as red-black trees.

[Red-black tree]: A variant of self-balancing binary tree that ensures logarithmic search and insert time. It is named because of its nodes, which are marked as either red or black. Furthermore, it obeys the following properties:

  1. All leaves are black.
  2. The children of a red node must be black.
  3. Any path from a given node to a leaf node must go through the same number of black nodes.

Note that as a variant of binary search tree, a red-black tree must also satisfy the invariant that the key stored at a node must be greater than or equal to every key in the left subtree, and less than or equal to every key in the right subtree.

It is easy to reason about why this schema ensures the proper asymptotic bound for search - the third property in particular ensures that the length of the longest path from the root to a leaf is at most twice the length of the shortest such path. This is because the longest possible path from the root to a leaf, for a given black height, alternates black and red nodes, while the shortest consists of black nodes only.

This means that a given red-black tree is not as strictly balanced as some other variants (for instance, AVL trees), however it is always approximately balanced.

We would like to create a structure for red-black tree dictionaries. There are some options that we have - we could simply hard-code a TypeRedBlackDict :> DICT for any type Type, except that this would entail quite a bit of repeated code (and exertion on our part). Another solution would be to make the type of 'a dict doubly-polymorphic instead - something like an ('a, 'b) dict, where 'a is the type of the dict's keys and 'b the type of its contents. However, then we lose the guarantee that 'a is a type that supports comparison, which means that we cannot satisfy the tree's invariants.

The solution we will turn to is exactly analogous to the one discussed in the previous section - we will instead design a RedBlackDict functor that takes in a typeclass instance ascribing to ORDERED, and produces a structure whose keys have the type of that instance. We thus define our functor with the following preliminaries:

(* the where clause publishes the key type, so that clients can actually construct keys *)
functor RedBlackDict (Key : ORDERED) :> DICT where type key = Key.t =
struct
    type key = Key.t
    datatype color = Red | Black
    datatype 'a dict = Leaf | Node of 'a dict * (color * (key * 'a)) * 'a dict

    val empty = Leaf
    (* ... *)
end

Because we take as input a Key structure ascribing to ORDERED, we have access to the Key.compare function, which we will use when inserting into our dictionary. We define a color type (which only consists of the constant constructors Red and Black) for tagging the nodes of the red-black tree (leaves are considered to be black).
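
Once the remaining functions are filled in, instantiating the functor is a one-liner. A sketch, reusing the IntOrder structure from the previous section:

structure IntDict = RedBlackDict (IntOrder)

val d = IntDict.insert IntDict.empty (3, "three")
val v = IntDict.lookup d 3   (* SOME "three" *)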

The question becomes: how should we implement insert? We cannot be so naive as to simply insert as we would in an ordinary binary search tree, as this would quickly cause problems with our invariants. In particular, we must be mindful of the black height invariant, saying that all paths to leaves must have the same number of black nodes on them.

The easiest case to tackle for insert is the Leaf case. How should we finish the definition of fun insert Leaf (k, v) = ? Well, clearly we must insert a Node(Leaf, (c, (k, v)), Leaf) for some color c. Note that since a Leaf is considered to be colored black, if we choose c to be Black, we will run into issues with our black height invariant - we have replaced a Leaf (a subtree of black height 1) with a subtree of black height 2! This will disproportionately affect the black height of paths ending in this subtree, thus causing the invariant to be violated.

Thus, the only sensible choice is to insert the new node as a Red node. The astute reader may see that this will quickly cause issues of its own - we will address this shortly. Thus, we can write

fun insert Leaf (k, v) = Node(Leaf, (Red, (k, v)), Leaf)
  | insert (Node (L, (c', (k', v')), R)) (k, v) = ...

How should we handle the Node case? Well, insertion really only happens at the leaves - the only thing that we can do at a Node is to propagate the change down the tree until it gets to where it needs to be. We have seen that this "always-red" insertion schema maintains the black-height invariant, however there is the red-children invariant as well - the children of a red node must themselves be black. This is the invariant that our current schema fails to respect.

So we only run into an issue when we insert into the tree such that the new node is the child of a red node. Furthermore, we know that if the tree that we are inserting into is truly a red-black tree, it must respect the red-children invariant, and thus the red parent of the inserted node must itself have a black parent. Thus, there can only be four cases for the "site" of the insertion:

Fig 1. Illustration of the four cases of the red-children invariant being broken following insertion. The inserted nodes are marked with a "plus".

Such an invariant violation is only a local concern, however. All that is needed to restore the invariant is a simple rotation at the site of the violation, together with a recoloring. We will illustrate only the first case; the rest follow similarly. You may verify for yourself that this transformation preserves the ordering and black-height invariants.

Fig 2. Illustration of the "balancing" necessary in order to preserve the red-children invariant in Case 1 of Fig. 1.

We thus write the following function which takes care of all four cases.

fun balance (Node(Node(Node(a,(Red,x),b), (Red,y), c), (Black,z), d)) =
            (* red left child with a red left child *)
            Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
  | balance (Node(a, (Black,x), Node(Node(b,(Red,y),c), (Red,z), d))) =
            (* red right child with a red left child *)
            Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
  | balance (Node(Node(a, (Red,x), Node(b,(Red,y),c)), (Black,z), d)) =
            (* red left child with a red right child *)
            Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
  | balance (Node(a, (Black,x), Node(b, (Red,y), Node(c,(Red,z),d)))) =
            (* red right child with a red right child *)
            Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
  | balance T = T

Note that if we are not in any of the four described cases, balance simply acts as the identity function, as there is no invariant being broken.

However, this rotation may itself create another violation of the red-children invariant slightly farther up the tree. As such, we must propagate this balancing operation as far up as necessary, in order to produce a proper red-black tree at the very end. To this end, we can write the following code for the inductive case of insert:

fun insert Leaf (k, v) = Node(Leaf, (Red, (k, v)), Leaf)
  | insert (Node (L, (c', (k', v')), R)) (k, v) =
    case Key.compare (k, k') of
        LESS => balance(Node(insert L (k, v), (c', (k', v')), R))
      | EQUAL => Node(L, (c', (k, v)), R)
      | GREATER => balance(Node(L, (c', (k', v')), insert R (k, v)))

This code ensures that, after descending into a subtree in order to insert the given key and value, a balancing operation is performed as soon as the insertion completes, so balancings propagate bottom-up through the tree. Note that because balance acts as the identity function on anything that does not pattern-match any of the four cases, we perform only a negligible amount of extra checking at each recursive call, and are ultimately only concerned with those four cases.

However, this code is not complete. There is a minor edge case that remains - what if we are too close to the root for any of the four cases to apply? Our previous analysis relied on the fact that we could assume that the parent of our inserted node was red, and thus had a black parent - what of the case where the parent of the inserted node has no parent?

Consider the case illustrated below:

Fig 3. Inserting two nodes into an empty tree: a demonstration of potential issues when inserting nodes at the root while lacking a black parent.

As we can see here, our previous reasoning does not catch this red-children violation because it does not conform to any of our previous cases, by virtue of the inserted node not having a grandparent. This situation can only arise at the root, however. As a result, the simple solution is to make the root of any red-black tree black - this preserves the black-height invariant, and it makes this kind of red-red violation impossible. We can amend our code as follows:

fun insert' Leaf (k, v) = Node(Leaf, (Red, (k, v)), Leaf)
  | insert' (Node (L, (c', (k', v')), R)) (k, v) =
    case Key.compare (k, k') of
        LESS => balance(Node(insert' L (k, v), (c', (k', v')), R))
      | EQUAL => Node(L, (c', (k, v)), R)
      | GREATER => balance(Node(L, (c', (k', v')), insert' R (k, v)))

fun insert T (k, v) =
    case insert' T (k, v) of
        Leaf => Leaf   (* impossible - insert' always returns a Node *)
      | Node (L, (_, (k', v')), R) => Node(L, (Black, (k', v')), R)

Finally, this results in our completed code for the insert function. Note that because of the signature that we are ascribing to, helper functions such as balance and insert' will not be visible to the client of the module, so there is no harm in declaring them within the namespace of the functor.

Our completed code for a red-black tree implementation of dictionaries is thus as follows. Note that the implementation of lookup is very straightforward - it is simply the ordinary binary search tree lookup, ignoring the colors entirely.

functor RedBlackDict (Key : ORDERED) :> DICT =
struct
    type key = Key.t
    datatype color = Red | Black
    datatype 'a dict = Leaf | Node of 'a dict * (color * (key * 'a)) * 'a dict

    val empty = Leaf

    fun balance (Node(Node(Node(a,(Red,x),b), (Red,y), c), (Black,z), d)) =
                (* red left child with a red left child *)
                Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
      | balance (Node(a, (Black,x), Node(Node(b,(Red,y),c), (Red,z), d))) =
                (* red right child with a red left child *)
                Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
      | balance (Node(Node(a, (Red,x), Node(b,(Red,y),c)), (Black,z), d)) =
                (* red left child with a red right child *)
                Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
      | balance (Node(a, (Black,x), Node(b, (Red,y), Node(c,(Red,z),d)))) =
                (* red right child with a red right child *)
                Node(Node(a,(Black,x),b), (Red,y), Node(c,(Black,z),d))
      | balance T = T

    fun insert' Leaf (k, v) = Node(Leaf, (Red, (k, v)), Leaf)
      | insert' (Node (L, (c', (k', v')), R)) (k, v) =
        case Key.compare (k, k') of
            LESS => balance(Node(insert' L (k, v), (c', (k', v')), R))
          | EQUAL => Node(L, (c', (k, v)), R)
          | GREATER => balance(Node(L, (c', (k', v')), insert' R (k, v)))

    fun insert T (k, v) =
        case insert' T (k, v) of
            Leaf => Leaf   (* impossible - insert' always returns a Node *)
          | Node (L, (_, (k', v')), R) => Node(L, (Black, (k', v')), R)

    fun lookup Leaf k = NONE
      | lookup (Node (L, (_, (k', v)), R)) k =
        case Key.compare (k, k') of
            LESS => lookup L k
          | EQUAL => SOME v
          | GREATER => lookup R k
end

In the end, the use of modules allows us to write a powerful, parameterized implementation of a dictionary interface, in such a way that our representation invariants are respected throughout each operation. By making a structure ascribing to the ORDERED typeclass a parameter of the functor RedBlackDict, we allow powerful generality in the type of the dictionary's keys, without having to introduce additional overhead in the functions of the module itself.
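As a brief sketch of how a client might use this functor - assuming the hypothetical IntOrder structure from earlier, and assuming the ascription reveals the key type (e.g. :> DICT where type key = Key.t, a detail we have glossed over here):

structure IntDict = RedBlackDict (IntOrder)

val d = IntDict.insert IntDict.empty (150, "functional")
val found = IntDict.lookup d 150     (* SOME "functional" *)
val missing = IntDict.lookup d 122   (* NONE *)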

Conclusions

In this chapter, we have seen how functors are a potent tool for structuring our code, letting us enforce modularity and implement code reuse within the language itself. Functors also form the basis for a kind of higher-order module, where we can parameterize the structures we create by other structures, resulting in a degree of expressiveness and versatility not unlike that of higher-order functions themselves.

Applications

Sequences

By Kaz Zhou, January 2021. Revised April 2021

In programming, we often need to write fast algorithms that use enumerable collections. So far we have used the list datatype to represent enumerable collections, but lists have some crucial limitations.

A great application of SML's module system is the sequence signature. The implementation of sequences (which is specific to 15-150) allows for parallelism and for accessing any element in O(1) time; it also comes with some helpful functions.

Similar to our notation for lists, our notation for sequences looks like <x_1, x_2, x_3, ..., x_n>.

Sequences vs. Lists

Here are some advantages / disadvantages of sequences and lists. Considering sequences and lists purely from an algorithm-writing perspective, the only con of sequences is the slow cons operation.

                      Sequences    Lists
Accessing element i   O(1)         O(i)
Parallelism           Yes          No
Pattern matching      No           Yes
cons operation        O(n) work    O(1) work
Writing proofs        Difficult    Relatively easy

Important sequence functions

Some frequently used sequence functions are nth, tabulate, map, filter, and reduce. For comprehensive documentation of the sequence library, see http://www.cs.cmu.edu/~15150/resources/libraries/sequence.pdf. The following sections define each of these functions, give examples of their usage, and analyze their work and span.

Seq.nth

The function Seq.nth allows for extracting any element from a sequence in O(1) work and span. This is a significant advantage over lists, where accessing an element could take up to O(n) work and span, as in the case of finding the last element of the list. Seq.nth has type 'a seq -> int -> 'a.

An example: Let's say the sequence S is bound to the value <0, 1, 2>, which has type int seq. Then,

  • Seq.nth S 0 evaluates to 0
  • Seq.nth S 2 evaluates to 2
  • Seq.nth S 5 raises an exception Range, which occurs when the index is too large or small.

Seq.tabulate

This is a very common tool for constructing a sequence. The type of Seq.tabulate is (int -> 'a) -> int -> 'a seq. The first argument is a function which maps an index to a value, and the second argument is an integer specifying the length of the sequence. In general:

Seq.tabulate f n ==> <f 0, f 1, f 2, ..., f (n-1)>

As a concrete example:

Seq.tabulate (fn i => i) 5 ==> <0, 1, 2, 3, 4>

Let's try writing a function that reverses a sequence. For example,

reverse <0, 1, 2, 3, 4> ==> <4, 3, 2, 1, 0>

This function reverse would have type 'a seq -> 'a seq. In the above example, we have a sequence of length 5. The element at index 0 moves to index 4, the element at index 1 moves to index 3, ..., the element at index 4 moves to index 0. Generally, if the sequence has length n, the element at index i should move to index n - i - 1. It's pretty easy to make off-by-one errors here, so be careful and test sequence functions thoroughly! The implementation of reverse is as follows:

fun reverse S =
  let
    val n = Seq.length S
  in
    Seq.tabulate (fn i => Seq.nth S (n-i-1)) n
  end

It's also possible to write functions that deal with nested sequences. Let's write a function multtable : int -> int seq seq that, when given a positive integer n, makes an n by n multiplication table. For example,

multtable 5 ==>
<<0,0,0,0,0>,
 <0,1,2,3,4>,
 <0,2,4,6,8>,
 <0,3,6,9,12>,
 <0,4,8,12,16>>

This time, the function that we're tabulating with needs to output a sequence, like <0,2,4,6,8>.

Here is the implementation:

fun multtable n =
  Seq.tabulate (fn i =>
    Seq.tabulate (fn j => i*j) n) n

Let's analyze the work of multtable. The function (fn i => Seq.tabulate (fn j => i*j) n) has O(n) work, since it evaluates i*j for each of the n values of j. This function is called n times (by the outer tabulate), so the total work of multtable is O(n^2).

For the span analysis, note that the function (fn i => Seq.tabulate (fn j => i*j) n) has O(1) span, because i*j is evaluated in parallel for all the different values of j. Then multtable as a whole has O(1) span, because (fn i => Seq.tabulate (fn j => i*j) n) is called on all values of i in parallel.

In general, the cost graph for Seq.tabulate looks like this.

The cost graph for Seq.tabulate. Each branch at the top represents a different independent task for the processor(s) to perform.

The work of Seq.tabulate f n is the sum of all costs in the graph above. The span is the maximum cost of any one of the branches, because f is applied to 0, 1, ..., n-1 in parallel.

Seq.map

Seq.map is similar to List.map. The type of Seq.map is ('a -> 'b) -> 'a seq -> 'b seq. Given a function f : 'a -> 'b and a sequence <x_1, x_2, x_3, ..., x_n> : 'a seq, we have:

Seq.map f <x_1, x_2, x_3, ..., x_n> = <f x_1, f x_2, f x_3, ..., f x_n>

The work of evaluating the above expression is (work of doing f x_1) + (work of doing f x_2) + ... + (work of doing f x_n). The calls f x_1, f x_2, f x_3, ..., f x_n are all done in parallel. So, the span is the max of (span of doing f x_1, span of doing f x_2, ..., span of doing f x_n).
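For instance, here is a small sketch, assuming S is bound to the sequence <0, 1, 2> as before:

val doubled = Seq.map (fn x => 2 * x) S    (* <0, 2, 4> *)
val strs = Seq.map Int.toString S          (* <"0", "1", "2"> *)

Both calls do O(1) work per element, so they have O(n) work and O(1) span overall.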

Seq.reduce

Seq.reduce is like List.foldr, but with sequences. The type is Seq.reduce : ('a * 'a -> 'a) -> 'a -> 'a seq -> 'a. In English, Seq.reduce takes in a combining function, a base value, and a sequence to "reduce". For example, Seq.reduce g z <x_1, x_2, ..., x_n> is extensionally equivalent to g(x_1, g(x_2, ..., g(x_n, z))...). Notice that the combining function has type 'a * 'a -> 'a, rather than 'a * 'b -> 'b as in List.foldr.

The neat part is that Seq.reduce also supports parallelism. In particular, when a combining function g has constant span, Seq.reduce g z S has O(log |S|) span. We pay a slight cost: the function g must be associative, which means that g(g(a,b),c) = g(a,g(b,c)) for all a, b, c. Furthermore, z must be the identity for g, which means g(a,z) = g(z,a) = a for all a.
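For example, string concatenation is associative and has identity "", so op ^ is a valid combining function. A small sketch, using Seq.fromList (a library function for converting a list into a sequence) to build the input:

val words = Seq.fromList ["functional", " ", "programming"]
val sentence = Seq.reduce (op ^) "" words    (* "functional programming" *)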

To analyze the work and span, let's consider how SML actually evaluates Seq.reduce g z S. It first calls the combining function g on each adjacent pair of elements: x_1 and x_2, x_3 and x_4, and so on. All of these calls can be made in parallel. Then, the intermediate results are combined pairwise in the same fashion. At the very end, all of the elements of S will have been combined together.

The cost graph for Seq.reduce. Each branch at the top represents a different independent task for the processor(s) to perform.

The work is the sum of all the costs shown in the cost graph, while the span is the longest path through the cost graph. Therefore, for a constant-time g, the work of Seq.reduce g z S is O(|S|), and the span is O(log |S|).

An example of using this function is finding the sum of a sequence. So, Seq.reduce (op +) 0 <1, 2, 3, 4> evaluates to 10.
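Similarly, we might take the maximum of an int sequence. Int.max is associative, and valOf Int.minInt serves as its identity - though this sketch assumes the default fixed-precision int type:

val m = Seq.reduce Int.max (valOf Int.minInt) (Seq.fromList [1, 5, 3])    (* 5 *)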

Seq.filter

Again, this sequence function is analogous to a list function: Seq.filter is like List.filter. It takes in a predicate function p of type 'a -> bool and a sequence of type 'a seq, and keeps exactly those elements x for which p x evaluates to true. However, Seq.filter is optimized for parallel performance. The work is O(|S|). The span is O(log |S|), which may seem surprising, since the predicate can be applied to all elements in parallel. However, it is difficult to determine how many elements satisfy the predicate, so it is not possible to find the length of the filtered sequence in O(1). The implementation details are tricky, but use a divide-and-conquer approach just like Seq.reduce.
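For example, a small sketch that keeps the even elements of <0, 1, ..., 9>:

val evens = Seq.filter (fn x => x mod 2 = 0) (Seq.tabulate (fn i => i) 10)
(* evens ==> <0, 2, 4, 6, 8> *)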

Seq.append

Just like @ for lists, the function Seq.append : 'a seq * 'a seq -> 'a seq puts two sequences together. The work to create the appended sequence, as in the call Seq.append (S1,S2), is O(|S1| + |S2|). To justify this, think about how Seq.append may be implemented using Seq.tabulate: the tabulating function maps indices between 0 and |S1| - 1 to elements from S1, and maps indices between |S1| and |S1| + |S2| - 1 to elements from S2. Additionally, the span of Seq.append is O(1) regardless of the input sequences. This makes intuitive sense because Seq.tabulate also has O(1) span when given a constant-time tabulating function.
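To make this concrete, here is one way such an implementation might look - a sketch only, not necessarily the library's actual code:

fun append (S1, S2) =
  let
    val n1 = Seq.length S1
    val n2 = Seq.length S2
  in
    (* each index draws either from S1, or from S2 shifted down by |S1| *)
    Seq.tabulate
      (fn i => if i < n1 then Seq.nth S1 i else Seq.nth S2 (i - n1))
      (n1 + n2)
  end

Each call to the tabulating function does O(1) work, so the total work is O(|S1| + |S2|) and the span is O(1), matching the costs above.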

Examples of functions involving sequences

Exercise: Pascal's triangle

Our task is to write a function pascal : int -> int seq seq. Given a nonnegative integer n, pascal n evaluates to the first n+1 rows of Pascal's triangle. For example:

pascal 5 =
<<1>,
 <1,1>,
 <1,2,1>,
 <1,3,3,1>,
 <1,4,6,4,1>,
 <1,5,10,10,5,1>>

Let's try to do this task without using the formula for entries of Pascal's triangle. The 0th row of Pascal's triangle is our base case. Then, we can add elements from the (n-1)th row to calculate elements in the nth row (in parallel!). This is an incremental approach: we add one row of the answer at a time.

This problem has a bit of a dynamic programming flavor, in that we should remember the previous row to help calculate new rows. One issue is that using Seq.append or Seq.cons to add a row to a sequence is expensive. However, remember the pros and cons of sequences vs. lists! Adding a row to a list requires just O(1) work. Therefore, our approach for this problem will use both sequences and lists. The solution is below; note that it uses the function reverse from the Seq.tabulate section of this article.

fun pascalH 0 = [Seq.singleton 1]
  | pascalH n =
    let
      (* rows n-1, n-2, ..., 0 of the triangle, newest first;
         pascalH always returns a nonempty list, so this binding is safe *)
      val (prev::rest) = pascalH (n-1)
      fun rowmaker 0 = 1
        | rowmaker i =
          if i = n then 1
          else Seq.nth prev i + Seq.nth prev (i-1)
      val new = Seq.tabulate rowmaker (n+1)
    in
      new::prev::rest
    end

fun pascal n = reverse (Seq.fromList (pascalH n))

Examples

SML Basics Examples

By Eunice Chen, May 2020

Types

For each of the following declarations, state the type and value of x.

val x = 1 > 5

Type: bool

Value: false

val x = 15 div 0

Type: int

Value: Does not reduce to a value, because it raises the exception Div

val x = 15.0 div 0.0

Type: not well-typed (div is a function of type int * int -> int)

Value: No value, since it is not well-typed

val x = fn (n : int) => Int.toString n

Type: int -> string

Value: fn n => Int.toString n

val x = fn n => ("1" ^ "5") ^ (Int.toString n)

Type: int -> string

Value: fn n => ("1" ^ "5") ^ (Int.toString n)

Scope

Example 0

let
  val y : int = 2
in
  fn (x : int) => z*z
end

What is the value of the let-in-end expression?

Value: No value, because z is not in scope (will cause an error).

Example 1

val y = 0
val z = 
    (let
      val x : int = 1
      fun f (x : int) = x
      val y : int = x + 1
    in
      fn (a : string) => x*2
    end) "150"
val a = y

What is the value of y before the let-in-end expression?

y = 0

What is the value of y within the let-in-end expression?

y = 2

What is the value of a?

a = 0

What is the value of z?

z = 2

Example 2

val x : int = 1 
fun f (x : int) = x + 1
val y : int = 2
val z : int = f y
val a : int = x

What are the values of x, y, z, and a?

x = 1

y = 2

z = 3

a = 1

Example 3

val x = 1
val y = 2
fun f (x, y) = 
    case x of 
      0 => 0
    | y => y
val a = f (y, x)
val b = f (x, y)

What are the values of a and b?

a = 2

b = 1

Recursion & Induction

By Eunice Chen, June 2020

Types

For each of the following expressions, state its type and value.

fun f (x : int) : int = f x

Type: int -> int
Value: The value is the function itself

val x = []::(150::[])

Type: not well-typed (150::[] is an int list, so the left operand of :: must be an int, but [] is a list)
Value: No value, because it is not well-typed

fun f L =
    case L of
      [] => [[]]
    | x::xs => [x] @ []

Type: not well-typed
Value: no value, because it is not well-typed

What is the type and value of x?

fun f (y : int) : int = f y
val x : int = f 0

Type: int
Value: No value, because it loops forever

What is the type and value of y?

val x =
    let
      fun incr (n : int) = n + 1
    in
      incr
    end
val y = incr

Type: No type, since incr is not in scope
Value: No value, since incr is not in scope

Recursion

Induction

Given the following functions:

(* length: int list -> int *)
fun length [] = 0
  | length (x::xs) = 1 + length(xs)

(* append: int list * int list -> int list *)
fun append ([], L) = L
  | append (x::xs, L) = x :: append(xs, L)

Prove that for all L1, L2 : int list, length(append(L1, L2)) = length(L1) + length(L2).

Proof: By structural induction on L1.

BC: L1 = []

length(append(L1, L2))
= length(append([], L2))
= length(L2) [Clause 1 of append]

length(L1) + length(L2)
= length([]) + length(L2)
= 0 + length(L2) [Clause 1 of length]
= length(L2) [math and totality of length]

IS: L1 = x::xs for some x : int, xs : int list

IH: length(append (xs, L2)) = length(xs) + length(L2)

WTS: length(append (x::xs, L2)) = length(x::xs) + length(L2)

length(append (L1, L2))
= length(append (x::xs, L2))
= length (x::append(xs, L2)) [Clause 2 of append]
= 1 + length(append(xs, L2)) [Clause 2 of length, totality of append]
= 1 + length(xs) + length(L2) [IH]

length(L1) + length(L2)
= length(x::xs) + length(L2)
= 1 + length(xs) + length(L2) [Clause 2 of length]

By structural induction, we have shown that for all L1, L2 : int list, length(append(L1, L2)) = length(L1) + length(L2).

About

SML Help is a joint venture between previous and current teaching assistants of the computer science course 15-150: Principles of Functional Programming, taught at Carnegie Mellon University. It is aimed at being a comprehensive and accessible resource to students of 150, as well as those just interested in learning functional programming concepts in general, as viewed through the lens of the Standard ML language.

Teaching assistants form the backbone of 15-150, and of SML Help no less. Without them, this page would not have been possible. Special credit goes to Jacob Neumann and Cam Wong for their instrumental work in getting the website running.

Contributors to SML Help include:

Brandon Wu, Cooper Pierce, David Sun, Eunice Chen, Harrison Grodin, James Gallicchio, Kaz Zhou and Len Huang