By Brandon Wu, September 2020
Type safety is a very powerful concept, one that lets us pin down the space of allowable inputs to only a narrow space of values. In this way, we can ensure that the function is only allowed to be applied to those arguments that we are interested in, making any other computations illegal. Sometimes, however, we are interested in making a function more general, with arguments that can somehow range over types. We will see how SML achieves this with parametric polymorphism, which is a separate concept than other forms of polymorphism that you may have seen before.
Suppose that we wanted to write a length function for lists.
Well first, we might want to specify. What type of list? Let's say that we first have int lists.
Well, the implementation is easy:
fun lengthInt ( : int list) : int = 0 | lengthInt (x::xs : int list) : int = 1 + lengthInt xs
But suppose that we don't only want the length of int lists, but also the length of string and char lists. Well then, we'll need to define a few more functions:
fun lengthString ( : string list) : int = 0 | lengthString (x::xs : string list) : int = 1 + lengthString xs fun lengthChar ( : char list) : int = 0 | lengthChar (x::xs : char list) : int = 1 + lengthChar xs
This becomes unnecessarily tedious. As we can see, other than the name of the
function and the type annotations, these functions have exactly the same code.
Nothing about the functions themselves make use of the fact that the lists in
question contain ints, strings, or chars - they are merely manipulated as
arbitrary elements. We would like to be able to write a
length function that
works on a list of any type.
We've seen something similar to this - we know that lists of all kinds exist. No
matter whether it is an
int list, a
string list, or a
char list, we know
:: is always valid to use when consing an element onto a list of the
appropriate type. Indeed, we say that
:: has type
t * t list -> t list for
t. Cons is thus polymorphic, it can be used on many different
types (though never two at the same time, for instance
1::["hello"] is still
This has been something of an alternative definition for what we will be discussing in this chapter - parametric polymorphism. Seen in this way, all polymorphic functions are parameterized by a type variable that ranges over types. We will go more in depth into this idea.
[Type variable] A type variable is a type that is quantified over types - that is, it can specifically take on the form of many types. They are enumerated as
'c, and so on, and they are referred to by Greek letters. For instance,
'cis "gamma", and so on.
Type variables are themselves valid types. As such, we can rewrite
fun length ( : 'a list) : int = 0 | length (x::xs : 'a list) : int = 1 + length xs
'a is quantified over all types, so no matter what type the input list to
length is, it will work correctly, as SML will use type inference to
determine what the proper type of the
length function should be, in any given
context. Note that
'a is a variable, and as always, in proofs, we must
quantify our variables.
'a and other type variables thus implicitly correspond
to having a "for-all" quantifier in front of it - so we say that an expression of type
'a has type
t, for all types
t. We further note that this is really syntactic sugar for the notation:
fun 'a length ( : 'a list) : int = 0 | length (x::xs : 'a list) : int = 1 + length xs
This specifies that the length function itself is parameterized over
'a is a "parameter" of
length, it is not truly an argument - we do
not explicitly pass in the type to
length. In an abstract sense, however, we
do, as each "instantiation" of
length has been "supplied" an argument to
to produce a "copy" of the function. You can think of it as if
infinitely many different variations that can be selected, depending on what the
given type is inferred to be. Note that the "type" that is chosen does not
necessarily need to be a concrete type - it may itself compose of type
variables. For instance, consider the following code:
fun tupleLength  = 0 | tupleLength ((x, y)::xys) = 1 + length xys
Disregard the rather strange implementation, which is a rather arcane way of
length with unnecessary overhead. In this example, we can see that
xys contains tuples of two elements, the types of which are unknown
to us. Without knowing a concrete type, we can only conclude that the type of
('a * 'b) list -> int. This is because the elements of the tuple
do not necessarily have to be correlated with each other -
thus independent, though they could be instantiated to the same type. As such,
we have taken a type variable and replaced it with two more type variables,
which gives us a little more information but still quite a bit of leeway.
tupleLength is thus itself still polymorphic, with the same type of
('a * 'b) list -> int.
NOTE: We have now reached a point where we will begin to omit explicit type annotations, as they tend to unnecessarily constrain the types of functions. It is also a good exercise to be able to understand conceptually how type inference is carried out, which is covered more in the next section.
An important principle to note, however, is that an
'a type does not magically
"just work" with regards to type safety. For instance, let us consider the
following code fragment, which is ill-typed:
fun inc (x : 'a) : int = 1 + x
Note that, in this case, type annotations have actually worked against us! Had
we removed the type annotations, this code would compile. The reason why this
code is ill-typed is because, while we are allowed to use a type more
generally than it actually is, we cannot use a polymorphic type more
specifically. The reason for this is because if we think about it as if we
explicitly passed in some type variable, in the scope of the function
inc has fixed
'a to be some type. It then attempts to add 1 to
x, which is
a value of that fixed, arbitrary type. However, we cannot add 1 to a value of
any type - we can only safely add 1 to
x if we know that
x has type
'a were instantiated to be a
string, it would violate type safety to
allow this code to compile. As such, we cannot explicitly type annotate
be of type
'a, as it is used more specifically than that in the body of the
A similar idea of extending functions to working on many types is exhibited by
equality types, which are a behind-the-scenes process that you have already
been exposed to. Consider the type of the function
= - the equality operator.
Clearly, it cannot have type
'a * 'a -> bool, since some types don't make
sense to compare for equality. For instance, how would you compare whether or
not a real is equal to another? Machine representations are finite, so asking
the question is bound to introduce problems. Another difficulty is in function
types - determining if two arbitrary functions are extensionally equivalent is
an uncomputable problem (closely related to the Halting Problem). As such, we
= to work on a wide spectrum of types, but also not work on
another, also very wide spectrum of types.
To do this, SML has a concept of equality types. The type variable
''c, as normal) are not quantified over all types, but merely all
equality types. This includes
string, and any datatype built up
from non-equality types, among others. Thus, the
= operator actually has type
''a * ''a -> bool, so as to only work on compatible types.
NOTE: Note that the polymorphism demonstrated by
length is of a different
flavor than that of
*, or other overloaded functions. The overloading of
basic arithmetic functions to work on ints and reals is more in line with what
might be referred to as "ad hoc" polymorphism, which is merely the compatibility
of a single operator with several possibly heterogeneous implementations, with
the precise implementation being chosen by context (such as type, in this case).
It is thus important to note that while parametric polymorphism identifies a
single, general implementation with many types, ad hoc polymorphism identifies
several different implementations with different use cases. Ad hoc polymorphism
is noticeably more inelegant than parametric polymorphism, but it is useful in
the cases that you only have a small subset of types to extend an operator to.
During compilation, SML will often need to determine exactly what the type of
the expression that we are looking at. This is not so different of a problem
than type-checking, however. On a high level, SML simply assigns everything a
very general type, and then begins looking at clues from the context so as to
narrow down what the "most general type" is. By "most general type", we usually
mean the most general type an expression can have, meaning that we do not use
specific instances of polymorphic expressions, but just the polymorphic type
itself. It is valid to say that the
length function has type
int list -> int, but its most general type is
'a list -> int.
[Type Inference] A procedure employed by type systems to infer the type of expressions and functions, even when not explicitly given those types.
This is how we can, for instance, determine that the function
fn x => x + 1
must have type
int -> int.
x is not deliberately stated to have any type,
but we know that it wouldn't make sense for
x to be any other type than
- it would otherwise not type-check. Note that
+cannot have an input type of
real * int.
We might also think of it like this - consider the expression
fn x => x + 1.
Most generally, we know that its type should an instance of type
'a. Moreover, we see that it is a lambda expression, so it must now be an instance of
'a -> 'b. The input is
x, so we assign
x to be an instance of type
'a. We then attempt to apply the
+ operator to
x, so for this to typecheck,
x must have type
int. The outcome of an
int and an
+ should be an
int, and that's the entire function body, so the whole expression has type
int -> int.
Parametric polymorphism offers us a clean and elegant way to extend general implementations to work across a spectrum of types. In this way, we can still preserve type safety while allowing us to avoid writing out the same implementations for many different types. Type inference also conveniently lets us omit manually determining the type of our code, instead being able to determine it from context. Ultimately, parametric polymorphism is a simple idea that offers us a great deal of versatility.