In A Simple Computation Engine with F# we created a programming language which computes basic math expressions such as (+ (+ 3 5) (- 10 6 0))
. One limitation was that the only computations it could run were predefined by the programming language, specifically +
and -
. Here we evolve the language adding symbols and functions, supporting the use cases listed below.
Input | Output |
---|---|
(define x 100) |
Symbol x |
(+ 137 x) |
Integer 237 |
(define increment (lambda (x) (+ x 1))) |
Symbol increment |
(increment 100) |
Integer 101 |
This is Part II in a series of blogs on building a programming language. I’m reading Structure and Interpretation of Computer Programs and making the language based on code samples in the boo. I recommend first reading Part I.
The Language
The simplicity of S-Expressions means the entire programming language is represented in a single type definition. The Atom
is this base unit, and every value of the Atom must be evaluable (more on this below). In this blog we expand the language and add Lambda
(callable functions) and Environment
, a place to store and retrieve symbol definitions.
type Atom =
| Integer of int
| Symbol of string
| List of Atom list
| Lambda of Lambda
and
Lambda = {
Parameters: Atom list
Body: Atom
Environment: Environment
}
and
Environment = {
Symbols: Dictionary<Atom, Atom>
}
Parsing
Only a small addition needs to be made to the parser from the calculator. Previously the only symbols in the language were +
and -
, these were pre-defined by the language. Now we need to extend our definition of a symbol to include a name that the user defined.
let parseSymbol =
(pstring "+" >>% Atom.Symbol "+") <|>
(pstring "-" >>% Atom.Symbol "-") <|>
(regex "^([!#$%&*+./:<=>?@^_~a-zA-Z][!#$%&*+./:<=>?@^_~a-zA-Z0-9]*)" |>> Atom.Symbol)
<?> "Could not parse symbol"
The parser will return a Symbol
when it parsers a +
, -
, or whatever that regex I copied off the internet matches. Running the parser on the inputs gives the following results.
Input | Parser Output |
---|---|
(define x 100) |
List [Symbol "define"; Symbol "x"; Integer 100] |
(+ 137 x) |
List [Symbol "+"; Integer 137; Symbol "x"] |
(define increment (lambda (x) (+ x 1))) |
List [Symbol "define"; Symbol "increment"; List [Symbol "lambda"; List [Symbol "x"]; List [Symbol "+"; Symbol "x"; Integer 1]]] |
(increment 100) |
List [Symbol "increment"; Integer 100] |
Named Abstractions
Abstractions are built through computational objects – what we might refer to as variables and functions. When evaluating code, these computational objects don’t need human-recognisable names. Church’s Lambda Calculus of the 1930s showed that symbolic names aren’t a fundamental requirement of computation. We can express generic computations purely with lambdas (anonymous functions) – it’s lambdas all the way down. Likewise modern compilers intentionally strip symbolic names we assign to computational objects, replacing them short symbols which can better fit in caches.
Ninety years of thinking on computation has shown us that our programs not only need to be understood by computers, but also by humans. Giving computational objects symbolic names are a key part in us being able to organise and reason with our code. An anonymous function, or a function with a poorly chosen name (eg. do-the-thing
) gives us, the programmer, no insight into what the function does. Giving the same function a symbolic name like increment
reveals a lot more.
Environments
Environments are the alchemic ingredient which elevates the simple calculator into a general purpose programming language. Environments enable us to associate a symbolic name with a computational object. Put another way, environments give the programmer the ability to build abstractions.
We implement an environment as a simple dictionary of type:
type Environment = {
Symbols: Dictionary<Atom, Atom>
}
Now we update the evaluator so that Atoms are evaluated in the context of an environment. Quick reminder: eval
runs on the parse tree output and is applied recursively to every Atom. We make some changes below.
- Add an explicit
Environment
parameter toeval
. For now we are using a global environment, so the same environment will be shared in alleval
calls. - Support evaluating the special form
define
. This code path gets triggered for an expression like(define x 100)
. Note: we applyeval
to the value we’re assigning to the symbol - the value may be it’s own expression. eg.(define x (+ 1 2))
. - Support retrieval of symbols. This code path gets triggered for code
x
.
let rec eval(exp: Atom) (environment: Environment): Atom =
match exp with
| Integer x -> Atom.Integer x
| List x when x.Head = Atom.Symbol "define" ->
let symbol = x.Item(1)
let value = x.Item(2)
environment.Symbols[symbol] <- eval value environment
symbol
| Symbol s ->
match environment.Symbols.ContainsKey(Symbol s) with
| true -> environment.Symbols[Symbol s]
| false -> failwith $"Could not locate symbol {s}"
...
With this simple change we can now give symbolic names to values.
logan@anon scripts % dotnet fsi 02-global-environment.fsx
>> (define x 100)
Symbol x
>> x
Integer 100
>> (define y (+ x 200))
Symbol y
>> y
Integer 300
>> (define z y)
Symbol z
>> z
Integer 300
Defining Lambdas
Structure and Interpretation of Computer Programs includes a quote from Royal Society member John Locke’s An Essay Concerning Human Understanding (1690):
The acts of the mind, wherein it exerts power over simple ideas, are chiefly these three:
- Combining several simple ideas into one compound one, and thus all complex ideas are made.
- … Bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them all at once, without uniting them into one, by which it gets all its ideas of relations.
- The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made.
In programming, we can think of functions as the atomic unit for building abstractions. Functions allow us to organise ideas or concepts into units where the implementation details are hidden away from the caller. Modern programming languages include other features which aid us in building abstractions such as classes, interfaces, namespaces, type systems, and generics. For now we just need functions. Here’s the type definition:
type Lambda = {
Parameters: Atom list
Body: Atom
Environment: Environment
}
The computation is stored in Body
, the values it operates reside in Paramaters
, and the function will be evaluated in the context of the Environment
. Note that there’s no field to store the function’s name. Functions are first class values, they may be assigned to a symbol just like a basic integer is. This is handled by the logic we wrote for define
.
(define increment (lambda (x) (+ x 1)))
We need to tell the evaluator what to do when it encounters a lambda
block. Our programming language expects the first parameter to a lambda to be a list containing the parameters to a function, so we type check that. Otherwise we’re just returning a datatype of the lambda.
Consider how this works alongside the define
from above. The symbol name increment
gets assigned this Lambda
value.
let rec eval(exp: Atom) (environment: Environment): Atom =
match exp with
...
| List [Symbol "lambda"; parameters; body] ->
let parsedParameters =
match parameters with
| List atoms -> atoms
| _ -> failwith $"Expected lambda parameters to be a list of symbols"
Atom.Lambda {
Parameters = parsedParameters
Body = body
Environment = environment
}
...
Calling Functions
The cauldron bubbles.
(define increment (lambda (x) (+ x 1)))
We can associate a lambda with a symbol name. How can we call it?
>> (increment 100)
Integer 101
Finally we need to program the evaluator to tell it how to call functions. This branch gets called if eval
encounters a list where the first symbol is not one of the special forms (define
, lambda
), nor the pre-defined functions (+
, -
). ie. it’s a symbol name that the host language has no idea about. We first check if it’s a symbol defined in the environment, and also that it’s type Lambda
– so that we know we can execute it.
Lambdas have parameters, defined as an Atom list
. When executing the function we need to associate the values defined in the function call (100
) with the function parameters in the definition (x
). For this we use the environment, and join the two based on relative positioning. The environment gets set (100
is assigned to x
), and the body of the function is eval
’d in the context of the environment.
let rec eval(exp: Atom) (environment: Environment): Atom =
match exp with
...
// call a function
| List x ->
let symbol = x.Head
let args = x.Tail
// find the lambda we previously defined
let callable =
match environment.Symbols.ContainsKey(symbol) with
| true -> environment.Symbols[symbol]
| false -> failwith $"Could not locate callable symbol {symbol}"
// evaluate the arguments being passed into the function call, these may themselves be lambda calls
let evaluatedArgs =
args
|> List.map (fun atom -> eval atom environment)
// to the global environment we assign the keys of the lambda's parameter symbols
// with the evaluatedArgs passed in the function call. we match these by order.
// then call the function itself
match callable with
| Lambda lambda ->
List.zip lambda.Parameters evaluatedArgs
|> List.iter (fun (parameter, value) ->
environment.Symbols[parameter] <- value)
eval lambda.Body environment
| _ -> failwith $"Symbol {symbol} is not callable"
The Gears Turn
In writing this code I had revelation. Perhaps this was the mythical Lisp enlightenment I’ve been seeking. Perhaps it’s my brain slowly going insane from looking at too many parentheses. Perhaps it’s from residual LSD left by the last reader on this second-hand copy of Structure and Interpretation of Computer Programs.
The idea is around abstraction and computation, and how Lisps represent both with S-Expressions. Atoms form the core of the programming language. These atoms serve two purposes:
- Represent data we want to compute – it could be a simple integer or complex data object.
- Represent the computations we want to perform on the data – abstractions and functions.
type Eval = Atom -> Environment -> Atom
type Atom =
| Integer of int
| Symbol of string
| List of Atom list
| Lambda of Lambda
...
The evaluator function is applied to an Atom
and an Environment
and returns another Atom
. If the evaluator is applied to an Integer
, it simply returns it. If the evaluator is applied to a function, it runs it. In this way we could view the evaluator’s purpose as reducing some complex structure of abstractions implemented in Atoms (lists and lambdas), to some basic set of Atoms (integers, symbols, and lists of data).
Now the line gets blurry.
>> (define increment (lambda (x) (+ x 1)))
Symbol "increment"
>> (define dataToCompute 100)
Symbol "dataToCompute"
We define a couple of things: a computation and some data we intend to run it on. The evaluator evaluates both of these, but what’s the distinction? Both computation and data are just data. It’s as if increment
is some sort of theoretical computation, or some data that we trick the evaluator into executing.
We can poke at increment
like it’s some symbol. It’s only when we try and call it that the data acts as a computation.
>> increment
Symbol "increment"
>> (increment dataToCompute)
Integer "101"
Here we’ve turned code that means things into code that does things.
Full source code and an executable script is available here.
In Part III Abstractions for Logic we add booleans and logical abstractions to the language.
–
This post is part of F# Advent Calendar 2022. Check it out.