I’m in one of those reflective periods where I’m catching up on books I should have read earlier; Thomas Pynchon, Neal Stephenson, and the like. In Software Engineering one of those books is Structure and Interpretation of Computer Programs (SICP). SICP is known for teaching computing from basic principles using the programming language Scheme, (a (type of (lisp))). I began programming using Javascript, PHP, and Java. The first Lisp I encountered was Clojure, and I found the language and paradigm to be radically different from anything I’d ever written. I wrote some hobby projects in Clojure, thought that I had reached that mythical enlightenment every programmer is supposed to have with Lisp, and then switched back to Python. In retrospect I see I had fooled myself into thinking I grokked Lisp. All I had done was port some Python programs into Clojure, preserving the Pythonic paradigm and even code layout. Now I’m coming back to Lisp because I think there’s more to learn.
I’m interested in the theory of computation – what can be solved by computers? How can we represent computations? and how should we think about it? Neal Stephenson’s Baroque Cycle is a historical fiction set in the decades surrounding the year 1680. We see members of the Royal Society working towards developing a Philosophical Language so that all concepts and ideas can be expressed concisely – finally man will no longer be misunderstood! We’re still waiting for that one. In this Philosophical Language there would be only one way to communicate an idea. Representing ideas in a single way is something we strive to do in Software Engineering. In large code bases we often see multiple implementations of the same abstraction, and use terms such as conciseness and correctness to evaluate the quality of the code we write. The Baroque Cycle also features Leibniz working on his theory of computation and continuing a Twitter-feud with Newton.
Structure and Interpretation of Computer Programs opens with:
We are about to study the idea of a computational process. Computational processes are abstract beings that inhabit computers. As they evolve, processes manipulate other abstract things called data. The evolution of a process is directed by a pattern of rules called a program. People create programs to direct processes.
Programmers are alchemists, lines of code are our metals, abstractions are our enchantments, and all we produce are dull and complex systems.
Basic Computation
To develop my understanding of Lisp and Computation I’m writing a programming language which can run the code samples in SICP. It’s intentionally not a Scheme interpreter, as it’s still the 1690s and the Scheme Standard hasn’t been written yet. Instead, I’ll use the concepts and code samples discussed in the book to determine how the language operates. In this way the language will evolve bottom up.
We start by writing a basic calculator, a computer program that can execute the predefined computations of addition and subtraction.
Pages 5 and 6 of SICP introduce the first code. The table below shows the values we want to be able to evaluate.
Input | Output |
---|---|
486 |
486 |
(+ 137 439) |
486 |
(+ (+ 3 5) (- 10 6 0)) |
12 |
The main syntax of Lisp are S-Expressions. These are parenthesised expressions such as (+ (+ 3 5) (- 10 6))
. All computations and data of the language are written in S-Expressions, giving the language a notion of simplicity. This isn’t like Rust which requires a cheatsheet of the language tattooed on your arm.
The entire language is represented by this Atom
data type.
type Atom =
| Integer of int
| Symbol of string
| List of Atom list
Parsing
To parse the language we’ll use the parser combinator library FParsec. We start by defining small parsers for our primitive expressions, then compose these into more complex parsers to build the full programming language.
Where Integer
represents the primitive numbers, Symbol
is either +
or -
, and List
is the S-Expression made up of many Atom
. Note List
is a recursive definition so that S-Expressions may be embedded within each other.
Parser combinators make defining the logic for parsers very simple to understand.
let parseSymbol =
(pstring "+" >>% Atom.Symbol "+") <|>
(pstring "-" >>% Atom.Symbol "-")
<?> "Could not parse symbol"
let parseInteger = pint32 |>> Atom.Integer
let parseList =
skipChar '(' >>. spaces >>.
(attempt (sepBy atomValue spaces1) <|> sepEndBy1 atomValue spaces1)
.>> spaces .>> skipChar ')'
|>> Atom.List
let atomValue = choice [
parseSymbol
parseInteger
parseList
]
Running the full language parsers over inputs we get the following values:
Input | Parser Output |
---|---|
486 |
Integer 486 |
(+ 137 439) |
List [Symbol "+"; Integer 137; Integer 439] |
(+ (+ 3 5) (- 10 6 0)) |
List [Symbol "+"; List [Symbol "+"; Integer 3; Integer 5]; List [Symbol "-"; Integer 10; Integer 6; Integer 0]] |
Evaluation
The language gets parsed into a nested tree. S-Expressions may contain other S-Expressions which may contain other S-Expressions &c .
List [Symbol "+";
List [Symbol "+"; Integer 3; Integer 5];
List [Symbol "-"; Integer 10; Integer 6]
]
For simplicity and aesthetics, we’ll write our evaluator as a recursive function. The evaluator is applied to Atom
s one at a time, and also returns an Atom
.
Of note below is the List
branch of Line 6. In this simple calculation language all S-Expressions are computations. When eval
encounters an S-Expression it expects the first item in the list to be the operator (Symbol "+"
or Symbol "-"
), and the remaining items to be the operands which are to be added or subtracted. These operands aren’t necessarily going to be Atom.Integer
, they may be an Atom.List
! In which case we need to evaluate that sub expression before continuing with the mathematical operation, hence the recursive eval
applied to every operand on Line 14.
|
|
Recall that this calculator language is constructed by the Atom
datatype.
type Atom =
| Integer of int
| Symbol of string
| List of Atom list
F# as the host language knows how to add and subtract numbers, but it doesn’t know how to add and subtract Atom
s. In the eval
snippet above it will call F#’s built in +
and -
functions. We must define the computation logic for these operations.
let inline (+) (x: Atom) (y:Atom) =
match (x, y) with
| (Integer x, Integer y) -> Atom.Integer (x + y)
| _ -> failwith "Add only supports Integers"
let inline (-) (x: Atom) (y:Atom) =
match (x, y) with
| (Integer x, Integer y) -> Atom.Integer (x - y)
| _ -> failwith "Subtract only supports Integers"
Assemble
We have a parser which turns a string of characters into an expression tree. We have an evaluator which applies computations to a valid expression tree. Let’s unite them into a computer program we can execute.
while true do
printf ">> "
let input = Console.ReadLine()
parser input
|> Result.map eval
|> Result.map (fun s -> printfn $"{s}")
|> Result.mapError failwith
|> ignore
Does it work? This wouldn’t be a particularly good blog if it didn’t.
logan@anon scripts % dotnet fsi 01-ints-and-symbols.fsx
>> 486
Integer 486
>> (+ 137 439)
Integer 576
>> (+ (+ 3 5) (- 10 6 0))
Integer 12
Full source code and an executable script is available here.
Close Paren
In those ~70 lines of source code we’ve defined a programming language that can execute arbitrary addition and subtraction routines. We can express simple math operations, but we can’t express our own abstractions. In Implementing General Computation we explore the idea of abstraction and implement functions.