Programming languages provide abstractions which allow the expression of logic. This gives us, the Alchemist programmer, the ability to recite a few incantations on our keyboards and imbue the programs we are writing with the apparent ability to make decisions. Without logic computer programs would be constructed to do a single thing, and recursion and loops would act as black holes.
In this blog we add a boolean data type and some common logical control operators.
Input | Output |
---|---|
(if (= 1 1) true false) |
Boolean true |
(if (= 1 2) true false) |
Boolean false |
(and true true true) |
Boolean true |
(and true true false) |
Boolean false |
(or true true false) |
Boolean true |
(cond (false 1) (false 2) (true 3)) |
Integer 3 |
This is Part III in a series of blogs on building a programming language. I’m reading Structure and Interpretation of Computer Programs and making the language based on code samples in the book. Read the first two blogs:
- A Simple Computation Engine in F# - S-Expression calculator
- Implementing General Computation - Variables & functions
Booleans or Control Logic
When we talk about a programming language adding support for logic, what do we actually need? Our first thought goes to adding a boolean data type that has two possible values: true
and false
. It turns out the data type doesn’t need two values, we could make do with one value: true
, where the equivalent of false is just defined as not(true).
Java and other object orientated languages know better and gave booleans three values! You can do this by using the Boolean
class instead of the bool
primitive type – you shouldn’t, but you can, just like you can use software licensed by Oracle. Encountering the use of Boolean
in a production codebase does wonders for putting life’s other problems in perspective.
Boolean b = new Boolean(true);
b = null;
System.out.println(b);
// null
It turns out that the data type does not matter at all, it’s the control operators that are needed. The special forms in a programming language such as if
, else
, and
, and not
. Those functions can operate on any data type, not just booleans. We just need to give some implicit logic in our programming language of what is truthy. For example the C programming language does not have a fundamental boolean data type, it uses integers. In some ways C one-ups Java here by having infinitely more possible values for a boolean.
typedef int bool;
#define true 1
#define false 0
// my suggestions
#define alsotruebutnottrue 2
#define halfwaybetweentrueandfalse 3
In languages such as Javascript which have a boolean data type, you don’t even need to pass a boolean to those control operators. Applying if
to an empty string, the integer 0, or null all evaluate to false. I don’t want to continue this tradition, so the control operators we implement in LParen will be strongly typed in that:
- There will be a boolean data type with two possible values:
true
andfalse
- Control operators (
if
,cond
,and
, …) strictly require booleans, no casting of types ala Javascript.
The Language
LParen uses F#’s type system to make a relatively-strongly-typed dynamic programming language. We’re adding Boolean
as a fundamental data type of Atom
. For ease of implementation it wraps the F# type bool
, but as talked about above it could just as easily be an integer or string.
type Atom =
| Integer of int
| Symbol of string
| Boolean of bool
| List of Atom list
| Lambda of Lambda
Atoms are the building blocks of the language, and all must evaluable. Booleans are primitive data types in that they evaluate to themselves. This makes adding it really easy.
let rec eval: Eval = fun (exp: Atom) (environment: Environment) ->
match exp with
| Boolean x -> Atom.Boolean x
| Integer x -> Atom.Integer x
| List [Symbol "define"; firstArg; secondArg] -> define firstArg secondArg environment eval
...
Likewise parsing booleans are trivial using the parser combinator approach. We define a parser for the boolean keywords:
let parseBoolean =
(pstring "true" >>% Atom.Boolean true) <|>
(pstring "false" >>% Atom.Boolean false)
Then update the expression parser, making sure to put parseBoolean
ahead of parseSymbol
for precedence.
do atomValueRef.Value <- choice [
parseBoolean
parseSymbol
parseInteger
parseList
]
Those minor evaluator the language definition, evaluator, and parser have added the boolean data type to the language. Having the data type alone isn’t enough. We need operators to use these.
Control Operators
When the evaluator encounters the symbol if
, and
, or
, and cond
it needs to know that these are special forms, and we’re not trying to call a user-defined function. Here we update the evaluator, then below look at each operation in depth.
let rec eval: Eval = fun (exp: Atom) (environment: Environment) ->
match exp with
| Boolean x -> Atom.Boolean x
| List [Symbol "if"; predicate; consequent; alternative] -> ifForm predicate consequent alternative environment eval
| List x when x.Head = Symbol "and" -> andFormShortCircuit x.Tail environment eval
| List x when x.Head = Symbol "or" -> orFormShortCircuit x.Tail environment eval
| List x when x.Head = Symbol "cond" -> condForm x.Tail environment eval
...
if
The if
special form expects 3 expressions:
(if <predicate> <consequent> <alternative>)
- predicate is an expression that must evaluate to a boolean
- consequent is an expression that gets executed if predicate is true.
- alternative is an expression that gets executed predicate is false (think of it like else).
The expression evaluator is updated to know how to encounter an if
:
And we define the logic to handle it:
let ifForm (predicate: Atom) (consequent: Atom) (alternative: Atom) (environment: Environment) (eval: Eval): Atom =
let evaluatedPredicate = eval predicate environment
match evaluatedPredicate with
| Atom.Boolean true -> eval consequent environment
| Atom.Boolean false -> eval alternative environment
| _ -> failwith $"Predicate passed to if must evaluate to a boolean. {predicate} does not."
Running this code:
>> (if (= 1 1) true false)
Boolean true
>> (if (= 1 2) true false)
Boolean false
and, or
and
and or
are quite similar so I’ll only illustrate and
. The and
special form has the following syntax:
(and <e1> <e2> .... <en> )
Any number of expressions can be passed to it and they’re evaluated in left to right order.
let andForm (expressions: Atom list) (environment: Environment) (eval: Eval) =
let evaluatedExpressions =
expressions
|> List.map (fun atom -> eval atom environment)
|> List.map (fun atom ->
match atom with
| Boolean x -> x
| _ -> failwith $"Expressions passed to `and` must evaluate to a boolean. {atom} does not.")
|> List.reduce (&&)
Atom.Boolean evaluatedExpressions
This was my first implementation. It’s easy to reason about. It evaluates every expression passed to the (and ... )
, ensures they all evaluate to booleans, and then applies F#’s built in AND logic to determine the result. It works fine, but it can be improved.
Short-circuiting is a feature of logical operators that helps with runtime performance. We can stop evaluating the expressions passed to (and ... )
as soon as we encounter the first expression that returns false. There’s no point to continue to evaluate them as the return value, false, has already been determined. This can be done by replacing F#’s List.map
which applies a function to every item in the list, with List.tryFind
which applies a function one at a time, stopping when the function returns true.
I think this implementation is OK, but the use of inverse booleans is ugly and would be confusing without the comments.
let andFormShortCircuit (expressions: Atom list) (environment: Environment) (eval: Eval) =
// evaluate the atoms in parameters one by one. We stop when we get to the first false
let hasAFalsyExpression =
expressions
|> List.tryFind (fun atom ->
let evaluatedExpr = eval atom environment
// tryFind requires the predicate function to return true when it should stop
// confusingly this is when the atom evaluates to false
match evaluatedExpr with
| Atom.Boolean b when b = false -> true
| Atom.Boolean b when b = true -> false
| _ -> failwith $"Expressions passed to the and must evaluate to a boolean. {atom} does not.")
// hasAFalsyExpression contains the first expression that evaluate to false, or None if all expressions were true
match hasAFalsyExpression with
| Some _ -> Atom.Boolean false
| None -> Atom.Boolean true
Running this code:
>> (and true true true)
Boolean true
>> (and true true false)
Boolean false
>> (or true true false)
Boolean true
cond
cond
(conditional) is a special form which takes many pairs of predicate and expressions. Predicates are evaluated left to right, and the first which returns true has it’s expression evaluated. We can think of this like a switch
statement present in many languages – without those bug-inducing fall through semantics.
Python is notable for not having a switch
, cond
, or match
-like statement. Instead the programmer represents the same logic in a series of if
, elif
, and else
statements.
(cond
(<p1> <e1>)
(<p2> <e2>)
(<pn> <en>))
The implementation is fairly succinct but the function is doing a lot:
- Asserting the data passed in. ie. that it is
(<p1>, <e1>)
pairs. - Type checking that the predicates evaluate to a boolean.
- The cond logic.
Of these the first interests me. Many functional programming Public Relations agencies claim that languages like Scheme which use S-Expressions supposedly have no syntax to learn. Yet here we check that all atoms passed to cond
are lists (as opposed to integers or symbols), and they are of length 2 – matching the predicate, expression pair.
We could forgo this checking of the syntax but it would result it unhelpful error messages and unclear semantics in the implementation.
let condForm (parameters: Atom list) (environment: Environment) (eval: Eval) =
let predicateThatEvaluatesToTrue =
parameters
|> List.chunkBySize 2 // chunk to lists intending to be [<p>; <e>]
|> List.concat
|> List.map (fun clause -> // ensure those lists actually are [<p>; <e>]
match clause with
| List atoms when atoms.Length = 2 -> (atoms.Item(0), atoms.Item(1))
| _ -> failwith $"Expected clause in `cond` to be (<predicate> <expression). Instead received: ${clause}")
|> List.tryFind (fun (predicate, _) -> // now search for the first predicate that evaluates to true
let evaluatedPredicate = eval predicate environment
match evaluatedPredicate with
| Atom.Boolean b when b = true -> true // tryFind returns true on this
| Atom.Boolean b when b = false -> false
| _ -> failwith $"Predicate passed to `cond` must evaluate to a boolean. {predicate} does not.")
// predicateThatEvaluatesToTrue contains the expression for the predicate that evaluated to true
// or None if all predicate were false
match predicateThatEvaluatesToTrue with
| Some(_, expression) -> eval expression environment
| None -> failwith "No predicate passed to `cond` evaluated to true."
Running this code:
>> (cond
(false 1)
(false 2)
(true 3))
Integer 3
For cond
, lambda
, and other special forms with structurally specific syntax I could imagine moving the syntax checking aspect into the parser. It doesn’t solve the problem though, it just moves the complexity across the programming language stack and would make the application harder to maintain. There’s value in having no complexity in the parser.
It also doesn’t help with the general problem. There will be user-defined functions which would want to benefit from this but can’t. Many languages use type systems to assist with this. Clojure’s clojure.spec is interesting in that it looks to solve this problem in a different way.
Conclusion
Prometheus has gifted logic to the programming language, now it can make decisions. One of the early code samples in Structure and Interpretation of Computer Programs is the fibonacci sequence generator. I’m using this as a milestone as it incorporates many aspects of programming - symbols, functions, logic, and recursion.
(define fib
(lambda (n)
(cond ((= n 0) 0)
((= n 1) 1)
(true (+ (fib (- n 1))
(fib (- n 2)))))))
Now equipped with boolean logic, this code should execute. Unfortunately it combusts in a depressingly obtuse stack overflow. The language currently has a single global scope for all symbols – there’s no lexical level scoping. This seems fundamentally at odds with recursion, so will be the next challenge to overcome. Read how it’s done in Code Execution: Eval, Apply.
Full source code is available here.