System-F^{1}, the core theory underlying the idealized data abstraction I described in Data Abstraction: Theory and Practice, also underlies Dhall, a programmable configuration language. Thus, it's pretty straightforward to translate that theory into executable Dhall code. This post serves three purposes: (1) Dhall coders can use the techniques shown here to tame complexity through data abstraction. (2) Scholars seeking to understand idealized data abstraction can use this executable description as further exposition of the concepts introduced in the first post. (3) I, Brandon, can use this prose to express my excitement over some very cool code.

"Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports"^{2}. For functional programming enthusiasts, you can think of it as literally "System-Fω^{3} with records" plus a little syntactic sugar on top. In other words, it's a well-designed programming language that replaces the mess that is templated YAML or JSON. Dhall compiles into YAML or JSON (and Nix and Bash and a few other things) in the same way that existing template expanders compile templated YAML into YAML.

Since idealized data abstraction under the "Existential Types As Rank2 Universal Types" interpretation from my last post is implementable in pure System-F, I realized it would be pretty easy to translate it to Dhall.

I created a GitHub repo, abstraction, that houses a collection of the code shown below. It builds and runs with Dhall v1.30.0.

Before we begin to describe any particular interfaces, clients, and implementations, let's consider the types of these pieces.

Let's make a file, Existential.dhall, to hold this reusable machinery.

Given some specific "interface" $\tau$:

```
\(tau : Type -> Type) -> -- ...
```

Clients consume arbitrary implementations of our interface $t. \tau$ and return some other arbitrary result $u$ specific to our choice of client. Dhall has no type inference so we are explicit with our type-lambda for $u$.

```
let Client
    : Type -> Type
    = \(u : Type) -> forall (t : Type) -> tau t -> u
```

We wish to describe $\exists t. \tau$, the type of erased packed implementations. In the last data abstraction post, we discussed that in the Rank2 interpretation existentials are the type of a machine that takes a client and runs it on the concrete $t$ to produce our arbitrary $u$ value.

```
let ExistsTau
    : Type
    = forall (u : Type) -> Client u -> u
```

Finally, we can return these types in a record

```
in { Impl = ExistsTau, Client = Client }
```

The type of this whole expression takes the input interface $t.\tau$ and returns this record.

```
: (Type -> Type) -> { Impl : Type, Client : Type -> Type }
```

These types are reusable in any place we wish to use data abstraction in Dhall programs.

We'll describe and implement the same example we discussed in the previous blog post.

In a file, Stack.dhall:

```
-- An infinite stack interface and implementation
```

First let's import the Prelude, Dhall's standard library.

```
let Prelude =
https://prelude.dhall-lang.org/v15.0.0/package.dhall sha256:6b90326dc39ab738d7ed87b970ba675c496bed0194071b332840a87261649dcd
```

And the file we made above.

```
let Existential = ./Existential.dhall
```

Now we can describe the infinite stack interface.

```
let Interface
    : Type -> Type
    = \(t : Type)
      -> { create : Natural -> t
         , push : Natural -> t -> t
         , pop : t -> { x : Natural, rest : t }
         }
```

Feeding this interface into the Existential machinery we built above gives us Impl and Client types specialized to stacks.

```
let e = Existential Interface
let Impl = e.Impl
let Client = e.Client
```

Let's choose a simple implementation: a record holding a default value (returned when the stack is empty) and a list of the values pushed so far.

```
let Stack = { default : Natural, xs : List Natural }
```

Now we write the implementation of our interface using this concrete representation. It is the engine that feeds a client the concrete stack implementation.

```
let StackImpl
    : Impl
    = \(u : Type)
      -> \(x : Client u)
      -> x
           Stack
           { create = \(i : Natural) -> { default = i, xs = [] : List Natural }
           , push = \(i : Natural) -> \(t : Stack) -> t // { xs = [ i ] # t.xs }
           , pop =
               \(t : Stack)
               -> { x =
                      Prelude.Optional.fold
                        Natural
                        (Prelude.List.head Natural t.xs)
                        Natural
                        (\(x : Natural) -> x)
                        t.default
                  , rest =
                      { default = t.default
                      , xs = Prelude.List.drop 1 Natural t.xs
                      }
                  }
           }
```

Finally, we return the interface, our concrete implementation, and the client type.

```
in { Interface = Interface, Impl = StackImpl, Client = Client }
: { Interface : Type -> Type, Impl : Impl, Client : Type -> Type }
```

The interface can be reused to build other implementations and the client type can be used for any consumers of this implementation or any others.

We first import our interface, implementation, and client type.

```
let Stack = ./Stack.dhall
```

Clients can't depend on any particular implementation of the stack; they are written against the interface alone.

```
let stackClient
: Stack.Client Natural
= \(t : Type)
-> \(stack : Stack.Interface t)
-> let s = stack.create 10
let s = stack.push 1 s
let s = stack.push 2 s
let y = stack.pop s
in y.x
```

We can run our client by feeding it to some particular implementation

```
in Stack.Impl Natural stackClient
```

When we execute the client, Dhall normalizes the whole expression to a literal:

```
$ dhall <<< './Client.dhall'
2
```

Implementations as values, upcasting, extension, and interface composition are all directly expressible by translating almost literally character-by-character from their LaTeX System-F representation to Dhall. I cover each of these capabilities in the last post.

Since System-F powers both idealized data abstraction and Dhall, we can describe these primitives directly in Dhall. Regardless of whether you've used this post to learn a new capability of Dhall, to further study information hiding and encapsulation, or to indulge my insatiable need to share in that which I am interested, feel free to send me a tweet with any feedback, corrections, or comments to @bkase_.

One of the greatest tools to tame complexity in a growing codebase, growing either via lines of code or via people, is through strategic information hiding — you may know this as encapsulation, modularization, or data abstraction. In this post, we'll explore data abstraction's definition and capabilities from a formal perspective and show to what extent we can achieve the ideal through Swift, TypeScript, OCaml, ReasonML, Rust, and Haskell. Knowing how to use data abstraction in your language of choice is important in the short term, but understanding and naming the general concepts equips you to understand these tools as our languages evolve and change. We'll end with an explanation of how a more advanced usage of abstract data types shows up in the comonadic UI framework used in barbq.

Our brains prefer to compartmentalize, drawing black boxes around modules while stitching together pieces of code. When designing or changing these pieces we can additionally create interfaces, traits, protocols, signatures, or typeclasses (we'll use "interface" for the rest of this article) that hide implementation details and make these black boxes explicit. Moreover, we can use these interfaces as contracts between implementor and client to allow for easy concurrency in our code composition; the less we need to keep in our heads for each part of our system, the more parts of our system we can context switch between and collaborate on with colleagues.^{1} These contracts also reduce what implementors need to provide to clients to a minimum: anything that can be derived from the interface, others can attach by enriching or extending it.

A perspective on data abstraction that we will not be exploring is that from the Object-Oriented school. We will only explore a type-theoretic perspective. The best way to enforce conventions is for them to be compile errors. The best way to formalize something is by connecting it to formal logic. This is not to say that there is nothing to learn from the OO-perspective, but it is this author's view that the logical perspective is more logical.^{2}

Statically typed programming languages supply three kinds of syntactic forms associated with data abstraction. The *interface*, or an *abstract type*, hides private details of the *implementation* from the *client*.

```
// interface, abstract type: A protocol
protocol ExampleProtocol {
static func create(_ s: String) -> Self
func read() -> String
}
```

```
// interface, abstract type: An interface
//
// In TypeScript, we have to have separate interfaces for the static and
// instance sides of a class
interface ExampleConstructor {
new (s: string): ExampleInterface
}
interface ExampleInterface {
read(): string;
}
```

```
(* interface, abstract type: A module type *)
module type Example_intf = sig
type t
val create : string -> t
val read : t -> string
end
```

```
// interface, abstract type: A trait
trait ExampleTrait {
  fn create(s: &'static str) -> Self;
  fn read(&self) -> &'static str;
}
```

```
-- interface, abstract type: A typeclass
class ExampleClass a where
create :: String -> a
read1 :: a -> String
```

```
type ExampleInterface<'a> =
abstract member create : string -> 'a
abstract member read : 'a -> string
```

```
// implementation: A struct (or a class)
public struct Example {
  let s: String
  public static func create(_ s: String) -> Example {
return Example(s: s)
}
public func read() -> String {
return self.s
}
}
```

```
// implementation: A function (for static members)
// A class (for instance ones)
function createExample(ctor: ExampleConstructor, s: string): ExampleInterface {
return new ctor(s);
}
class Example implements ExampleInterface {
s: string;
constructor(s: string) { this.s = s }
read(): string {
return this.s;
}
}
```

```
(* implementation: A module *)
module Example : Example_intf = struct
type t = string
let create s = s
let read s = s
end
```

```
// implementation: A struct
pub struct Example {
s: &'static str
}
impl ExampleTrait for Example {
fn create(s: &'static str) -> Example {
return Example{ s }
}
fn read(&self) -> &'static str {
return self.s
}
}
```

```
-- implementation: A typeclass instance
instance ExampleClass String where
create = id
read1 = id
```

```
type Example () =
interface ExampleInterface<string> with
member _.create s = s
member _.read s = s
```

```
// client
func client<E: ExampleProtocol>() {
  let ex = E.create("hello")
  print(ex.read())
}
```

```
// client
// it's a bit hard to write down a generic function for this in TypeScript
// because there are two separate interfaces
// ...
const ex = createExample(Example, "hello");
console.log(ex.read())
```

```
(* a client is a functor *)
module Client (E : Example_intf) = struct
  let () =
    let ex = E.create "hello" in
    printf "%s\n" (E.read ex)
end
(* we can make clients first-class with first-class modules *)
let client (module E: Example_intf) =
let ex = E.create "hello" in
printf "%s\n" (E.read ex)
```

```
fn client<E: ExampleTrait>() {
  let ex = E::create("hello");
  println!("{}", ex.read());
}
```

```
-- client (with ScopedTypeVariables, so the caller's choice of e is usable)
client :: forall e. ExampleClass e => IO ()
client = do
  let ex = create "hello" :: e
  print $ read1 ex
```

```
let E = Example () :> ExampleInterface<string>
let ex = E.create "hello"
E.read ex |> printf "%s\n"
```

The surface syntax differs among programming languages, but through them all, you can identify a *client* interacting with an *implementation* through an *interface*. The extent to which they achieve the ideal, from a semantics perspective, is something we will study in this post. Studying the ideal equips the student with the capacity for applying these techniques across all programming languages rather than relearning what is truly the same each time a new language is presented.

To *really* understand what an interface does, it must be equipped with laws. With sufficient laws, concrete implementations can be swapped without clients observing changes, or dually, clients can be written without implementations existing. "Sufficient laws" give us both obvious property-based tests and a state known as *representation independence*, but that is a topic we will discuss in another post.

We can concisely communicate these laws through the use of interfaces that express algebraic structures. With enough practice, our whole industry can instantly be aware of some structures' associated laws just through name recognition.

We can imagine a tower we can attempt to climb whenever working on some piece of new code:

- Algebraic Structures
- Lawful Interfaces
- Interfaces
- Concrete Code

On the bottom, there is that which is formed with the least effort: buckets of plain code. We can add more order through interfaces, refine that order further with laws, and finally lighten the burden of constantly revisiting laws through the identification of algebraic structures.

Sometimes we'll be unable to identify an algebraic structure; perhaps we don't want to put the time into discovering the laws; or we're just prototyping or writing glue, so we don't want to come up with interfaces at all. But when necessary, the tower I've shown here gives us a strategy for simplifying pieces of code.

In this post, we'll focus only on the third layer, interfaces. Note that we've already talked a bit about the top layer in earlier posts starring algebraic structures. The second layer will be discussed in a follow-up post.

As stated earlier, understanding the concepts in your chosen language is useful now, but understanding them from a formal perspective will persist through your career. This post will show how these idealized forms manifest in mainstream languages as a tool for better internalizing these concepts.

To motivate the right way to think about abstract data types (interfaces), I want to contrast them with parametric polymorphism, which you may know as programming with "generics" or "templates".

Consider a function that takes in a list of arbitrary elements, $A$, and returns the length.

```
func length<A>(es: [A]) -> Int {
return es.count
}
```

```
function length<A>(es: A[]): number {
return es.length
}
```

```
let length: 'a list -> int = fun es -> List.length es
```

```
fn length<A>(es: &Vec<A>) -> usize {
es.len()
}
```

```
-- assuming: import qualified Data.List as L
length :: [a] -> Int
length = L.length
```

```
let length<'a> (es: 'a list): int = es.Length
```

When implementing the body of such a parametrically polymorphic function, we're constrained to not rely on the type of $A$. In return, callers of our function have the liberty to choose any $A$ — in our case, they can pass a list of any element type. This liberty comes from the *universal quantification* of our type variable, $A$.

When defining such generic functions, we're defining a family of functions: one for each choice of concrete type. This family is, in a sense, an infinite product^{2} of all such functions.

Consider an abstract data type representing an integer stack that is never empty. What we're describing is "some stacklike thing" that can push and pop integers.

```
// In real code, you'd probably want to use a struct with vars on the fields
// but I want to keep these examples more or less a rosetta-stone of one-another
public protocol StackProtocol {
  static func create(def: Int) -> Self
  func push(elem: Int) -> Self
  func pop() -> (Int, Self)
}
enum List<A> {
  case empty
  indirect case cons(A, List<A>)
}
public struct SimpleStack {
  // we'll use the default when the list is empty
  let def: Int
  let data: List<Int>
}
extension SimpleStack: StackProtocol {
  public static func create(def: Int) -> SimpleStack {
    return SimpleStack(def: def, data: .empty)
  }
  public func push(elem: Int) -> SimpleStack {
    return SimpleStack(def: self.def, data: .cons(elem, self.data))
  }
  public func pop() -> (Int, SimpleStack) {
    switch self.data {
    case .empty:
      return (self.def, self)
    case .cons(let e, let es):
      return (e, SimpleStack(def: self.def, data: es))
    }
  }
}
```

```
interface StackConstructor {
  new (def: number): StackInterface
}
interface StackInterface {
  push(elem: number): StackInterface;
  pop(): [number, StackInterface];
}
function createStack(ctor: StackConstructor, def: number): StackInterface {
  return new ctor(def);
}
class SimpleStack implements StackInterface {
  // we'll use the default when the array is empty
  def: number;
  data: number[];
  constructor(def: number, data?: number[]) {
    this.def = def;
    this.data = data || [];
  }
  push(elem: number): StackInterface {
    return new SimpleStack(this.def, [elem, ...this.data]);
  }
  pop(): [number, StackInterface] {
    if (this.data.length > 0) {
      return [this.data[0], new SimpleStack(this.def, this.data.slice(1))];
    } else {
      return [this.def, this];
    }
  }
}
```

```
module type Stack_intf = sig
  type t
  val create : int -> t
  val push : t -> int -> t
  val pop : t -> int * t
end
module Stack : Stack_intf = struct
  (* we'll use the default when the list is empty *)
  type t = int list * int
  let create i = ([], i)
  let push (s, def) i = (i :: s, def)
  let pop = function
    | ([], def) -> (def, ([], def))
    | (x :: xs, def) -> (x, (xs, def))
end

```
// In real code, you'd probably want to use mutation here
// but I want to keep these examples more or less a rosetta-stone of one-another
trait StackTrait {
  fn create(default: u32) -> Self;
  fn push(&self, elem: u32) -> Self;
  fn pop(&self) -> (u32, Self);
}
#[derive(Clone)]
pub struct SimpleStack {
  // we'll use the default when the vector is empty
  default: u32,
  data: Vec<u32>,
}
impl StackTrait for SimpleStack {
  fn create(default: u32) -> SimpleStack {
    SimpleStack { default, data: Vec::new() }
  }
  fn push(&self, elem: u32) -> SimpleStack {
    let mut xs = self.data.clone();
    xs.push(elem);
    SimpleStack { default: self.default, data: xs }
  }
  fn pop(&self) -> (u32, SimpleStack) {
    let mut data_ = self.data.clone();
    match data_.pop() {
      Some(top) => (top, SimpleStack { default: self.default, data: data_ }),
      None => (self.default, self.clone()),
    }
  }
}
```

```
class StackClass a where
  create :: Int -> a
  push :: a -> Int -> a
  pop :: a -> (Int, a)

-- we'll use the default when the list is empty
newtype Stack = Stack (Int, [Int])

instance StackClass Stack where
  create i = Stack (i, [])
  push (Stack (def, s)) i = Stack (def, i : s)
  pop s@(Stack (def, [])) = (def, s)
  pop (Stack (def, e : es)) = (e, Stack (def, es))
```

```
type StackInterface<'a> =
    abstract member create : int -> 'a
    abstract member push : 'a -> int -> 'a
    abstract member pop : 'a -> int * 'a

type Stack () =
    interface StackInterface<int list * int> with
        member _.create i = [], i
        member _.push ((s, def): int list * int) i = i :: s, def
        member _.pop x =
            match x with
            | [], def -> def, ([], def)
            | x :: xs, def -> x, (xs, def)
```

When implementing our functions we have the liberty to rely on the concrete type. Essentially, self is a parameter to some of these functions. The self-passing is implicit in many languages but, interestingly, very explicit in Rust. In contrast, users of our interface, callers of create, push, and pop, are constrained to not be able to rely on the concrete type of the stack.

When defining such abstract data types, we're defining a family of constructors for data types: one for each choice of concrete implementation, since we can forget the details that make each implementation unique. This family is, in a sense, an infinite sum; we have one variant for each concrete implementation.

In this way, parametric polymorphism is dual to data abstraction.

Through the Curry-Howard isomorphism^{4}, a generic $A$ in our types corresponds to $\forall A$ in logic. In other words, a universally quantified type variable in type theory is isomorphic to a universally quantified propositional variable in logic. The dual of $\forall$ is $\exists$ or "there exists." Now we can go backward through Curry-Howard and land on the irrefutable conclusion that *abstract types are existential types*. There exists some concrete stack, where the implementor knows the underlying concrete representation, but as the client, we don't know the details. We *existentially quantify* over the type of the concrete representation.

Our idealized form of data abstraction will refer to abstract data types as $\exists t.\tau$ where $\tau$ stands in for some type that depends on $t$. Concretely for stacks: $\exists t. \langle create : int \rightarrow t, push: t \rightarrow int \rightarrow t, pop: t \rightarrow (int \times t) \rangle$. In English, you may say: there exists some stack type $t$ that supports create, push, and pop operations.

We can pack a chosen representation type, $\rho$, along with an implementation $e$, replacing $\rho$ for $t$ in our existential box, to create an abstract data type (introducing a new variant to our infinite sum): $\bigcup \rho; e; \exists t.\tau$^{5}. Concretely for the stack example, we can choose $\rho$ to be the default int paired with a list storing the values pushed so far: $\bigcup (int \times List[int]) ; \langle create = \dots, push = \dots, pop = \dots \rangle ; \exists t. \langle create : int \rightarrow t, push: t \rightarrow int \rightarrow t, pop: t \rightarrow (int \times t) \rangle$

A client is an expression that opens a packed value for use under an environment where the choice of the existential $t$ is opaque. The client must be able to run *for all* specific implementations. Due to these requirements, it's best to think of a client as a function^{6} of type $\forall t. \tau \rightarrow \tau_2$. Note, we add a further restriction that $t$ cannot show up in the return type $\tau_2$. We'll show below how this restriction increases the power of our abstract data type.
Concretely for the stack example: a function that pops two ints off of our stack and returns their sum would have type $\forall t. \langle create : int \rightarrow t, push: t \rightarrow int \rightarrow t, pop: t \rightarrow (int \times t) \rangle \rightarrow int$.

Recall that these idealized forms manifest themselves with a subset of their power in our programming languages as shown below:

```
// interface, abstract type: A protocol
protocol ExampleProtocol {
static func create(_ s: String) -> Self
func read() -> String
}
```

```
// interface, abstract type: An interface
//
// In TypeScript, we have to have separate interfaces for the static and
// instance sides of a class
interface ExampleConstructor {
new (s: string): ExampleInterface
}
interface ExampleInterface {
read(): string;
}
```

```
(* interface, abstract type: A module type *)
module type Example_intf = sig
type t
val create : string -> t
val read : t -> string
end
```

```
// interface, abstract type: A trait
trait ExampleTrait {
  fn create(s: &'static str) -> Self;
  fn read(&self) -> &'static str;
}
```

```
-- interface, abstract type: A typeclass
class ExampleClass a where
create :: String -> a
read1 :: a -> String
```

```
type ExampleInterface<'a> =
abstract member create : string -> 'a
abstract member read : 'a -> string
```

```
// implementation: A struct (or a class)
public struct Example {
  let s: String
  public static func create(_ s: String) -> Example {
return Example(s: s)
}
public func read() -> String {
return self.s
}
}
```

```
// implementation: A function (for static members)
// A class (for instance ones)
function createExample(ctor: ExampleConstructor, s: string): ExampleInterface {
return new ctor(s);
}
class Example implements ExampleInterface {
s: string;
constructor(s: string) { this.s = s }
read(): string {
return this.s;
}
}
```

```
(* implementation: A module *)
module Example : Example_intf = struct
type t = string
let create s = s
let read s = s
end
```

```
// implementation: A struct
pub struct Example {
s: &'static str
}
impl ExampleTrait for Example {
fn create(s: &'static str) -> Example {
return Example{ s }
}
fn read(&self) -> &'static str {
return self.s
}
}
```

```
-- implementation: A typeclass instance
instance ExampleClass String where
create = id
read1 = id
```

```
type Example () =
interface ExampleInterface<string> with
member _.create s = s
member _.read s = s
```

```
// client
// ...
let ex = Example.create("hello")
print(ex.read())
```

```
// client
// ...
const ex = createExample(Example, "hello");
console.log(ex.read())
```

```
(* client *)
(* ... *)
let ex = Example.create "hello" in
printf "%s\n" (Example.read ex)
```

```
// client
// ...
let ex = Example::create("hello");
println!("{}", ex.read());
```

```
-- ... do
let ex = create "hello"
print $ read1 ex
```

```
let E = Example () :> ExampleInterface<string>
let ex = E.create "hello"
E.read ex |> printf "%s\n"
```

In this section, we'll enumerate a few interesting properties of abstract data types first in their idealized forms and then in our mainstream languages. If you only want to see languages that can properly express all of these properties, skip to the OCaml or ReasonML versions of these code samples.

In an ideal world, a packed implementation is a value. It is first-class. We can create it in an expression anonymously, we can accept it as an argument to a function, and we can return it from a function as well. $\bigcup \rho; e; \exists t.\tau$ can appear anywhere any other expression can appear.

Note: The seeming lack^{7} of power of Haskell is just due to this section occurring before I explain Rank2 types.

```
// In Swift, implementations are not first-class. We can't create them
// anonymously. However, we can accept them as arguments and return them from
// functions with some caveats.
// interfaces are explicitly declared
protocol Tau {
var name: String { get }
}
// If the protocol doesn't use other associated type variables or Self
// we can treat packed implementations first-class values.
//
// As a consequence, it is not possible to pass some packed implementation
// that has a mechanism for constructing Tau instances. We have to construct
// instances only with the concrete implementations.
func acceptTau(tau: Tau) -> String {
  return tau.name
}
// Note that when we consider using Tau as a client, that is universally
// quantifying over the Tau and opening the packed implementation we can
// work with protocols with associated types and ones that use Self
//
// This limits the expressive power of our packed implementations. See the next
// code sample for one example of something that cannot be expressed as a
// client.
func acceptTau2<Packed: Tau>(p: Packed) -> String {
  return p.name
}
```

```
// In TypeScript, implementations are not first-class. We can't create them
// anonymously. However, we can accept them as arguments and return them from
// functions. In TypeScript, the static and instance side of a class have
// separate interfaces, so we are even more limited here than in something like
// Swift.
// interfaces are explicitly declared
interface Tau {
name(): string
}
// As stated above if we consider only the "instance" side interface we can
// accept a packed implementation as an argument
function acceptTau(t: Tau): string {
return t.name()
}
```

```
(* OCaml and ReasonML are the only languages that can properly express first
* class implementations anonymously. We do so via the first-class module
* approach. *)
(* interfaces are explicitly declared *)
module type Tau = sig
type t
val name : t -> string
end
(* create anonymously and returning from a function *)
let make_tau =
  let _x = fib 22 in
  (* ... *)
  let packed : (module Tau) = (module struct
    type t = unit
    let name () = "anonymous implementation"
  end) in
  packed
(* accepting a packed implementation as an argument; a locally abstract
 * type lets the second parameter mention T.t *)
let accept_tau (type a) (module T : Tau with type t = a) (t : a) = T.name t
```

```
// In Rust, implementations are not first-class. We can't create them
// anonymously. However, we can accept them as arguments and return them from
// functions with some caveats.
// interfaces are explicitly declared
trait Tau {
  fn name(&self) -> &'static str;
}
// As long as the trait only has functions with self attached to them, ie. it
// represents instances of tau, can we pass the implementations around.
// Additionally the functions cannot reference `Self` anywhere. In Rust, the
// runtime behavior (dynamic dispatch) is explicit.
fn accept_tau(tau: Box<dyn Tau>) -> &'static str {
  return tau.name();
}
// Note that when we consider using Tau as a client, that is universally
// quantifying over the Tau and opening the packed implementation we can
// work with traits that have other functions inside them.
//
// This limits the expressive power of our packed implementations. See the next
// code sample for one example of something that cannot be expressed as a
// client.
fn accept_tau2<P: Tau>(p: P) -> &'static str {
  return p.name();
}
```

```
-- In Haskell, with the type-class treatment, implementations are not
-- first-class. We can't create them anonymously. We can neither accept them
-- as arguments nor return them from functions unless we explicitly open our
-- package
--
-- Note: In Haskell, Rank2 types is a workaround for the lack of first-class
-- implementations via the typeclass-model for data abstraction. This will be
-- explained in further detail in a later section of this post.
-- interfaces are explicitly declared
class Tau t where
name :: t -> String
-- This is a client, it universally quantifies over Tau, we cannot talk about
-- the packed implementation itself
acceptTau :: Tau t => t -> String
acceptTau t = name t
```

```
(* In F#, implementations are not first class *)
(* interfaces are explicitly declared *)
type Tau<'a> =
abstract member name : 'a -> string
(* we can accept packed implementations as arguments *)
let acceptTau<'a> (tau: Tau<'a>) (x: 'a) = tau.name x
```

This property provides many interesting capabilities that we won't enumerate in full. Here's just one: coupled with the rule restricting the existentially quantified $t$ from appearing in the result of client functions, first-class-ness allows the unification of disparate concrete types in different branches, as long as they pack to the same interface. Our machines use dynamic dispatch to make this work at runtime.

```
// If the protocol doesn't use associated types or Self, we can unify instances
// in this manner. But not the abstract data types themselves
func myPet(preference: Preference) -> Animal {
  // assuming dog and cat are available in scope
  switch preference {
  case .active: return dog as Animal
  case .passive: return cat as Animal
  }
}
// If the protocol does use associated type variables or Self, we can only
// accept one specific packed-implementation, we are forced to open the package
// with a client and we have a $\forall A$ that we cannot specify is a dog or a
// cat, it is chosen for us.
//
// !!! This function fails to compile !!!
func badMyPet<A: Animal>(preference: Preference) -> A {
  switch preference {
  // we cannot cast dog or cat to A
  case .active: return dog as A
  case .passive: return cat as A
  }
}
```

```
// We can do this in TypeScript if we only consider the instance-side interface
function myPet(preference: Preference): Animal {
// assuming dog and cat are available in scope
if (preference == "active") {
return dog as Animal
} else {
return cat as Animal
}
}
```

```
(* We can do this in OCaml and ReasonML using first-class modules *)
let my_pet : Preference.t -> (module Animal_intf) = function
| Active -> (module Dog : Animal_intf)
| Passive -> (module Cat : Animal_intf)
(* We can also do it from the Rank2 perspective (explained later in this
* article). For the curious, one way to do so is using GADTs *)
```

```
// If the trait only has instance functions on it we can unify instances
// in this manner. But trying this with a trait that contains functions missing
// self, will meet you with a compile error.
fn my_pet(preference: Preference) -> Box<dyn Animal> {
  match preference {
    Preference::Active => Box::new(Dog {}),
    Preference::Passive => Box::new(Cat {}),
  }
}
```

```
-- We can do this in Haskell, but we'll need to use tools on top of Rank2 types
-- which will be explained later. For the curious, you can achieve this with
-- GADTs
```

```
type Animal = interface end
type Dog () =
class
interface Animal
end
type Cat () =
class
interface Animal
end
type Preference = Active | Passive
let myPet (preference: Preference): Animal =
match preference with
| Active -> Dog () :> Animal
| Passive -> Cat () :> Animal
```

In addition to being able to pack an implementation to an interface, we can also pack a more detailed interface into one that hides more information, as long as we keep the same implementation type. Upcasting, $\uparrow_{\tau_{+},\tau_{-}}$, is a client that performs this refinement, accepting a packed $\tau_{+}$ and returning a new packed $\tau_{-}$.

For example, consider if we modelled a sort of iterator: $\exists t.\langle next : \dots \rangle$. We can refine a stack into this iterator by using pop. I'm intentionally eliding the other stack operations with $\dots$.

$\uparrow_{\tau_{stack}, \tau_{iterator}} = \forall t. \lambda s : \langle \dots, pop : t \rightarrow (int \times t) \rangle.$

$\quad \bigcup t; \langle next = s.pop \rangle ; \langle next : t \rightarrow (int \times t) \rangle$

```
// Swift does not support upcasting directly
protocol Iterator {
func next() -> (Int, Self)
}
// We can't directly define upcast as a function in Swift because we can't
// return this kind of iterator inside a box using dynamic dispatch as it has a
// Self in it.
// We can create a wrapper struct to achieve the upcast rather than a function
public struct StackWithIterator<S: StackProtocol> {
let stack: S
}
extension StackWithIterator : StackProtocol {
static func create(def: Int) -> StackWithIterator<S> {
return StackWithIterator(stack: S.create(def))
}
func push(elem: Int) -> StackWithIterator<S> {
return StackWithIterator(stack: self.stack.push(elem))
}
func pop() -> (Int, StackWithIterator<S>) {
let (elem, s) = self.stack.pop();
return (elem, StackWithIterator(stack: s))
}
}
extension StackWithIterator : Iterator {
    func next() -> (Int, StackWithIterator<S>) {
        let (elem, s) = self.stack.pop()
        return (elem, StackWithIterator(stack: s))
    }
}
// We can almost define upcast if we change our protocol to use an
// associated-type instead of Self to represent the container
protocol Iterator {
associatedtype Container
func next() -> (Int, Container)
}
// However, we still can't return the iterator because `I` is universally
// quantified. The caller picks I. We need I to be existentially quantified, so
// our implementation of the function can pick I instead.
func upcastStackIterator<S: StackProtocol, I: Iterator>(s: S) -> I where I.Container == S {
// Cannot implement
}
```

```
// TypeScript does not support upcasting directly with the class-based model
//
// I imagine you could achieve something like this if you use JavaScript objects
// directly, but you may be making a tradeoff there in other places. Please open
// a PR if you want to share that approach.
interface IteratorDirect {
next(): [number, IteratorDirect]
}
// We can define upcast in TypeScript but only because we couldn't really define
// Stacks properly in the first place as interfaces are disjoint between the
// static and instance members.
//
// But we need to make a wrapper object to support this if we use the interface
// as defined.
// We can instead define the interface as a record with a function on it
interface Iterator1 {
next: () => [number, Iterator1]
}
// Now we can define this function nicely
function upcast_stack_iterator<S extends StackInterface>(s: S): Iterator1 {
return ({ next: () => {
let [elem, s2] = s.pop();
return [elem, upcast_stack_iterator(s2)]
} })
}
// Yes, this actually compiles on Try TypeScript.
```

```
(* OCaml's and ReasonML's first-class modules let us upcast without needing to
* muck with any extra named data types *)
module type Iterator_intf = sig
type t
val next : t -> (int * t)
end
let upcast_stack_iterator (module Stack : Stack_intf) : (module Iterator_intf) =
let repack : (module Iterator_intf) = (module struct
type t = Stack.t
let next = Stack.pop
end) in
repack
```

```
// Rust does not support upcasting directly
pub trait Iterator {
fn next(&self) -> (u32, Self);
}
// We can't directly define upcast as a function in Rust because we can't return
// this kind of iterator inside a box using dynamic dispatch as it has a Self in
// it.
// We can create a wrapper struct to achieve the upcast rather than a function
pub struct StackWithIterator<S: StackTrait> {
    stack: S,
}
impl<S: StackTrait> StackTrait for StackWithIterator<S> {
    fn create(def: u32) -> StackWithIterator<S> {
        StackWithIterator { stack: S::create(def) }
    }
    fn push(&self, elem: u32) -> StackWithIterator<S> {
        StackWithIterator { stack: self.stack.push(elem) }
    }
    fn pop(&self) -> (u32, StackWithIterator<S>) {
        let (elem, s) = self.stack.pop();
        (elem, StackWithIterator { stack: s })
    }
}
impl<S: StackTrait> Iterator for StackWithIterator<S> {
    fn next(&self) -> (u32, StackWithIterator<S>) {
        self.pop()
    }
}
// We can almost define upcast if we change our protocol to use an
// associated-type instead of Self to represent the container
trait Iterator {
    type Container;
    fn next(&self) -> (u32, Self::Container);
}
// However, we still can't return the iterator because `I` is universally
// quantified. The caller picks I. We need I to be existentially quantified, so
// our implementation of the function can pick I instead.
fn upcast_stack_iterator<S: StackTrait, I: Iterator<Container = S>>(s: S) -> I {
// Cannot implement
}
```

```
-- We cannot upcast directly with the typeclass-model, but we can using GADTs
-- which is omitted here
```

```
(* We can upcast with a function if we introduce the intermediate type *)
type IteratorInterface<'a> =
abstract member next : 'a -> int * 'a
type StackIterator<'a> (stack: StackInterface<'a>) =
interface IteratorInterface<'a> with
member _.next x = stack.pop x
let upcast_stack_iterator (stack: StackInterface<'a>) : IteratorInterface<'a> =
StackIterator stack :> IteratorInterface<'a>
```

We can extend and enrich packed implementations with further behavior if that behavior is definable relying only on the existing behavior of the interface. Extension, $\bigstar_{\tau_{-}, \tau_{+}}$, is a client that performs this enriching operation on a $\tau_{-}$, returning a new packed $\tau_{+}$.

For example, consider if we wanted to expose a $peek$ operation on a stack, defined in terms of the existing $pop$:

$\bigstar_{\tau_{-}, \tau_{+}} = \forall t. \lambda s : \langle \dots \rangle.$

$\quad \bigcup t; \langle \dots, peek = \lambda x : t.\ snd(s.pop \; x) \rangle ; \langle \dots, peek : t \rightarrow t \rangle$

```
// Extension in Swift can be achieved with extensions on protocols
extension StackProtocol {
func peek() -> Self {
let (_, s) = self.pop()
return s
}
}
```

```
// Extension in TypeScript is not possible directly as you cannot give default
// functions to interfaces.
// You can achieve extension similarly to our workaround for upcasting, by
// introducing another type
class PeekingStack<S extends StackInterface> implements StackInterface {
    constructor(private s: S) {}
    /* ... similar to the example in "upcasting" */
    peek(): PeekingStack<S> {
        // omitted for brevity
    }
}
```

```
(* Extension can be achieved with functors *)
module PeekingStack(S: Stack_intf) = struct
(* everything in S *)
include S
(* and peek *)
let peek s = snd (pop s)
end
```

```
// Extension in Rust can be achieved with a blanket impl of an extension trait
trait StackExt: StackTrait {
    fn peek(&self) -> Self
    where
        Self: Sized,
    {
        let (_, s) = self.pop();
        s
    }
}
impl<S: StackTrait> StackExt for S {}
```

```
-- Extension in Haskell can be achieved with default methods in an extension
-- typeclass
class StackClass s => PeekingStack s where
  peek :: s -> s
  peek s = snd (pop s)
```

```
(* Extension can be achieved by adding a member *)
type PeekingStack<'a> (stack: StackInterface<'a>) =
    member _.peek s = snd (stack.pop s)
    interface StackInterface<'a> with
        member _.create i = stack.create i
        member _.push x i = stack.push x i
        member _.pop x = stack.pop x
```

Given two interfaces we can compose them to create a new interface that has the contents of both: $(\exists s.\tau_1) \& (\exists t.\tau_2) = \exists s.\exists t.(\tau_1 \& \tau_2)$.

Given two packed implementations we can combine them to this composed interface:

$p_1 \circ p_2 : (\exists s.\tau_1) \rightarrow (\exists t.\tau_2) \rightarrow \exists s.\exists t.(\tau_1 \& \tau_2) = \forall s. \lambda p_1. \forall t. \lambda p_2.$

$\quad \bigcup s; (\bigcup t; \tau_1 \& \tau_2; \exists t.(\tau_1 \& \tau_2));$

$\quad \exists s.\exists t.(\tau_1 \& \tau_2)$

```
// Interface composition (via protocols) is done using multiple inheritance on
// protocols or the & operator
//
// Packed implementations can be composed by adding a conformance to this
// composed protocol with a wrapper type that combines the implementations if
// necessary.
public protocol Push {
func push(elem: Int) -> Self
}
public protocol Pop {
func pop() -> (Int, Self)
}
public protocol PushPop : Push, Pop { }
extension Stack: Push { /* ... */ }
extension Stack: Pop { /* ... */ }
// here we don't need a wrapper type since Stack is Push and Pop already but we
// would otherwise
extension Stack: PushPop { }
```

```
// TypeScript is similar to Swift and Rust, but alternatively, we can again use
// the interface as records approach and compose more naturally
// TODO: I really want to post this blog post. Please open a PR if you'd like to
// implement this one
```

```
(* We can combine packed interfaces by including modules, and we can even define
* compose as a function using first-class modules
*
* We can also compose interfaces by including module types. *)
module type Pushable_intf = sig
type t
val push: t -> int -> t
end
module type Popable_intf = sig
type t
val pop: t -> int * t
end
module type PushPop_intf = sig
include Pushable_intf
include Popable_intf with type t := t
end
let compose_push_pop
(type a)
(module Push: Pushable_intf with type t = a)
(module Pop: Popable_intf with type t = a) : (module PushPop_intf) =
let packed : (module PushPop_intf) =
(module struct
include Push
include Pop
end) in packed
```

```
// We can compose interfaces, the traits, with the + operator, and then compose
// packed interfaces by adding conformance for a trait that is constrained to be
// that composition with a wrapper type that combines the implementations (if
// relevant).
pub trait Push {
fn push(&self, elem: u32) -> Self;
}
pub trait Pop {
fn pop(&self) -> (u32, Self);
}
pub trait PushPop : Push + Pop { }
impl Push for SimpleStack { /* ... */ }
impl Pop for SimpleStack { /* ... */ }
// here we don't need a wrapper type since SimpleStack is Push and Pop already
// but we would otherwise
impl PushPop for SimpleStack { }
```

```
-- Interface composition (via constraint composition) is definable in userland
-- Haskell as long as you turn on the right extension knobs.
--
-- See http://hackage.haskell.org/package/constraints-extras-0.3.0.2/docs/Data-Constraint-Compose.html
-- for an example
--
-- I don't believe it's possible to derive a typeclass from the composition of
-- two typeclasses. Please open a PR to show me if it's true!
```

```
(* We can combine packed interfaces with first-class interfaces.
*
* We can also compose interfaces by inheriting abstract members. *)
type Push<'a> =
    abstract member push : 'a -> int -> 'a
type Pop<'a> =
    abstract member pop : 'a -> int * 'a
type PushPop<'a> =
    inherit Push<'a>
    inherit Pop<'a>
type ComposedPushPop<'a> (pushAndPop: Push<'a> * Pop<'a>) =
    let push, pop = pushAndPop
    interface PushPop<'a> with
        member _.push x i = push.push x i
        member _.pop x = pop.pop x
```

It is surprisingly possible to represent the existential abstract data types entirely using $\forall$s in programming languages where rank2 types are expressible. Rank2 types are polymorphic types where the $\forall$ quantifier can appear parenthesized to the left of a $\rightarrow$, or equivalently, to the right of a data type declaration.

```
// Swift doesn't support Rank2 types
//
// For my Swift implementation of Comonadic UI, I needed to hack around Swift's
// inabillity to represent Rank2 types. If you just "disable" the type-checker
// by casting to and from Any, the code will still run.
//
// See https://github.com/bow-swift/bow/pull/470/files#diff-cc655fa2944d79be4fc27fbeb114082bR25 for an example of this.
```

```
// TypeScript doesn't support Rank2 types
```

```
(* OCaml and ReasonML support Rank2 types, but only when they appear inside
 * data types: either a record or a GADT *)
(* The user of rank2record picks the 'a *)
type rank2record = { foo: 'a. 'a -> 'a -> 'a }
(* The user of rank2gadt picks the 'a *)
type rank2gadt =
| Foo : 'a -> rank2gadt
```

```
// Rust doesn't support Rank2 types
```

```
-- Haskell supports Rank2 types if you enable the Rank2Types or RankNTypes
-- extension
-- in the implementation the caller picks the `a`, our implementation picks `b`
rank2sample :: forall a. (forall b. b -> b -> b) -> a -> a -> a
rank2sample f x y = f x y
-- we can also use rank2 types in type declarations
newtype Foo = Foo (forall b. b -> b -> b)
```

```
(* F# only supports Rank2 types via anonymously implementing an interface. See
* https://stackoverflow.com/questions/7213599/generic-higher-order-function/7221589#7221589
*)
```

Now for the intuition as to why data abstraction's existential interface, the packed implementation, and client are all definable using just the universal $\forall$:

Recall that a client is a function that is polymorphic over the choice of concrete $t$, it's a function $\forall t. \tau \rightarrow \tau_2$.

Let us consider our abstract data type, the packed implementation, as a process instead of an object. It's a machine that can take a client that acts on arbitrary implementations and return something (be it an integer, a view, or any other result the client chooses).

We can translate that explanation into a type $\forall u. (\forall t.\tau \rightarrow u) \rightarrow u$. Restated: An abstract data type is a function that takes a client and runs it on some underlying implementation of $\tau$ with some concrete choice for $t$. The caller of this machine picks $u$ and the machine (the packed implementation) picks the $t$ as it's exactly the hidden representation type of the implementation; thus explaining why the caller needs to pass a client that can run $\forall t$. In this formulation, it is relatively straightforward to define precise implementations for the packed *implementation* and *client* objects.^{8}
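Although Rust lacks first-class rank2 types (as noted above), a trait with a generic method can emulate this encoding. Here's a rough sketch under invented names (`Stack`, `VecStack`, `Client`, and `packed_stack` are all hypothetical, for illustration only):

```
// A minimal "stack" interface the hidden representation must implement.
trait Stack: Sized {
    fn create() -> Self;
    fn push(self, elem: u32) -> Self;
    fn pop(self) -> (u32, Self);
}

// A concrete implementation that the pack will hide.
struct VecStack(Vec<u32>);
impl Stack for VecStack {
    fn create() -> Self { VecStack(Vec::new()) }
    fn push(mut self, elem: u32) -> Self { self.0.push(elem); self }
    fn pop(mut self) -> (u32, Self) {
        let elem = self.0.pop().unwrap_or(0);
        (elem, self)
    }
}

// A client is "forall t. tau -> u": `run` is generic over T, so a client
// must work against any implementation of the interface.
trait Client<U> {
    fn run<T: Stack>(&self) -> U;
}

// The packed implementation is "forall u. (forall t. tau -> u) -> u": it
// accepts any client and runs it against the hidden representation, VecStack.
fn packed_stack<U, C: Client<U>>(client: C) -> U {
    client.run::<VecStack>()
}

// A sample client: build a stack, pop twice, and sum the results.
struct SumTopTwo;
impl Client<u32> for SumTopTwo {
    fn run<T: Stack>(&self) -> u32 {
        let s = T::create().push(1).push(2);
        let (a, s) = s.pop();
        let (b, _) = s.pop();
        a + b
    }
}
```

`packed_stack(SumTopTwo)` runs the client against the hidden `VecStack` and returns `3`; note that the client compiles without ever naming the concrete representation.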

In the Comonadic UI, there are two types that are effectively mutually recursive, but use a rank2 type, effectively an existential type, to lower complexity during framework usage.^{9}

The two types are `UI` and `Component`.

Recall a reducer as described in an earlier post.

```
typealias Reducer<S,A> = Func<A, Endo<S>>
```

```
type Reducer<S, A> = Func<A, Endo<S>>
```

```
(* module StateEndo = ... *)
module Reducer = struct
type 'action t = 'action -> StateEndo.t
end
```

```
type Reducer s a = a -> Endo s
```

```
type reducer<'state, 'action> = 'action -> Endo<'state>
```

Now, the rank2 type:

```
newtype UI a v = UI (forall component. (Reducer a component -> v))
```

This type represents a UI at this moment: given a reducer with which to dispatch actions, it produces a view `v`.

```
newtype Component w m v = Component (w (UI (m ()) v))
```

Think of a component as a space of all possible future instantaneous UIs with a pointer at the current one. You can avoid thinking about the

Notice that the

The rank2 type here grants us the ability to not need to talk about the `component` type parameter when using the framework.

I'll explain these data types further as well as the comonadic UI framework as a whole in later posts.

Data abstraction helps us work on large codebases with ourselves and others by giving us tools to share and reuse code more easily. The Tower of Taming Complexity argues we can clarify code with interfaces, clarify interfaces with laws, and clarify lawful interfaces with algebraic structures. The programming languages we use every day have a way to express *interfaces*, *implementations*, and *clients*, but rather than thinking about the theory of data abstraction through our favorite language, we use an idealized one. Idealized data abstraction, thinking about abstract data types as the dual to parametric polymorphism, as existentials, shows us not only what we can achieve in our existing languages today but what we hope to achieve in future ones. Finally, we saw that existential types can be expressed with rank2 universal types and dug slightly deeper into the comonadic UI framework.

Next time, we'll cover the part of the tower on lawful interfaces. We'll dig into representation independence and discuss mechanically discovering laws associated with interfaces. Plus how those laws guide us towards nice property-based tests for our code. Thus, granting us the courage within us to refactor without fear.

Thank you Chris Eidhof and Daira Hopwood for pointing out some mistakes in early versions of this post! Thank you Janne Siera for adding F# examples to the post!

I heavily relied on Mitchell and Plotkin's "Abstract Types Have Existential Type", Chapter 17 of Practical Foundations for Programming Languages (PFPL) by Bob Harper, and, of course, Wikipedia when writing this post. "Abstract Types Have Existential Type" more thoroughly talks through the different forms of composition and power abstract types have, and PFPL introduces the existential, pack, and open syntactic forms, shows typing rules, and provides one take on the representability with rank-2 types. I intended to repackage these pieces of information in a simplified manner, reflecting on how this theory manifests within mainstream programming languages. If you want to learn more, I recommend reviewing these sources.

- I squeezed a lot into this metaphor. Think of people on a team as threads. Even in a single-threaded system, concurrency is useful, when working with slow IO for example, and with the right tools, it is manageable. The prudent use of interfaces makes it easier to work with other people or switch between code even on a solo project.↩
- Pun intended.↩
- The Curry-Howard isomorphism allows us to teleport types in our programming languages back and forth with theorems in logic; values with proofs. This is a very fascinating property of our universe and I encourage you to explore it! But that is not what this post is about. To learn more see the wikipedia page on the subject.↩
- For typesetting purposes, I chose a union symbol since packing an implementation is like a variant constructor for the infinite sum that the interface represents. In other literature, you may see this represented with the operator "pack".↩
- In order to simplify the presentation (and further explanations below), I chose to think about clients as they're expressed in System-F. Typically, in other literature, you will see this represented as "open" or "abstype".↩
- I'm sure there's some obscure extension that can support first-class packed implementations, but I needed a transition to the material later, so please let me tease it here.↩
- See Practical Foundations for Programming Languages (PFPL) by Bob Harper on page 155↩
- I am pretty sure I learned this trick by reading through one of Phil Freeman's original implementations of Comonadic UI, but I am unable to find the source.↩

I recently ran into an "interesting" bug in a side project that I've tinkered with on and off over the last year^{1}. This parable will take us through the trials and tribulations of fixing this bug. Along the way, we'll learn about, and take advantage of, a partial isomorphism to build powerful tools and synthesize tests. Most of this post is in Rust, the lingua franca of the project; however, when I'm sharing information on general concepts I'll show examples in Swift, Haskell, TypeScript, OCaml, and F# too.

I'm working on a Gameboy emulator in Rust that compiles to WebAssembly. This is an attempt to simultaneously get more comfortable with Rust, and some experience working with WebAssembly.

At the time of writing this post, I have an emulator that can almost play Tetris:

Click to play. Enter=Start. Arrow keys=dpad. A=A.

But it's pretty hard to debug some of the logic issues. Dr Mario, the next game on the path towards a working emulator, just dumps garbage to VRAM for a few frames and then shows a white screen while trying to write to every address in memory in a loop.

Luckily, some folks have created test ROMs as they've worked on emulators themselves. These test ROMs are Gameboy games that are purpose-built to test specific pieces of the Gameboy hardware.

One of the more famous families of tests is Blargg's Gameboy hardware test ROMs. For now, I'm going through the `cpu_instrs` tests.

01-special tests a few "special case" instructions — jumps, a few special loads, and the decimal add adjust^{2} instruction. After a little bit of finagling, I could get 01-special to pass. This story is not about that finagling. This story is about getting the test to work. Just because it passes, doesn't mean it works.

Let me explain.

These test ROMs report their results over the Gameboy's serial port: the ROM writes each output byte to the serial data register and then writes $81^{3} to the serial control register to trigger the transfer.

This is cool because it's pretty easy to hack up a routine to dump each line of the console:

```
if n == 0x81 {
/* ... */
self.buffer.push(self.serial_byte as char);
if self.serial_byte == ('\n' as u8) {
log(&format!("Serial: {:}", self.buffer));
self.buffer.clear();
}
/* ... */
}
```

In the console we get:

```
Serial: 01-special
Serial: Pased
```

This^{4} is pretty cool! But here's what doesn't work: The screen just remains white. No console output shows up on the gameboy screen.

I know my PPU^{5} drawing logic mostly works because Tetris mostly renders, so it has to be something in this ROM.

It's tough to try to debug this. All we have is 32kB of binary, much of which we interpret as instructions.

We could open another emulator, BGB, and step through that in parallel to stepping through in my emulator. With this, we can compare the output instruction by instruction — this process was actually quite helpful with Tetris.

But for this white-screen rendering issue the step-and-compare doesn't really help — in this ROM there are too many loops and no indication of exactly where or why things are going wrong. These loops mean that single-stepping is untenable.

At this point, my colleague, Nathan Holland, suggested comparing execution traces between emulators (a known good one and mine). This is a great idea! We can take binjgb, which already has a flag to enable tracing register and instruction information before every instruction executes. Then we implement the same printing logic in my emulator and compare the output.

This diff view is a little helpful, but only slightly better than single-stepping through the program. The traces grow at around 30MB/s, so analysis is tough.

Plus, it turns out that a lot of these programs have some noise in the logs that cause forks between even two "correct" emulators. In my emulator, I run the bootrom before the ROM code, but binjgb just starts at the top. Some of the instructions load state from the pixel processing unit which is extremely timing sensitive. This causes traces to diverge fairly quickly after programs start.

Well, what if we manipulate the traces a bit? Instead of just dumping the trace to stdout, we can load the traces into a data structure.

```
pub struct Record {
registers: Registers,
ip: InstrPointer,
cy: u64,
ppu_display: bool,
bank: Option<u8>,
instr_bytes: Vec<u8>,
}
```

In my emulator, we can populate this struct directly, but we'll need to be able to create the records from the log lines if we want to easily slurp up the data from binjgb. This calls for a parser.

Whenever I'm required to write a parser, I reach for parser combinators. Think of a parser as a function from an input string to either a failure or a parsed value paired with the remaining input.
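To make that shape concrete, here is a minimal sketch in Rust; the `Parser` alias and `two_hex` are invented for illustration (nom's `IResult` carries richer error information than a plain `Option`):

```
// A parser takes input and either fails or returns a parsed value together
// with the remaining, unconsumed input.
type Parser<A> = fn(&str) -> Option<(A, &str)>;

// Parse exactly two hex digits into a byte, leaving the rest of the input.
fn two_hex(input: &str) -> Option<(u8, &str)> {
    let digits = input.get(..2)?;
    let rest = input.get(2..)?;
    let byte = u8::from_str_radix(digits, 16).ok()?;
    Some((byte, rest))
}

// two_hex has the Parser shape, so combinators can compose it with others.
const _CHECK: Parser<u8> = two_hex;
```

For example, `two_hex("2a jp $0200")` consumes the leading `"2a"` and hands the rest of the line to the next parser in the chain.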

In Rust, there's a neat parser-combinator library, nom, that maintains the performance of a hand-written imperative parser while giving its users the power of composability.^{6}

For example, this is what a parser for the "bank" portion of a trace line looks like:

```
fn bank(input: In) -> IResult<In, Option<u8>> {
delimited(
tag("["),
map(take(2 as usize), |xs: In| from_hex8(xs).ok()),
tag("]"),
)(input)
}
```

With this, it wasn't too hard to finish up a Rust utility that can parse the trace output via stdin (at ~50MB/s no less). This means we can keep up with traces as they're being generated in realtime by the emulators.

Parsing and pretty-printing are duals. Together they form a partial-isomorphism^{7}. We can take advantage of this to generate useful tests for the parsing and printing logic.

Two functions $f : A \rightarrow B$ and $g : B \rightarrow A$ form an isomorphism if $\forall a: A. g(f(a)) = a$ and $\forall b: B. f(g(b)) = b$. In other words, $f$ and $g$ invert one another.

For any type, $X$, we can define a type $X+1$ with constructors $Left : X \rightarrow (X+1)$ and $Right : 1 \rightarrow (X+1)$. This can also be written as (or, more formally, is isomorphic^{8} to) an `Option<X>` or `Maybe X`.

Two functions $f : A \rightarrow B$ and $p : B \rightarrow (A+1)$ define a partial isomorphism if $\forall a: A, p(f(a)) = Left(a)$ and $\forall b: B$ case $p(b)$ of $Right() \rightarrow true$ and $Left(a) \rightarrow f(a) = b$. In English, it more or less says that $f$ and $p$ invert one another assuming that $p$ succeeds.

If you prefer code:

```
trait PartialIso<A, B> {
fn full(a: A) -> B;
fn partial(b: B) -> Option<A>;
}
```

```
class PartialIso a b where
full :: a -> b
partial :: b -> Maybe a
```

```
struct PartialIso<A, B> {
let full: (A) -> B
let partial: (B) -> A?
}
```

```
interface PartialIso<A, B> {
full: (a: A) => B,
partial: (b: B) => A | undefined
}
```

```
module type PartialIso = sig
type a
type b
val full: a -> b
val partial: b -> a option
end
```

```
type PartialIso<'a, 'b> =
abstract member full: 'a -> 'b
abstract member partial: 'b -> 'a option
```

Note that the equations that help define partial isomorphism become laws^{9} on the `PartialIso` interface.

In our case, $f$, or `full`, is the pretty-printer, and $p$, or `partial`, is the parser.
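As a toy instance (hypothetical, not code from the emulator), printing a byte as two lowercase hex digits and parsing it back forms a partial isomorphism:

```
// full: print a byte as exactly two lowercase hex digits.
fn full(a: u8) -> String {
    format!("{:02x}", a)
}

// partial: parse two lowercase hex digits back into a byte; reject anything
// else (including uppercase, so that full(a) = b whenever partial(b) = Some(a)).
fn partial(b: &str) -> Option<u8> {
    let well_formed =
        b.len() == 2 && b.chars().all(|c| matches!(c, '0'..='9' | 'a'..='f'));
    if well_formed { u8::from_str_radix(b, 16).ok() } else { None }
}
```

`partial(&full(a))` recovers `a` for every byte, while `partial` rejects any string `full` could never have produced.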

These laws guide us towards two tests we should write.

First we can write a property-based test based on our first law. In Rust, I've chosen the proptest testing framework for my emulator. To write this test, we write our statement, where all the foralls become parameters, under a `proptest!` macro invocation:

```
proptest! {
#[test]
fn print_parse_partial_iso(r: Record) {
let printed = format!("{:}", r);
let (_, parsed) = Record::of_line(&printed).unwrap();
assert_eq!(r, parsed)
}
}
```

In order for this to work on our custom `Record` struct, we need to derive `Arbitrary` for it (and constrain any fields we can't yet generate):

```
#[derive(Debug, Clone)]
#[cfg_attr(test, derive(Arbitrary))]
pub struct Record {
registers: Registers,
ip: InstrPointer,
cy: u64,
ppu_display: bool,
bank: Option<u8>,
// fixing the value for instructions since not all instrs are implemented yet
#[cfg_attr(test, proptest(value = "vec![0x18, 0x29, 0x13]"))]
instr_bytes: Vec<u8>,
}
```

Note that all other custom types, `Registers`, `InstrPointer`, etc., must also be annotated with this attribute.

This first test assures^{10} us of $\forall a: A. p(f(a)) = Left(a)$.

Next, we can choose a known good trace line — one that we know will parse successfully.

```
#[test]
fn parse_print_roundtrip() {
let line =
"A:01 F:Z-HC BC:0013 DE:00d8 HL:4000 SP:fffe PC:0216 (cy: 32) ppu:+0 |[00]0x0216: c3 00 02 jp $0200";
let (_, parsed) = Record::of_line(&line).unwrap();
let printed = format!("{:}", parsed);
assert_eq!(line, printed);
}
```

This test suffices for $\forall b: B$ case $p(b)$ of $Right() \rightarrow true$ and $Left(a) \rightarrow f(a) = b$.

Now that we have a struct that can be parsed and printed to stdout, we can manipulate the trace records in more interesting ways. For example, we can deduplicate and sort by instruction pointer! Instead of keeping every instance of each instruction in the trace as we execute the program, we can create the one "canonical" trace line for each instruction. This reduces a 100MB file to 44KB. Now we have a much more manageable diff to pore over, and trace divergence is no longer really a problem — we can just skip over any immediate dependencies of these instructions that load PPU state.
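A sketch of that deduplicate-and-sort step, assuming a simplified `(instruction pointer, line)` pair in place of the real `Record`:

```
use std::collections::BTreeMap;

// Keep only the first trace line seen at each instruction pointer, and emit
// the survivors sorted by address (BTreeMap iterates in key order).
fn canonicalize(lines: Vec<(u16, String)>) -> Vec<String> {
    let mut by_ip: BTreeMap<u16, String> = BTreeMap::new();
    for (ip, line) in lines {
        by_ip.entry(ip).or_insert(line);
    }
    by_ip.into_values().collect()
}
```

Running this over both emulators' traces yields two small, address-aligned files that diff cleanly.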

The newly reduced trace exposes this difference between the two emulators:

```
// my emulator
A:90 F:ZN-- BC:00FF DE:c79b HL:c7ad SP:dfe9 PC:c485 (cy: 382168) ppu:+0 |[00]0xc485: fa 1d d8 ld a,[$d81d]
A:31 F:ZN-- BC:00FF DE:c79b HL:c7ad SP:dfe9 PC:c488 (cy: 382184) ppu:+0 |[00]0xc488: 6f ld l,a
```

```
// binjgb
A:90 F:ZN-- BC:00FF DE:c79b HL:c7ad SP:dfe9 PC:c485 (cy: 456848) ppu:+0 |[??]0xc485: fa 1d d8 ld a,[$d81d]
A:00 F:ZN-- BC:00FF DE:c79b HL:c7ad SP:dfe9 PC:c488 (cy: 456864) ppu:+0 |[??]0xc488: 6f ld l,a
```

The data at address $d81d differs between the two emulators: my emulator loads 31 into A while binjgb loads 00.

Now we have something to work with. We can instrument the implementation with hooks to dump instruction pointers when this address is poked, finally leading us to the root cause.
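Such a hook might look like the following sketch; the `Memory` type and its fields here are hypothetical stand-ins for my emulator's actual memory implementation:

```
// Log the instruction pointer of every write that touches a watched address.
struct Memory {
    bytes: Vec<u8>,
    watch: Option<u16>,
    hits: Vec<u16>, // instruction pointers that poked the watched address
}

impl Memory {
    fn write(&mut self, addr: u16, value: u8, ip: u16) {
        if self.watch == Some(addr) {
            self.hits.push(ip);
        }
        self.bytes[addr as usize] = value;
    }
}
```

Watching $d81d and dumping `hits` points straight at the offending instruction.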

With this information, single-stepping through this function on my emulator and a known good one, BGB, led us to the inconsistency.

```
dec [hl]
```

This instruction is supposed to decrement the byte indexed by the 16-bit address inside the HL register. However, on my emulator it actually incremented!

Upon opening up the CPU's implementation of this instruction, the bug was plain to see:

```
Dec(RegsHl::HlInd) => {
let operand = self.indirect_ld(RegisterKind16::Hl);
- let result = alu::inc(&mut self.registers.flags, operand.0);
+ let result = alu::dec(&mut self.registers.flags, operand.0);
self.indirect_st(RegisterKind16::Hl, result);
}
```

Quite embarrassingly simple too! Finally, this test now both passes and works.

Click to start. After "Passed" is printed, it loops forever.

Write more tests. As Jorge Castillo and Javi Taiyou mentioned in a Twitter thread about this, if I had better test coverage on the CPU part of the codebase I would have easily caught this issue, and I would likely catch other issues.

I am immediately going to work on building out testing infrastructure for this part of my emulator.

I'm thinking we could build even more complex representations of the runtime traces — and there's probably a body of research that I am unfamiliar with re: "dynamic" analysis of running programs (non-static analysis?). For example, you could potentially automate the removal of any lines of code that are "poisoned" by those instructions that load PPU state or do other sorts of noisy things.

In addition, wouldn't it be great if folks could exchange code reviews on side projects? I'd be happy to try this with someone. Tweet @bkase_ if you're interested.

Thank you Janne Siera and Chris Eidhof for pointing out some mistakes in early versions of this post!

- Check out the gameboy video series I started and got burnt out on because it takes too long to edit videos.↩
- Decimal add adjust (DAA) is a weird instruction. Add and subtract instructions act on bytes which are typically represented in hexadecimal. Sometimes it's preferable to interpret each nibble (4-bits) of the byte as a separate decimal number. This is called binary-coded-decimal. What DAA does is it applies a fix to rewrite the result of a binary-coded-decimal add to be accurate again. See some x86 docs for more↩
- $81 is also known as 0x81, or 129 in decimal↩
- Yes, "Passed" is misspelled — it's a side effect of my naive dumping logic↩
- Pixel processing unit. The "GPU" of the Gameboy.↩
- Nom used to require parsers to be written using macros. A long time ago, I used an old version of Nom to parse torrent files and it was very frustrating to deal with errors. Luckily, the newer version uses normal functions which give you much nicer errors.↩
- Brandon Williams showed me a cool paper a while back: "Invertible Syntax Descriptions: Unifying Parsing and Pretty Printing" by Tillmann Rendel and Klaus Ostermann. This paper shows how you can build a sweet parser combinator library out of the partial isomorphism primitives such that one expression simultaneously expresses a parser and a pretty-printer. Brandon and Stephen Celis use this construction for the router of pointfree.co's website, and not only does it parse and pretty-print, but it also can print as a template.↩
- No pun intended↩
- Equivalences between two programs which should always be true. See my post on semigroups and monoids for more.↩
- Property based testing aims to assure you that properties hold, but it does so via sampling of the statespace. So it is not a proof, merely an assurance.↩

Reducers — a tool for handling state changes — help engineers manage complexity in their applications. In this post, we'll dig into how these reducers tick by exploring some monoids on functions, learning some formal terms, and discovering the underlying reason that many engineers reach for reducers to simplify mutations of state. All code examples will be presented in OCaml, ReasonML, TypeScript, Haskell, and Swift.

I expect readers to be familiar with the material covered in the first post on semigroups and monoids, but to keep this post more or less self-contained I'll review the declaration of semigroups and monoids here:

```
module type Semigroup = sig
type t
(* We don't use <> in the ML langs because <> is traditionally "is not equal" *)
val (+) : t -> t -> t
end
module type Monoid = sig
include Semigroup
val empty : t
end
```

```
interface Semigroup<A> {
concat: (x: A, y: A) => A
}
interface Monoid<A> extends Semigroup<A> {
readonly empty: A
}
```

```
class Semigroup a where
(<>) :: a -> a -> a
class Semigroup a => Monoid a where
mempty :: a
```

```
protocol Semigroup {
static func <>(a: Self, b: Self) -> Self
}
protocol Monoid : Semigroup {
static var empty : Self { get }
}
```

```
type Semigroup<'a> =
abstract member concat: 'a -> 'a -> 'a
type Monoid<'a> =
inherit Semigroup<'a>
abstract member empty: 'a
```

Endofunctions^{1}, or functions of type `'a -> 'a` in source code, reify^{2} mutations into values.

To see this, let's first implement endofunctions:

```
module Endo (A: sig type t end) = struct
type t = A.t -> A.t
end
```

```
interface Endo<A> {
(x: A): A
}
```

```
newtype Endo a = Endo (a -> a)
```

```
struct Endo<A> { let run: (A) -> A }
```

```
type Endo<'a> = 'a -> 'a
```

And then consider a `Person` record:

```
module Person = struct
type t =
{ name: string
; age: int
}
end
```

```
interface Person {
name: string
age: number
}
```

```
data Person = Person
{ name: String
, age: Int
}
```

```
struct Person {
var name: String
var age: Int
}
```

```
type Person = {
name : string
age : int
}
```

A person, Fred, can age one year like so:

```
module PersonEndo = Endo (struct type t = Person.t end)

let oneYearOlder : PersonEndo.t = fun p -> { p with age = p.age + 1 }

let agedFred =
  let fred = { Person.name = "Fred"; age = 20 } in
  let fred' = oneYearOlder fred in
  fred' (* { name = "Fred"; age = 21 } *)
```

```
const oneYearOlder: Endo<Person> = p => ({ ...p, age: p.age + 1 });
const agedFred = () => {
const fred = { name: "Fred", age: 20 };
const fred_ = oneYearOlder(fred);
return fred_ /* { name: "Fred", age: 21 } */
}
```

```
oneYearOlder :: Endo Person
oneYearOlder = Endo $ \p -> p { age = age p + 1 }

agedFred :: Person
agedFred =
  let fred = Person { name = "Fred", age = 20 }
      Endo run = oneYearOlder
  in run fred -- Person { name = "Fred", age = 21 }
```

```
// it looks like we're mutating here, but due to value semantics of
// structs in Swift, this is fine
let oneYearOlder: Endo<Person> =
  Endo { p in var p = p; p.age += 1; return p }

let agedFred = { () -> Person in
  let fred = Person(name: "Fred", age: 20)
  let fred_ = oneYearOlder.run(fred)
  return fred_ /* Person(name: "Fred", age: 21) */
}
```

```
let oneYearOlder: Endo<Person> = fun p -> { p with age = p.age + 1 }
let agedFred =
let fred = { name = "Fred"; age = 20 }
let fred_ = oneYearOlder fred
fred_
```

Notice a few interesting facts here:

- `oneYearOlder`, the change, is a value that we can store, manipulate, and do with what we choose.
- We were able to change `fred` despite `fred` being an immutable value in our language. To do this we introduce a new value with the changes applied.

The semigroup instance on endofunctions gives us a way to combine two changes into a single change. Concretely, we may also want to change our name field to add a last name; we want to apply both changes as one:

```
let oneYearOlder : PersonEndo.t = fun p -> { p with age = p.age + 1 }
let addLastNameSmith : PersonEndo.t = fun p -> { p with name = p.name ^ " Smith" }

let agedFred =
  let fred = { Person.name = "Fred"; age = 20 } in
  let change = PersonEndo.(oneYearOlder + addLastNameSmith) in
  let fred' = change fred in
  fred' (* { name = "Fred Smith"; age = 21 } *)
```

```
const oneYearOlder: Endo<Person> = p => ({ ...p, age: p.age + 1 });
const addLastNameSmith: Endo<Person> = p => ({ ...p, name: p.name + " Smith" });
const agedFred = () => {
const fred = { name: "Fred", age: 20 };
const change = endoMonoid<Person>().concat(
oneYearOlder,
addLastNameSmith
);
const fred_ = change(fred);
return fred_ /* { name: "Fred Smith", age: 21 } */
}
```

```
oneYearOlder :: Endo Person
oneYearOlder = Endo $ \p -> p { age = age p + 1 }

addLastNameSmith :: Endo Person
addLastNameSmith = Endo $ \p -> p { name = name p <> " Smith" }

agedFred :: Person
agedFred =
  let fred = Person { name = "Fred", age = 20 }
      Endo change = oneYearOlder <> addLastNameSmith
  in change fred -- Person { name = "Fred Smith", age = 21 }
```

```
let oneYearOlder: Endo<Person> =
  Endo { p in var p = p; p.age += 1; return p }
let addLastNameSmith: Endo<Person> =
  Endo { p in var p = p; p.name += " Smith"; return p }

let agedFred = { () -> Person in
  let fred = Person(name: "Fred", age: 20)
  let change = oneYearOlder <> addLastNameSmith
  let fred_ = change.run(fred)
  return fred_ /* Person(name: "Fred Smith", age: 21) */
}
```

```
let oneYearOlder: Endo<Person> = fun p -> { p with age = p.age + 1 }
let addLastNameSmith: Endo<Person> = fun p -> { p with name = p.name + " Smith" }
let agedFred =
let fred = { name = "Fred"; age = 20 }
let change = PersonEndo.concat oneYearOlder addLastNameSmith
let fred_ = change fred
fred_
```

The monoid instance provides a way to model a trivial change^{3}. With this addition, we now have a nice base case if we're building up an arbitrary amount of changes based on some runtime information:

```
let change = PersonEndo.empty in
(* ... *)
let change =
change +
(if aYearPasses then
oneYearOlder
else
PersonEndo.empty)
in
()
(* etc *)
```

```
let change = endoMonoid<Person>().empty;
// ...
change = endoMonoid<Person>().concat(
  change,
  aYearPasses ? oneYearOlder : endoMonoid<Person>().empty
);
// etc
```

```
let change = mempty in
-- ...
let change =
change <>
(if aYearPasses then
oneYearOlder
else
mempty)
in
-- etc
```

```
var change = Endo<Person>.empty
// ...
change =
change <>
(aYearPasses ? oneYearOlder : Endo<Person>.empty)
// etc
```

```
let change = PersonEndo.empty
(* ... *)
let change2 = PersonEndo.concat change (if aYearPasses then oneYearOlder else PersonEndo.empty)
(* etc *)
```

We can further improve on the above by utilizing a writer monad as described in an older post to remove all of the boilerplate doing something like the above.

An astute reader may notice "combining changes" is function composition or $\circ$, and the trivial change is the identity function or $id$. Thus the monoid we've been talking about this whole time is $(Endo_A, \circ, id_A)$^{4}.

```
module Endo (A : sig type t end) = struct
  type t = A.t -> A.t
  let (+) f g = fun a -> f (g a)
  let empty = fun a -> a
end
```

```
const endoMonoid: <A>() => Monoid<Endo<A>> = () => ({
concat: (f, g) => x => f(g(x)),
empty: x => x
});
```

```
-- This is defined in Data.Monoid.Endo
instance Semigroup (Endo a) where
(Endo f) <> (Endo g) = Endo $ f . g
instance Monoid (Endo a) where
mempty = Endo id
```

```
extension Endo : Semigroup {
  static func <>(f: Endo, g: Endo) -> Endo {
    return Endo { x in f.run(g.run(x)) }
  }
}
extension Endo : Monoid {
  static var empty: Endo { return Endo { x in x } }
}
```

```
type EndoMonoid<'a> () =
interface Monoid<Endo<'a>> with
member _.empty = fun x -> x
interface Semigroup<Endo<'a>> with
member _.concat f g = fun x -> f (g x)
let PersonEndo = (new EndoMonoid<Person> ()) :> Monoid<Endo<Person>>
```

A *pointwise operation* is some operation $\oplus$ on some type $T$ that is "lifted" to act on functions that return $T$. More formally, $\forall f,g,x.\ (f \oplus g)(x) = f(x) \oplus g(x)$. If code is your thing, what follows is the function that lifts a binary operation pointwise:

```
(* val liftPointwise :
( 'a -> 'a -> 'a) ->
(('x -> 'a) -> ('x -> 'a) -> ('x -> 'a)) *)
let liftPointwise op = fun f1 f2 -> fun x ->
op (f1 x) (f2 x)
```

```
const liftPointwise_:
<X, A>(op: ( x: A, y: A) => A) =>
(f1: (x: X) => A, f2: (x: X) => A) =>
(x: X) => A =
op => (f1, f2) => x => op(f1(x), f2(x))
```

```
liftPointwise ::
  ( a -> a -> a) ->
  ((x -> a) -> (x -> a) -> (x -> a))
liftPointwise op = \f1 f2 -> \x ->
  op (f1 x) (f2 x)
```

```
func liftPointwise<X, A>(
op: @escaping (A, A) -> A
) -> (@escaping (X) -> A, @escaping (X) -> A) -> (X) -> A {
return { f1, f2 in { x in op(f1(x), f2(x)) } }
}
```

```
(* val liftPointwise :
op: ('a -> 'b -> 'c) -> f1:('d -> 'a) -> f2:('d -> 'b) -> x:'d -> 'c *)
let liftPointwise op = fun f1 f2 -> fun x -> op (f1 x) (f2 x)
// or
let liftPointwise op f1 f2 x = op (f1 x) (f2 x)
```

An interesting property of pointwise operations is that if the underlying operation is a monoid^{5} then the resulting pointwise operation is a monoid too! I think a nice proof of this is to show the source code that performs this operation for us.

```
module LiftPointwise(M: Monoid) = struct
type 'x t = 'x -> M.t
let (+) f1 f2 = fun x -> M.((f1 x) + (f2 x))
let empty = fun _x -> M.empty
end
```

```
interface Func<X, A> {
(x: X): A
}
const liftPointwise: <X, A>(m: Monoid<A>) => Monoid<Func<X, A>> = m => ({
concat: (f1, f2) => x => m.concat(f1(x), f2(x)),
empty: _x => m.empty
})
```

```
-- This is defined in Data.Semigroup
instance Semigroup a => Semigroup (x -> a) where
(<>) f1 f2 = \x -> (f1 x) <> (f2 x)
-- This is defined in Data.Monoid
instance Monoid a => Monoid (x -> a) where
mempty = \_x -> mempty
```

```
struct Func<X, A> {
let run: (X) -> A
}
extension Func: Semigroup where A: Semigroup {
static func <>(f1: Func<X, A>, f2: Func<X, A>) -> Func<X, A> {
Func { x in
f1.run(x) <> f2.run(x)
}
}
}
extension Func: Monoid where A: Monoid {
static var empty: Func<X, A> {
return Func<X, A> { _x in A.empty }
}
}
```

```
type LiftPointWise<'x, 'a> (m: Monoid<'a>) =
interface Monoid<'x -> 'a> with
member _.empty = fun _ -> m.empty
interface Semigroup<'x -> 'a> with
member _.concat f1 f2 = fun x -> m.concat (f1 x) (f2 x)
```

Sometimes you want to manipulate functions over the operations rather than the underlying operations themselves. The nice thing is we don't have to give up our monoidal superpowers when we do so!

Manipulation of state in large applications quickly gets hairy. As an application grows, it becomes a real challenge to be sure that state mutations affect only the components you want them to. One mitigation is to centralize all of your state manipulation as best you can — ideally to one function or one file or one module. To do so, we can decouple an intention to change state (or an action) from the actual state change itself.

Reducers are one way to cleanly declare atomic chunks of state manipulation in response to these actions. Smaller reducers can be composed into bigger ones as our application's state management grows in scope.

Let's take Redux reducers as an example to explore further. According to the official documentation, a reducer is defined by a function from the previous state and an action into a new state $(S, A) \rightarrow S$. In theory, we would feed some library a bunch of these reducers and in our application we could fire actions to trigger these state changes. In code, this definition of a reducer looks as follows:

```
let reducer : 'state * 'action -> 'state = ()
```

```
const reducer: <S,A>(state: S, action: A) => S
```

```
reducer :: (state, action) -> state
```

```
func reducer<S,A>(state: S, action: A) -> S
```

```
type reducer<'state, 'action> = 'state * 'action -> 'state
```

In Redux, reducers can be combined with a helper called `combineReducers`.

Redux is great because it introduced the concept of reducers to the masses. But instead of using a library directly, let's re-arrange Redux's reducer function a bit to see if we can build the library ourselves.

```
let reducer : 'state * 'action -> 'state = ()
(* flip the tuple *)
let reducer : 'action * 'state -> 'state = ()
(* curry the function (unroll the tuple into a function) *)
let reducer : 'action -> 'state -> 'state = ()
(* rewrite ('state -> 'state) to StateEndo *)
let reducer : 'action -> StateEndo.t = ()
(* reducer is a monoid because StateEndo is a
monoid, and it's a pointwise function into
a monoid. *)
```

```
const reducer: <S,A>(state: S, action: A) => S
// flip the parameters
const reducer: <S,A>(action: A, state: S) => S
// curry the function (unroll the tupled parameters into a function)
const reducer: <S,A>(action: A) => (state: S) => S
// rewrite (state: S) => S into Endo<S>
const reducer: <S,A>(action: A) => Endo<S>
// wrap in Func
const reducer: <S,A>() => Func<A, Endo<S>>
// reducer() is a monoid because Endo<S> is a
// monoid, and it's a pointwise function into
// a monoid.
```

```
reducer :: (state, action) -> state
-- flip the tuple
reducer :: (action, state) -> state
-- curry the function
reducer :: action -> state -> state
-- rewrite (state -> state) to (Endo state)
reducer :: action -> Endo state
-- reducer is a monoid because (Endo state) is a
-- monoid, and it's a pointwise function into
-- a monoid.
```

```
func reducer<S,A>(state: S, action: A) -> S
// flip the parameters
func reducer<S,A>(action: A, state: S) -> S
// curry the function
func reducer<S,A>(action: A) -> (S) -> S
// rewrite (S -> S) to Endo<S>
func reducer<S,A>(action: A) -> Endo<S>
// wrap in Func
func reducer<S,A>() -> Func<A, Endo<S>>
// reducer is a monoid because Endo<S> is a
// monoid, and it's a pointwise function into
// a monoid.
```

```
type reducer<'state, 'action> = 'state * 'action -> 'state
// flip the tuple
type reducer<'state, 'action> = 'action * 'state -> 'state
// curry the function
type reducer<'state, 'action> = 'action -> 'state -> 'state
// rewrite (state -> state) to (Endo state)
type reducer<'state, 'action> = 'action -> Endo<'state>
// reducer is a monoid because (Endo state) is a monoid,
// and it's a pointwise function into a monoid
```

And we have a monoid! You could say a reducer is just the $Endo_{state}$ monoid lifted pointwise over actions. The monoid is precisely why using reducers is a nice way to decompose and reason about state changes in your application: Breaking down problems into pieces makes them more manageable, and the identity and associativity of the monoid means gluing them back together is easy. In fact, with our monoid instance on the manipulated reducer the $\epsilon$ and $\oplus$ give us a Redux-like library for free.
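
To make that concrete, here is a TypeScript sketch of the Redux-like library falling out of the monoid. The names below (`combineReducers`, `ageReducer`, the `Action` union) are illustrative for this post's `Person` example, not Redux's actual implementation:

```typescript
// An endofunction and its monoid (composition with identity), as in the post.
type Endo<A> = (x: A) => A;
const endoMonoid = <A>() => ({
  concat: (f: Endo<A>, g: Endo<A>): Endo<A> => x => f(g(x)),
  empty: ((x: A) => x) as Endo<A>,
});

// A reducer is a pointwise function into the Endo monoid.
type Reducer<A, S> = (action: A) => Endo<S>;

// Combining reducers is the pointwise-lifted monoid operation —
// essentially what Redux's combineReducers provides, derived for free.
const combineReducers = <A, S>(...reducers: Reducer<A, S>[]): Reducer<A, S> =>
  action =>
    reducers.reduce(
      (acc, r) => endoMonoid<S>().concat(acc, r(action)),
      endoMonoid<S>().empty
    );

interface Person { name: string; age: number }
type Action = { type: "BIRTHDAY" } | { type: "MARRY"; lastName: string };

const ageReducer: Reducer<Action, Person> = a =>
  a.type === "BIRTHDAY" ? p => ({ ...p, age: p.age + 1 }) : p => p;
const nameReducer: Reducer<Action, Person> = a =>
  a.type === "MARRY" ? p => ({ ...p, name: p.name + " " + a.lastName }) : p => p;

const reducer = combineReducers(ageReducer, nameReducer);

// "Dispatching" is just folding actions through the reducer.
const actions: Action[] = [
  { type: "BIRTHDAY" },
  { type: "MARRY", lastName: "Smith" },
];
const final = actions
  .map(reducer)
  .reduce((s: Person, change) => change(s), { name: "Fred", age: 20 });
// final: { name: "Fred Smith", age: 21 }
```

Note there is no store machinery anywhere: the monoid's `concat` and `empty` do all the work.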

Endofunctions are monoids, pointwise monoidal operations are monoids, and combining these two function-monoids gives us reducers! The monoidal formulation of reducers removes the need for a library to combine reducers for us, and it motivates why reducers are such a nice way to manage changes to a larger application state.

Thank you Thomas Visser and Stephen Celis for pointing out some mistakes in early versions of this post! Thank you Janne Siera for adding F# examples to the post!

Reducers show up in two separate places in the comonadic UI framework implementation that I used in barbq. In comonadic UI components, one of the kinds of reducers is located inside a writer monad. Most of the time a component won't need to react to any actions, and if this is the case, components are unencumbered by boilerplate specifying a "dummy" reducer. The writer monad just takes care of it.

- *Endo* means precisely that the domain and range (of a function) are the same.↩
- To reify is to make the abstract concrete; in this case, it's referring to making a "change" a value we can manipulate instead of just an operation we perform in our programs.↩
- A change that doesn't change anything↩
- The subscript $_A$ here is noting that there is a monoid instance for all choices of $A$.↩
- Moreover, operations lifted pointwise over any algebraic structure are also members of the same algebraic structure (so this is also true for semigroups, and other structures like rings, and semilattices too).↩

If you ask someone who has interacted with me in the last five years to describe me, they may say: Brandon loves monoids. I do love monoids, and although I do think there are enough existing materials on the subject on the internet, I figured I should probably add my take to the mix.^{1}

As engineers, we study algebraic structures (like semigroups and monoids) for a few reasons:^{2}

- Mathematics gives us an objective solution to "clean code" and API design — discovering the algebraic structure underlying the problem gives rise to a minimally complex and maximally expressive interface.
- These structures give names to incredibly abstract notions. Notions that we otherwise, as humans, would have a hard time discussing. When something has a name, our brains can reason about them. Shared vocabulary means more productivity for teams. Moreover, using these proper names introduces a ~hundred years of mathematics and computer science content for further study.

Semigroups and Monoids are the "20%" of algebraic objects that get you "80%" of the power. These are a functional programmer's basic building blocks: The ability to detect, digest, and discover them levels you up as an engineer!

Since I want this post to be maximally relevant to the audiences I think I'll reach, I'm preparing all code examples in OCaml, ReasonML, Haskell, and Swift throughout this post.

Algebraic structures in typed programming languages are defined by signatures/protocols/type-classes/interfaces. Instances of these structures are declared by conformances/instances of these signatures. Beyond excluding instances that don't type-check, the set of instances is further restricted by *laws* — equivalences between two programs which should always be true. For example, a structure with a *commutativity* law aka $\forall x,y. x \oplus y = y \oplus x$^{3} permits an implementation of $\oplus$ for integer multiplication but rejects matrix multiplication.^{4}

A semigroup is a type coupled with a closed^{5} binary associative operation that acts on that type, $(T, \oplus)$. Addition over integers, $(Int, +)$, multiplication over integers, $(Int, \times)$, and concat over non-empty lists, $(NonEmptyList, ++)$, are all semigroups. Likewise for cache composition and sequencing animations.

The associativity law demands $\forall x, y, z. (x \oplus y) \oplus z = x \oplus (y \oplus z)$. This is the case for all the examples shown above. A counter-example for illustration purposes: Subtraction over integers, $(Int, -)$. Proof: Take $x=1,y=2,z=3$, $(1 - 2) - 3$ evaluates to $-4$, but $1 - (2-3)$ evaluates to $+2$!
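
The counter-example above is easy to spot-check in code. This is sampling one triple, not a proof, but it's enough to disqualify subtraction (the helper `associatesOn` is mine, for illustration):

```typescript
const add = (x: number, y: number): number => x + y;
const sub = (x: number, y: number): number => x - y;

// Does op associate on this particular triple of values?
const associatesOn = (
  op: (x: number, y: number) => number,
  x: number,
  y: number,
  z: number
): boolean => op(op(x, y), z) === op(x, op(y, z));

const addOk = associatesOn(add, 1, 2, 3); // true
const subOk = associatesOn(sub, 1, 2, 3); // false: (1-2)-3 = -4 but 1-(2-3) = 2
```

A single failing triple refutes the law; no amount of passing triples proves it.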

Since it's hard to type $\oplus$ in our programming development environments, we typically use an operator like `+` or `<>` in code:

```
module type Semigroup = sig
type t
(* We don't use <> in the ML langs because <> is traditionally "is not equal" *)
val (+) : t -> t -> t
end
```

```
-- You can find this in Data.Semigroup
class Semigroup a where
(<>) :: a -> a -> a
```

```
protocol Semigroup {
static func <>(a: Self, b: Self) -> Self
}
```

Instances of semigroups are instances of the corresponding signature/protocol/type class:

```
module Sum : Semigroup = struct
type t = int
let (+) = Int.(+)
end
```

```
newtype Sum = Sum Int
instance Semigroup Sum where
(Sum x) <> (Sum y) = Sum (x + y)
```

```
struct Sum { let v: Int }
extension Sum : Semigroup {
  static func <>(a: Sum, b: Sum) -> Sum {
    return Sum(v: a.v + b.v)
  }
}
```

Algebraic properties give us magical powers. Associativity gives the programmer and the runtime the freedom to re-associate chunks of work.

As programmers, we get to group operations together in whichever way we feel is most appropriate for the situation.

```
let xy = x + y in
let all = xy + z in
(* or *)
let all = x + y + z in
()
(* ... *)
```

```
xy = x <> y
all = xy <> z
-- or
all = x <> y <> z
```

```
let xy = x <> y
let all = xy <> z
// or
let all = x <> y <> z
```

On the other hand, the machine can choose to schedule this work whenever it pleases. As a consequence, semigroups can hook into many pieces of machinery in other libraries and frameworks; for example, we can use associativity to imbue our work with parallelism for free!

```
(* Work to do: x + y + z + w *)
let xy = x + y in (* thread1 *)
let zw = z + w in (* thread2 *)
xy + zw
```

```
-- Work to do: x + y + z + w
let xy = x <> y in -- thread1
let zw = z <> w in -- thread2
xy <> zw
```

```
// Work to do: x + y + z + w
let xy = x <> y // thread1
let zw = z <> w // thread2
xy <> zw
```

Associativity is a very common property, so whenever you find yourself with a binary operation — it's worth asking: Is this associative — is this a semigroup?

A monoid extends semigroups with an identity, $\epsilon$. So a monoid is a type, a closed binary associative operation, and an identity: $(T, \oplus, \epsilon)$. Many of the examples above for semigroups are also monoids: Addition of integers uses $0$ as an identity. Multiplication of integers' identity is $1$. We can construct an identity cache to make cache composition a monoid.

To be a valid identity, the following law must hold: $\forall x. x \oplus \epsilon = \epsilon \oplus x = x$, in other words, combining with the identity on the left or the right is the same as doing nothing at all. There is no $\epsilon$ which obeys that law that makes $(NonEmptyList, ++, \epsilon)$ a monoid. However, $(List, ++, [])$ is a monoid because concatenating the empty list on the left and right over any other list is the same as not concatenating at all.
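
The list case is easy to spot-check (again a sample, not a proof — `concatLists` and `emptyList` are my own names for this sketch):

```typescript
// (List, ++, []) — concatenating the empty list on either side
// leaves the list unchanged.
const concatLists = <A>(xs: A[], ys: A[]): A[] => [...xs, ...ys];
const emptyList: number[] = [];

const xs = [1, 2, 3];
const leftId = concatLists(emptyList, xs);  // [1, 2, 3]
const rightId = concatLists(xs, emptyList); // [1, 2, 3]
```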

Since it's hard to type $\epsilon$ in our programming development environments, we typically use a name like `empty` or `mempty` in code:

```
module type Monoid = sig
include Semigroup
val empty : t
end
```

```
class Semigroup a => Monoid a where
mempty :: a
```

```
protocol Monoid : Semigroup {
static var empty : Self { get }
}
```

An example instance:

```
module ListM : Monoid = struct
include ListS (* a semigroup *)
let empty = []
end
```

```
newtype ListM a = ListM [a]
instance Semigroup (ListM a) -- ...
instance Monoid (ListM a) where
  mempty = ListM []
```

```
struct ListM<A> { let v: [A] }
extension ListM : Semigroup { /* ... */ }
extension ListM : Monoid {
  static var empty: ListM { return ListM(v: []) }
}
```

The power of an identity is that there always exists a default or a known empty. Monoids let us "drop the option":

```
let annoyinglyNeedsOption =
if computation() then Some (x + y) else None
(* to *)
let expressiveNoNeedOption =
if computation() then x + y else M.empty
```

```
annoyinglyNeedsMaybe :: Maybe a
annoyinglyNeedsMaybe =
  if computation then Just (x <> y) else Nothing
-- to
expressiveNoNeedMaybe :: Monoid a => a
expressiveNoNeedMaybe =
  if computation then x <> y else mempty
```

```
func annoyinglyNeedsMaybe<M: Semigroup>() -> M? {
  return computation ? x <> y : nil
}
// to
func expressiveNoNeedMaybe<M: Monoid>() -> M {
  return computation ? x <> y : M.empty
}
```

Monoids are the building blocks of composition. And composition leads to clean, simple, and expressive code. Moreover, when you and your colleagues can speak about these abstract notions concretely you get a huge productivity boost!

Thank you Tiziano Coroneo, @_lksz_, Kaan Dedeoglu, and Avery Morin for pointing out some mistakes in early versions of this post!

- You can also choose to consume this blog post in video form with Swift as the programming language substrate.↩
- Okay, for full disclosure, I have to admit that intellectual self-indulgence also drives me to dig deep into this sort of thing. But trust me, semigroups and monoids are extremely useful!↩
- The upside-down A, $\forall$, reads as "for all" — this whole statement also reads as: For all choices of $x$ and $y$, combining $x$ and $y$ is the same as combining $y$ and $x$ — a formally precise way of saying "the order that we combine doesn't matter".↩
- Matrix multiplication $M_1 * M_2$ means something different than $M_2 * M_1$ — see wikipedia for more.↩
- Closed in this context means that the operation always returns an element of this same type, $\forall x: T, y: T. (x + y): T$; the operation never diverges with an infinite loop or throws an exception.↩

Tracking knowledge across large areas of a codebase is a fairly common task. Keeping track of whether or not some cache miss happens when loading files is an example of this — you could want to track this in a metric. If we model knowledge monoidally, we can use a writer monad to track this knowledge without boilerplate in a simple and threadsafe manner.

Feel free to skip the background section if you don't care how this problem came up in a real project.

The Coda protocol uses zk-SNARKs to compress the blockchain down to a constant size. These SNARKs are proof objects certifying that some computation has been run correctly. Some nodes in the network create these SNARKs and others verify them. To create a SNARK one needs to have a specific large *proving key*^{1}, and to verify a SNARK one needs to have a specific small *verification key*^{2}. These proving and verification keys are indexed by the SNARK logic itself. In other words, if any computation changes then the proving and verifying keys need to change. In other other words, whenever some code is changed in the SNARK circuit during the development of Coda, both proving keys and verification keys must be regenerated. Key generation also involves some randomness.

We want developers to be productive and so we want proving and verification key generation to happen transparently at build time.

Naively, one can just always regenerate all proving and verification keys every build, but this has some drawbacks: (1) As randomness is involved during key generation, keys created from different builds will be incompatible. For example, you wouldn't be able to connect to a live network that was built on CI^{3} with some local branch of code, even when you haven't touched any SNARK logic that would invalidate the keys. (2) Key generation is slow, so we'd like to skip that step if possible.

A nice solution to this would be to remove the randomness from key generation under certain conditions. This may be done by adding some debug branch through the code that generates the keys. Unfortunately, it turned out to be quite a bit of work to get rid of all the randomness in the key generation logic. It's enough work that we decided to punt on it temporarily.

Another solution is to introduce a series of layered caches^{4} that would be placed in front of key-generation in the key-loading process. To load keys in Coda, the following process is followed:

- Try to load the keys from a manual override path
- Try to load the keys from the normal installation path
- Try to load the keys from an S3 installation path
- Try to load the keys from S3
- Try to load the keys from the auto-generation path
- Auto-generate the keys
- Store the keys in the auto-generation path (only if the keys were auto-generated)
- Store the keys in S3 (only if the keys were auto-generated)

To further complicate things, we need to track the outcome of the key generation process and propagate it to various interested observers for legacy reasons^{5}.

Finally, we're sufficiently motivated.

The following is a simplified model of the problem. In reality, there may be several more places where loading occurs:

```
module Action : sig
type 'a t
end
```

Actions are what happens when you invoke a load-or-generate function.

```
module A (Intf : sig
  type elem1
  type elem2
  type t

  val load_or_gen_a1 : unit -> elem1 Action.t
  val load_or_gen_a2 : unit -> elem2 Action.t
  val build : elem1 -> elem2 -> t
end) = struct
  include Intf

  let load_or_gen_a () =
    let open Deferred.Let_syntax in
    let* a1 = load_or_gen_a1 () in
    let+ a2 = load_or_gen_a2 () in
    build a1 a2
end
```

In one spot we load an `A.t`.

```
module B : sig
  type t

  val load_or_gen_b : unit -> t Action.t
end
```

In another, we load a `B.t`.

```
module Subroutine = struct
let load_or_gen () =
let open Deferred.Let_syntax in
let* a = load_or_gen_a () in
let+ b = load_or_gen_b () in
(a, b)
end
```

Finally we load both in a subroutine.

To model whether or not generation has occurred:

```
module Track_generated = struct
type t = [`Generated_something | `Cache_hit]
end
```

The task is to somehow incorporate generated knowledge into the output of `Subroutine.load_or_gen`.

In this model, actions are the asynchronous monad. Let's assume we're using Jane Street's Async library and then implement `Action`:

```
module Action = struct
type 'a t = 'a Deferred.t
end
```

A naive way to track whether or not generation has occurred is to introduce some sort of global mutable state.

```
module Global_mutable_state = struct
let generation_occurred =
let state = ref `Cache_hit in
fun () ->
state := `Generated_something
end
```

Then in each load-or-generate function, we'd call `Global_mutable_state.generation_occurred ()` whenever we actually generate.

This is easy to implement, but it's a very terrible solution! Global mutable state means we need to be careful about concurrent writes. We also need to make sure we don't forget to call this function if we ever introduce a new place where generation occurs.

The concept of knowledge has an interesting property: Once I know something, I can never^{6} un-know it.

At the bottom, I know nothing or $\bot$. Afterward, I can learn about $a$ or $b$, and once I see the other one I know both ${a,b}$.
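
This lattice of knowledge can be sketched in a few lines; the representation below (sorted fact arrays, a `join` that unions them) is my own illustration, not code from Coda:

```typescript
// A tiny knowledge lattice over two facts; join (∨) unions what we know.
type Fact = "a" | "b";
type Knowledge = Fact[]; // kept sorted and deduplicated

const bottom: Knowledge = []; // knowing nothing

// join is commutative, associative, and idempotent.
const join = (k1: Knowledge, k2: Knowledge): Knowledge =>
  (["a", "b"] as Fact[]).filter(f => k1.indexOf(f) >= 0 || k2.indexOf(f) >= 0);

const knowA: Knowledge = ["a"];
const knowB: Knowledge = ["b"];

const k1 = join(knowA, knowB); // ["a", "b"]
const k2 = join(knowB, knowA); // ["a", "b"] — order of learning doesn't matter
const k3 = join(k1, knowA);    // ["a", "b"] — re-learning "a" changes nothing
```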

This means we can implement a well-defined commutative^{7}, associative^{8}, idempotent^{9} binary operation $\vee$ that incorporates new knowledge. And in this case, since all we care about is whether any generation occurred at all, two states suffice:

```
module Track_generated = struct
type t = [`Generated_something | `Cache_hit]
let bottom = `Cache_hit
(* In Haskell, we'd provide a Monoid instance and use
* <>, but OCaml uses <> for does-not-equal-to *)
let (+) x y =
match (x, y) with
| (`Generated_something, _)
| (_, `Generated_something) -> `Generated_something
| (`Cache_hit, `Cache_hit) -> `Cache_hit
end
```

Now we can ditch our global mutable state, and instead change our actions to also give us the new knowledge about whether generation occurred:

```
module Action = struct
type 'a t = ('a * Track_generated.t) Deferred.t
end
```

Then we update `load_or_gen_a` to join the knowledge from its sub-loads:

```
let load_or_gen_a () =
let* (a1, t1) = load_or_gen_a1 () in
let+ (a2, t2) = load_or_gen_a2 () in
(build a1 a2, t1 + t2)
```

A better solution! We've eliminated global mutable state, so we've solved the concurrency problem. This is parallelizable since knowledge aggregation is commutative and associative. Unfortunately, we still have a bunch of boilerplate that we can't forget to add. And it's still brittle: The type-system will remind us to remember that actions have a tupled result, but we need to rely on warnings at best to remember to join together all of our knowledge.

The writer monad wraps computation alongside some monoidal^{10} value. Traditionally, this is used in a language like Haskell to aggregate logs^{11} — some call it the logging monad. However, we're free to use any monoid here. Monoids are values that have an identity and an associative binary operation — just like $\vee$ that we've described above.

For reference, here's an OCaml implementation of a writer monad^{12}:

```
module type Monoid_intf = sig
  type t

  val empty : t
  val (+) : t -> t -> t
end

module Writer (M : Monoid_intf) = struct
  type 'a t = 'a * M.t

  include Monad.Make (struct
    type nonrec 'a t = 'a t

    let return x = (x, M.empty)

    let map = `Define_using_bind

    let bind (a, m1) ~f =
      let open M in
      let (b, m2) = f a in
      (b, m1 + m2)
  end)
end
```

We can layer Writer on top of Deferred^{13} to get a monad that both tracks knowledge and runs code concurrently.

```
module With_track_generated = struct
  type 'a t = {data: 'a; dirty: Track_generated.t}
end

module Deferred_with_track_generated = struct
  type 'a t = 'a With_track_generated.t Deferred.t

  include Monad.Make (struct
    type nonrec 'a t = 'a t

    let return x =
      Deferred.return
        {With_track_generated.data= x; dirty= Track_generated.bottom}

    let map = `Define_using_bind

    let bind t ~f =
      let open Deferred.Let_syntax in
      let* {With_track_generated.data; dirty= dirty1} = t in
      let+ {With_track_generated.data= output; dirty= dirty2} = f data in
      { With_track_generated.data= output
      ; dirty= Track_generated.(dirty1 + dirty2) }
  end)
end
```

This time, we'll change `Action` to use this new monad:

```
module Action = struct
type 'a t = 'a Deferred_with_track_generated.t
end
```

Then we'll get compilation errors until we change our leaf load-or-generate functions to return values in this new monad^{14} instead of plain `Deferred.t`s.

This is the best solution! There is no global mutable state. There is no boilerplate. Our code is clean and knowledge propagation is neatly handled for you. If later more load-or-generate steps appear, the knowledge propagation comes along for free.
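
The same layering works in any language with an async type. Here's a hypothetical TypeScript sketch with `Promise` playing the role of `Deferred`; the names (`Tracked`, `bind`, `loadA`, `genB`) are mine, not from the Coda codebase:

```typescript
// Knowledge monoid: did any generation occur anywhere?
type Track = "GeneratedSomething" | "CacheHit";
const bottom: Track = "CacheHit";
const joinTrack = (a: Track, b: Track): Track =>
  a === "GeneratedSomething" || b === "GeneratedSomething"
    ? "GeneratedSomething"
    : "CacheHit";

// Writer layered over the async monad.
type Tracked<A> = Promise<{ data: A; dirty: Track }>;

const pure = <A>(data: A): Tracked<A> =>
  Promise.resolve({ data, dirty: bottom });

// bind threads the data through f and joins the knowledge from both steps.
const bind = <A, B>(t: Tracked<A>, f: (a: A) => Tracked<B>): Tracked<B> =>
  t.then(({ data, dirty }) =>
    f(data).then(({ data: out, dirty: dirty2 }) => ({
      data: out,
      dirty: joinTrack(dirty, dirty2),
    }))
  );

// Two leaf actions: one hits a cache, one generates.
const loadA = (): Tracked<string> =>
  Promise.resolve({ data: "a", dirty: "CacheHit" });
const genB = (): Tracked<string> =>
  Promise.resolve({ data: "b", dirty: "GeneratedSomething" });

// Knowledge propagates with no bookkeeping at the call site.
const subroutine = (): Tracked<[string, string]> =>
  bind(loadA(), a => bind(genB(), b => pure<[string, string]>([a, b])));
```

Running `subroutine` yields both results with `dirty` already joined to `"GeneratedSomething"`, because one leaf generated.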

Writer monads provide a boilerplate-free, thread-safe, non-error-prone way to propagate knowledge through large swaths of a codebase. In Coda, we used the `Deferred_with_track_generated` monad to track whether SNARK keys were freshly generated or loaded from one of the caches.

Thank you Christina Lee and Omer Zach for your thoughtful reviews on this post!

- In the low hundreds of megabytes in size↩
- Less than a few kilobytes in size↩
- Continuous integration↩
- Caches are monoids, did you know?↩
- If you really must know, we wanted to avoid implementing AWS private-bucket uploading logic in OCaml just to support this one feature, so we want to shell out to the aws CLI tool. We also don't want to require devs to have access to the private S3 bucket, as Coda is an open source project; only CI needs write access. So, as a hack, when built on CI, the build fails in a strategic location after all SNARK-related generation logic is finished.↩
- Especially if I am a computer.↩
- Knowing $a$ and $b$ can occur in any order.↩
- Knowing $a$ and $b$ and $c$ can be paired up via $a,b$ or $b,c$.↩
- Knowing $a$ and later learning about $a$ has the same effect as learning about $a$ only the first time.↩
- Keep reading below for more on monoids.↩
- The monoid being strings with concatenation for example↩
- I'm using monad helpers from Jane Street's base library.↩
- We do this manually in OCaml by convention rather than using monad-transformers like one would in Haskell.↩
- That would be appending _with_track_generated in the let-deferred-let-syntax-open-in statements, effectively moving from one monad to another.↩

TUIs^{1} for life. Barbq 🍖 is a TUI-based status-bar for macOS.

Barbq is a simple, no-frills, terminal-based status bar. I host it in an instance of the alacritty terminal^{2}, which I pin to the top of the screen using the yabai tiling window manager.

To be clear: The above is not a Linux system. I am running macOS 10.15. You're just seeing the wonderful kitty terminals, yabai window manager, and barbq status bar inside alacritty.

- Widgets for yabai tabs, internal/external IP addresses, volume, battery, wifi, and date
- Volume info grabbed via a low-level C executable to avoid the overhead of invoking osascript
- Resource efficient^{3}
- Modular and extensible^{4}
- Stable enough that this author has been using barbq 100% of the time for over a month

- Support other data sourcing methods in addition to interval-based polling^{5}
- Cleaner and more usable layout methods for views
- More refined UI work regarding colors, positioning, margins, and unicode decorations

For up-to-date installation instructions visit the GitHub project.

Barbq is written in Haskell, and it is this author's first "real" Haskell project.

Since this is a side-project, long-cuts^{6} were made in order to do things in an interesting manner. Look for future blog posts on the two interesting pieces here: (1) Using free applicative functors for creating model data to feed views and (2) building terminal-UI components in UI-paradigm-agnostic comonadic style.

I welcome all code reviews! If you have any experience with Haskell, I'd appreciate feedback on my code. Please open an issue in the project to let me know!
