# Writer monads for knowledge tracking in OCaml

Tracking knowledge across large areas of a codebase is a fairly common task. Keeping track of whether or not some cache miss happens when loading files is an example of this you could want to track this in a metric. If we model knowledge monoidally, we can use a writer monad to track this knowledge without boilerplate in a simple and threadsafe manner.

Feel free to skip the background section if you don't care how this problem came up in a real project.

## Background

The Coda protocol uses zk-SNARKs to compress the blockchain down to a constant size. These SNARKs are proof objects certifying that some computation has been run correctly. Some nodes in the network create these SNARKs and others verify them. To create a SNARK one needs to have a specific large proving key1, and to verify a SNARK one needs to a specific small verification key2. These proving and verification keys are indexed by the SNARK logic itself. In other words, if any computation changes then the proving and verifying keys need to change. In other other words, whenever some code is changed in the SNARK circuit during the development of Coda, both proving keys and verification keys must be regenerated. Key generation also involves some randomness.

We want developers to be productive and so we want proving and verification key generation to happen transparently at build time.

Naively, one can just always regenerate all proving and verification keys every build, but this has some drawbacks: (1) As randomness is involved during key generation, keys created from different builds will be incompatible. For example, you wouldn't be able to connect to a live network that was built on CI3 with some local branch of code, even though when you haven't touched any SNARK logic that would invalidate the keys. (2) Also, key generation is slow so we'd like to skip that step if possible.

A nice solution to this would be to remove the randomness from key generation under certain conditions. This may be done by adding some debug branch through the code that generates. Unfortunately, it turned out to be quite a bit of work to get rid of all the randomness in the key generation logic. It's enough work that we decided to punt on it temporarily.

Another solution is to introduce a series of layered caches4 that would be placed in front of key-generation in the key-loading process. To load keys in Coda, the following process is followed:

1. Try to load the keys from a manual override path
2. Try to load the keys from the normal installation path
3. Try to load the keys from an S3 installation path
4. Try to load the keys from S3
5. Try to load the keys from the auto-generation path
6. Auto-generate the keys
7. Store the keys in the auto-generation path (only if the keys were auto-generated)
8. Store the keys in S3 (only if the keys were auto-generated)

To further complicate things, we need to track the outcome of the key generation process and propagate it to various interested observers for legacy reasons5.

Finally, we're sufficiently motivated.

## Simplified Model

The following is a simplified model of the problem. In reality, there may be several more places where loading occurs:

module Action : sig  type 'a tend

Actions are what happens when you invoke a load-or-generate function.

module A (Intf: sig  type elem1  type elem2  type t  val load_or_gen_a1 : unit -> elem1 Action.t  val load_or_gen_a2 : unit -> elem2 Action.t  val build : elem1 -> elem2 -> tend) = struct  include Intf  let load_or_gen_a =    let open Deferred.Let_syntax in    let* a1 = loadOrGenA1 () in    let+ a2 = loadOrGenA2 () in    build a1 a2end

In one spot we load a1 and a2 to create an A.t.

module B = struct  type t  let load_or_gen_b : unit -> t Action.tend

In another, we load a B.t.

module Subroutine = struct  let load_or_gen () =    let open Deferred.Let_syntax in    let* a = load_or_gen_a () in    let+ b = load_or_gen_b () in    (a, b)end

Finally we load A.t and B.t together.

To model whether or not generation has occurred:

module Track_generated = struct  type t = [Generated_something | Cache_hit]end

The task is to somehow incorporate generated knowledge into the output of load_or_gen in a clean way.

In this model, actions are the asynchronous monad. Let's assume we're using Jane Street's Async library and then implement Action as follows:

module Action = struct  type 'a t = 'a Deferred.tend

## Tracking Knowledge v1

A naive way to track whether or not generation has occurred is to introduce some sort of global mutable state.

module Global_mutable_state = struct  let generation_occurred =    let state = ref Cache_hit in    fun () ->      state := Generated_somethingend

Then in load_or_gen_a1, load_or_gen_a2, and load_or_gen_b we need to call generation_occurred () whenever necessary.

This is easy to implement, but it's a very terrible solution! Global mutable state means we need to be careful about concurrent writes. We also need to make sure we don't forget to call this function if we ever introduce a new place load_or_gen leaf in the call graph. It's error-prone and brittle.

## Extending our knowledge model

The concept of knowledge has an interesting property: Once I know a, when presented with new information b I'll never6 un-know a. As a consequence, knowledge gives rise to a semilattice.

At the bottom, I know nothing or $\bot$. Afterward, I can learn about $a$ or $b$, and once I see the other one I know both $\{a,b\}$.

This means we can implement a well-defined commutative7, associative8, idempotent9 binary operation $\vee$ that incorporates new knowledge. And in this case, since we don't act when Cache_hit, we can treat Cache_hit as our identity or bottom state, and Generated_something is our top state.

module Track_generated = struct  type t = [Generated_something | Cache_hit]  let bottom = Cache_hit  (* In Haskell, we'd provide a Monoid instance and use   * <>, but OCaml uses <> for does-not-equal-to *)  let (+) x y =    match (x, y) with    | (Generated_something, _)    | (_, Generated_something) -> Generated_something    | (Cache_hit, Cache_hit) -> Cache_hitend

## Tracking Knowledge v2

Now we can ditch our global mutable state, and instead change our actions to also give us the new knowledge about Track_generated.t:

module Action = struct  type 'a t = ('a * Track_generated.t) Deferred.tend

Then we update load_or_gen_a and load_or_gen to thread through the knowledge. Here's what load_or_gen_a would look like afterwards:

let load_or_gen_a () =    let* (a1, t1) = load_or_gen_a1 () in    let+ (a2, t2) = load_or_gen_a2 () in    (build a1 a2, t1 + t2)

A better solution! We've eliminated global mutable state, so we've solved the concurrency problem. This is parallelizable since knowledge aggregation is commutative and associative. Unfortunately, we still have a bunch of boilerplate that we can't forget to add. And it's still brittle: The type-system will remind us to remember that actions have a tupled result, but we need to rely on warnings at best to remember to join together all of our knowledge.

The writer monad wraps computation alongside some monoidal10 value. Traditionally, this is used in a language like Haskell to aggregate logs11 some call it the logging monad. However, we're free to use any monoid here. Monoids are values that have an identity and an associative binary operation — just like $\vee$ that we've described above.

For reference, here's an OCaml implementation of a writer monad12:

module type Monoid_intf = sig  type t  val empty : t  val (+) : t -> t -> tendmodule Writer (M : Monoid) = struct  type 'a t = 'a * M.t  include Monad.Make (struct    type nonrec 'a t = 'a t    let return x = (x, M.empty)    let map = Define_using_bind    let bind (a, m1) ~f =      let open M in      let (b, m2) = f a in      (b, m1 + m2)  end)end

We can layer Writer on Track_generated under the async monad13 to get a monad that both tracks knowledge and runs code concurrently.

module With_track_generated = struct  type 'a t = {data: 'a, dirty: Track_generated.t}endmodule Deferred_with_track_generated = struct  type 'a t = 'a With_track_generated.t Deferred.t  include Monad.Make (struct    type nonrec 'a t = 'a t    let return x =      Deferred.return        {With_track_generated.data= x;         dirty= Track_generated.empty}    let map = Define_using_bind    let bind t ~f =      let open Deferred.Let_syntax in      let* {With_track_generated.data; dirty= dirty1} = t in      let+ {With_track_generated.data= output; dirty= dirty2} = f data in      { With_track_generated.data= output      ; dirty= Track_generated.(dirty1 + dirty2) }  end)end

## Tracking Knowledge v3

This time, we'll change Action once more:

module Action = struct  type 'a t = 'a Deferred_with_track_generated.tend

Then we'll get compilation errors until we change our leaf load_or_gen functions to return that initial Track_generated.t. And we're done! We don't have to change load_or_gen_a or load_or_gen_b at all other than opening this monad14 instead of Deferred.

This is the best solution! There is no global mutable state. There is no boilerplate. Our code is clean and knowledge propagation is neatly handled for you. If later more load_or_gen functions are introduced, the type system will demand that the author remembers to send a Track_generated.t with the output.

## Summary

Writer monads provide a boilerplate-free, thread-safe, non-error-prone way to propagate knowledge through large swaths of a codebase. In Coda, we used the Deferred with_track_generated monad to propagate knowledge that some generation occurs to give developers a transparent way to keep their code as compatible as possible with other builds to make them more productive.

Thank you Christina Lee and Omer Zach for your thoughtful reviews on this post!

1. In the low hundreds of megabytes in size
2. Less than a few kilobytes in size
3. Continuous integration
4. Caches are monoids, did you know?
5. If you really must know, we wanted to avoid implementing aws private-bucket uploading logic in OCaml just to support this one feature. So we want to shell out to the aws CLI tool. We also don't want to require devs to have access to the private S3 bucket as Coda is an open source project. Only CI needs write access. So as a hack, when built on CI, the build fails in a strategic location after all SNARK related generation logic is finished.
6. Especially if I am a computer.
7. Knowing $a$ and $b$ can occur in any order.
8. Knowing $a$ and $b$ and $c$ can be paired up via $a,b$ or $b,c$.
9. Knowing $a$ and later learning about $a$ has the same effect as learning about $a$ only the first time.
10. Keep reading below for more on monoids.
11. The monoid being strings with concatenation for example
12. I'm using monad helpers from Jane Street's base library.
13. We do this manually in OCaml by convention rather than using monad-transformers like one would in Haskell.
14. That would be appending _with_track_generated` in the let-deferred-let-synatx-open-in statements, effectively moving from one monad to another.