Semigroups and Monoids

If you ask someone who has interacted with me in the last five years to describe me, they may say: Brandon loves monoids. I do love monoids, and although I do think there are enough existing materials on the subject on the internet, I figured I should probably add my take to the mix.1

As engineers, we study algebraic structures (like semigroups and monoids) for a few reasons:2

1. Mathematics gives us an objective solution to "clean code" and API design discovering the algebraic structure underlying the problem gives rise to a minimally complex and maximally expressive interface.
2. These structures give names to incredibly abstract notions. Notions that we otherwise, as humans, would have a hard time discussing. When something has a name, our brains can reason about them. Shared vocabulary means more productivity for teams. Moreover, using these proper names introduces a ~hundred years of mathematics and computer science content for further study.

Semigroups and Monoids are the "20%" of algebraic objects that get you "80%" of the power. These are a functional programmer's basic building blocks: The ability to detect, digest, and discover them levels you up as an engineer!

Since I want this post to be maximally relevant to the audiences I think I'll reach, I'm preparing all code examples in OCaml, ReasonML, Haskell, and Swift throughout this post.

Algebraic structures in programs

Algebraic structures in typed programming languages are defined by signatures/protocols/type-classes/interfaces. Instances of these structures are declared by conformances/instances of these signatures. In addition to those instances that don't type-check, the set of instances is further restricted by laws or equivalences between two programs which should always be true. For example, a structure with a commutativity law aka $\forall x,y. x \oplus y = y \oplus x$3 permits an implementation for $\oplus$ for integer multiplication but rejects matrix multiplication.4

Semigroup

A semigroup is a type coupled with a closed5 binary6 associative operation that acts on that type, $(T, \oplus)$. Addition over integers, $(Int, +)$, multiplication over integers, $(Int, \times)$, and concat over non-empty lists, $(NonEmptyList, ++)$ are all semigroups. Likewise for cache composition and sequencing animations.

The associativity law demands $\forall x, y, z. (x \oplus y) \oplus z = x \oplus (y \oplus z)$. This is the case for all the examples shown above. A counter-example for illustration purposes: Subtraction over integers, $(Int, -)$. Proof: Take $x=1,y=2,z=3$, $(1 - 2) - 3$ evaluates to $-4$, but $1 - (2-3)$ evaluates to $+2$!

Since it's hard to type $\oplus$ in our programming development environments, we typically use <> to denote this operation. If the operation happens to be commutative7 as well, we use + -- though in ML langs we need to make an exception.

module type Semigroup = sig  type t  (* We don't use <> in the ML langs because <> is traditionally "is not equal" *)  val (+) : t -> t -> tend

Instances of semigroups are instances of the corresponding signature/protocol/type class:

module Sum : Semigroup = struct  type t = int  let (+) = Int.(+)end

Algebraic properties give us magical powers. Associativity gives the programmer and the runtime the freedom to re-associate chunks of work.

As programmers, we get to choose grouping together operations in whichever way we feel is most appropriate for the situation.

let xy = x + y inlet all = xy + z in(* or *)let all = x + y + z in()(* ... *)

On the other hand, the machine can choose to schedule this work whenever it pleases. As a consequence, semigroups can hook into many pieces of machinery in other libraries and frameworks, for example, we can use the associativity to imbue parallelism into our work for free!

(* Work to do: x + y + z + w *)let xy = x + y in (* thread1 *)let zw = z + w in (* thread2 *)xy + zw

Associativity is a very common property, so whenever you find yourself with a binary operation it's worth asking: Is this associative is this a semigroup?

Monoids

A monoid extends semigroups with an identity, $\epsilon$. So a monoid is a type, a closed binary associative operation, and an identity: $(T, \oplus, \epsilon)$. Many of the examples above for semigroups are also monoids: Addition of integers uses $0$ as an identity. Multiplication of integers' identity is $1$. We can construct an identity cache to make cache composition a monoid.

To be a valid identity, the following law must hold: $\forall x. x \oplus \epsilon = \epsilon \oplus x = x$, in other words, combining with the identity on the left or the right is the same as doing nothing at all. There is no $\epsilon$ which obeys that law that makes $(NonEmptyList, ++, \epsilon)$ a monoid. However, $(List, ++, [])$ is a monoid because concatenating the empty list on the left and right over any other list is the same as not concatenating at all.

Since it's hard to type $\epsilon$ in our programming development environments, we typically use empty to denote this operation.

module type Monoid = sig  include Semigroup  val empty : tend

An example instance:

module ListM : Monoid = struct  include ListS (* a semigroup *)  let empty = []end

The power of an identity is that there always exists a default or a known empty. Monoids let us "drop the option":

let annoyinglyNeedsOption =  if computation() then Some (x + y) else None(* to *)let expressiveNoNeedOption =  if computation() then x + y else M.empty

Monoids are the building blocks of composition. And composition leads to clean, simple, and expressive code. Moreover, when you and your colleagues can speak about these abstract notions concretely you get a huge productivity boost!

Thank you Tiziano Coroneo, @_lksz_, Kaan Dedeoglu, Avery Morin, and Hillel Wayne for pointing out some mistakes in earlier versions of this post!

1. You can also choose to consume this blog post in video form with Swift as the programming language substrate.
2. Okay, for full disclosure, I have to admit that intellectual self-indulgence also drives me to dig deep into this sort of thing. But trust me, semigroups and monoids are extremely useful!
3. The upside-down A, $\forall$, reads as "for all" -- this whole statement also reads as: For all choices of $x$ and $y$, combining $x$ and $y$ is the same as combining $y$ and $x$ a formally precise way of saying "the order that we combine doesn't matter".
4. Matrix multiplication $M_1 * M_2$ means something different than $M_2 * M_1$ see wikipedia for more.
5. Closed in this context means that the operation always returns an element of this same type, $\forall x: T, y: T. (x + y): T$, the operation never diverges with an infinite loop or throws an exception.
6. A binary operation is one which acts on two elements: $f(x,y)$ or $x \oplus y$.
7. Commutativity allows you to rearrange terms, the order of their position doesn't affect the outcome $\forall x, y. x + y = y + x$.