Monads in R

‘Monad’ is a term often best avoided in conversation, and it is frequently described in overly mathematical terms. The “meme” definition is the category theory version, which states

“a monad is just a monoid in the category of endofunctors”

which is mostly true, but also unnecessary.

This blog post does a great job of walking through the more practical definition, and it has “translations” into several programming languages including JavaScript and Python.

Basically, map applies some function to some values. flatMap does the same, but first “reaches inside” a context to extract some inner values, and after applying the function, re-wraps the result in the original context.

The enlightening example for me is a List - if we have some values and want to apply some function to them, we can do that with, e.g.

f <- function(x) x^2
Map(f, c(2, 4, 6))
#> [[1]]
#> [1] 4
#> 
#> [[2]]
#> [1] 16
#> 
#> [[3]]
#> [1] 36

and if we have a ‘flat’ list, this still works

Map(f, list(2, 4, 6))
#> [[1]]
#> [1] 4
#> 
#> [[2]]
#> [1] 16
#> 
#> [[3]]
#> [1] 36

but what if we have an ‘outer context’ list?

Map(f, list(c(2, 3), c(4, 5, 6)))
#> [[1]]
#> [1] 4 9
#> 
#> [[2]]
#> [1] 16 25 36

In this case, because f is vectorised, Map sends each vector to f and gets back one result vector per list element. What if we have a list in the inner context?

Map(f, list(list(2, 3), list(4, 5, 6)))
#> Error in x^2: non-numeric argument to binary operator

This fails because f(list(2, 3)) fails (it doesn’t know how to deal with an argument which is a list).

Instead, we can use a version of ‘map’ that first reaches inside the outer list context, concatenates what’s inside, applies the function, then re-wraps the result in a new, flat list

fmap <- function(x, f) {
  list(f(unlist(x)))
}
fmap(list(list(2, 3), list(4, 5, 6)), f)
#> [[1]]
#> [1]  4  9 16 25 36

This is the essence of a monad: something that supports such an fmap operation, performing the mapping inside the context. (Here fmap is shorthand for flatMap; it is not the same thing as Haskell’s fmap, which is a plain functor map.) There are various patterns which benefit from such a context, and this vignette describes an implementation of several of these via the {monads} package.

The fmap operation is so common that it’s typical to find it presented as an infix function, similar to how pipes work in R

list(list(2, 3), list(4, 5, 6)) |> fmap(f)
#> [[1]]
#> [1]  4  9 16 25 36

and we can go one step further by defining a new pipe which is just a different syntax for this

x |> fmap(f)

x %>>=% f
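As a rough sketch of how such an operator could be written in base R, the infix below simply forwards to the fmap() defined earlier (repeated here so the block runs on its own). It is named %bind%, a hypothetical name chosen so it does not clash with the %>>=% exported by {monads}, and it takes a bare function on the right-hand side. That is an assumption made for brevity; the package's operator accepts calls such as timestwo() and is not implemented this way.

```r
f <- function(x) x^2

# fmap as defined above: flatten the input, apply f, re-wrap in a list
fmap <- function(x, f) {
  list(f(unlist(x)))
}

# illustrative infix: x %bind% f is just another spelling of fmap(x, f)
`%bind%` <- function(x, f) fmap(x, f)

res <- list(list(2, 3), list(4, 5, 6)) %bind% f
res
#> [[1]]
#> [1]  4  9 16 25 36
```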

This infix function borrows from Haskell’s >>= (pronounced “bind”), which is so fundamental that it forms part of the language’s logo

library(monads)

Additionally, some toy helper functions are defined in this package for demonstrating application of functions

timestwo(4)
#> [1] 8
square(5)
#> [1] 25
add_n(3, 4)
#> [1] 7

List

As per the example above, the List monad wraps values (which may be additional lists) and when flatMapped the results are ‘flattened’ into a single List.

# same values as a regular Map, but collected into a single flattened result
x <- listM(1, 2, 3) %>>=%
  timestwo()
x
#> [[1]]
#> [1] 2 4 6

# only possible with the flatMap approach
y <- listM(list(1, 2), list(3, 4, 5)) %>>=% 
  timestwo()
y
#> [[1]]
#> [1]  2  4  6  8 10

Note that while x and y print as regular lists, they remain List monads; a print method is defined which essentially extracts value(x).

Logger

A context could include a stored ‘log’ of the expressions used on each application. One example would be performing data transformation in a {dplyr} pipeline (which would usually use %>% or more recently |>).

All that is required is to wrap the value at the start of the pipeline in a Logger context. This is done by calling the new() method of Logger, for which there is a constructor helper, loggerM()

library(dplyr, warn.conflicts = FALSE)

result <- loggerM(mtcars) %>>=%
  filter(mpg > 10) %>>=%
  select(mpg, cyl, disp) %>>=%
  arrange(desc(mpg)) %>>=%
  head()

This result is still a Logger instance, not a value. To extract the value from this we can use value(). To extract the log of each step, use logger_log() (to avoid conflict with base::log)

value(result)
#>                 mpg cyl  disp
#> Toyota Corolla 33.9   4  71.1
#> Fiat 128       32.4   4  78.7
#> Honda Civic    30.4   4  75.7
#> Lotus Europa   30.4   4  95.1
#> Fiat X1-9      27.3   4  79.0
#> Porsche 914-2  26.0   4 120.3
logger_log(result)
#> ✔ Log of 4 operations:
#> 
#>  mtcars %>%
#>    filter(mpg > 10) %>%
#>    select(mpg, cyl, disp) %>%
#>    arrange(desc(mpg)) %>%
#>    head()
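To make the idea of a context that carries a log more concrete, here is a minimal base R sketch. The names toy_logger() and log_bind() are hypothetical and this is not how {monads} implements Logger; it only shows a value travelling together with a record of what was applied to it. It takes bare functions rather than calls, to keep the example short.

```r
# Hypothetical toy logger: a value plus a character vector of step labels
toy_logger <- function(value) {
  label <- deparse(substitute(value))
  list(value = value, log = label)
}

# Apply f to the wrapped value and append the name of f to the log
log_bind <- function(m, f) {
  label <- deparse(substitute(f))
  list(value = f(m$value), log = c(m$log, label))
}

x <- c(9, 4, 1)
res <- toy_logger(x) |>
  log_bind(sqrt) |>
  log_bind(rev)

res$value
#> [1] 1 2 3
res$log
#> [1] "x"    "sqrt" "rev"
```

The result of each step is again a value-plus-log pair, which is what allows further steps to be appended later.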

The Logger works with any data value, so we could just as easily use an in-memory (or external) SQLite database

mem <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(mem, mtcars)

res <- loggerM(mem) %>>=%
  tbl("mtcars") %>>=%
  filter(mpg > 10) %>>=%
  select(mpg, cyl, disp) %>>=%
  arrange(desc(mpg)) %>>=%
  head()

Again, extracting the components from this

value(res)
#> # Source:     SQL [6 x 3]
#> # Database:   sqlite 3.46.0 [:memory:]
#> # Ordered by: desc(mpg)
#>     mpg   cyl  disp
#>   <dbl> <dbl> <dbl>
#> 1  33.9     4  71.1
#> 2  32.4     4  78.7
#> 3  30.4     4  75.7
#> 4  30.4     4  95.1
#> 5  27.3     4  79  
#> 6  26       4 120.
logger_log(res)
#> ✔ Log of 5 operations:
#> 
#>  mem %>%
#>    tbl("mtcars") %>%
#>    filter(mpg > 10) %>%
#>    select(mpg, cyl, disp) %>%
#>    arrange(desc(mpg)) %>%
#>    head()

Since the log captures what operations were performed, we could re-run this expression, and a helper is available for that

rerun(res)
#> # Source:     SQL [6 x 3]
#> # Database:   sqlite 3.46.0 [:memory:]
#> # Ordered by: desc(mpg)
#>     mpg   cyl  disp
#>   <dbl> <dbl> <dbl>
#> 1  33.9     4  71.1
#> 2  32.4     4  78.7
#> 3  30.4     4  75.7
#> 4  30.4     4  95.1
#> 5  27.3     4  79  
#> 6  26       4 120.

Some similar functionality is present in the {magrittr} package which provides the ‘classic’ R pipe %>%; a ‘functional sequence’ starts with a . and similarly tracks which functions are to be applied to an arbitrary input once evaluated - in this way, this is similar to defining a new function.

library(magrittr)

# define a functional sequence
fs <- . %>%
  tbl("mtcars") %>%
  select(cyl, mpg)

# evaluate the functional sequence with some input data
fs(mem)
#> # Source:   SQL [?? x 2]
#> # Database: sqlite 3.46.0 [:memory:]
#>      cyl   mpg
#>    <dbl> <dbl>
#>  1     6  21  
#>  2     6  21  
#>  3     4  22.8
#>  4     6  21.4
#>  5     8  18.7
#>  6     6  18.1
#>  7     8  14.3
#>  8     4  24.4
#>  9     4  22.8
#> 10     6  19.2
#> # ℹ more rows

# identify the function calls at each step of the pipeline
magrittr::functions(fs)
#> [[1]]
#> function (.) 
#> tbl(., "mtcars")
#> 
#> [[2]]
#> function (.) 
#> select(., cyl, mpg)
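The idea behind a functional sequence (store the steps now, apply them to some input later) can be sketched in base R without {magrittr}. The helper compose_steps() below is hypothetical and much simpler than what magrittr actually builds; it only illustrates why such a sequence behaves like a newly defined function.

```r
# Hypothetical helper: keep a list of functions, apply them in order on demand
compose_steps <- function(...) {
  steps <- list(...)
  function(input) {
    Reduce(function(acc, step) step(acc), steps, input)
  }
}

# nothing is evaluated when the sequence is defined
root_plus_one <- compose_steps(sqrt, function(v) v + 1)

# the steps only run once an input is supplied
root_plus_one(c(1, 4, 9))
#> [1] 2 3 4
```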

Since the functional sequence is unevaluated, errors can be present but are not triggered until the sequence is applied to some input

errfs <- . %>%
  sqrt() %>%
  stop("oops") %>%
  add_n(3)

x <- 1:10

errfs(x)
#> Error in function_list[[i]](value): 11.41421356237311.7320508075688822.236067977499792.449489742783182.645751311064592.8284271247461933.16227766016838oops
magrittr::functions(errfs)
#> [[1]]
#> function (.) 
#> sqrt(.)
#> 
#> [[2]]
#> function (.) 
#> stop(., "oops")
#> 
#> [[3]]
#> function (.) 
#> add_n(., 3)

In the monad context, steps which do raise an error nullify the value and a signifier is added to the log to prevent re-running the error

resx <- loggerM(x) %>>=%
  sqrt() %>>=%
  add_n(4)

value(resx)
#>  [1] 5.000000 5.414214 5.732051 6.000000 6.236068 6.449490 6.645751 6.828427
#>  [9] 7.000000 7.162278
logger_log(resx)
#> ✔ Log of 2 operations:
#> 
#>  x %>%
#>    sqrt() %>%
#>    add_n(4)

err <- loggerM(x) %>>=%
  sqrt() %>>=%
  stop("oops") %>>=%
  add_n(3)

value(err)
#> NULL
logger_log(err)
#> ✖ Log of 3 operations: [ERROR]
#> 
#>  x %>%
#>    sqrt() %>%
#>    [E] stop("oops") %>%
#>    [E] add_n(3)

Aside from an error destroying the value, returning a NULL result will also produce this effect

nullify <- loggerM(x) %>>=%
  sqrt() %>>=%
  ret_null() %>>=%
  add_n(7)

value(nullify)
#> NULL
logger_log(nullify)
#> ✖ Log of 3 operations: [ERROR]
#> 
#>  x %>%
#>    sqrt() %>%
#>    [E] ret_null() %>%
#>    [E] add_n(7)
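The short-circuiting behaviour shown above can be approximated in base R with tryCatch(). The sketch below is an assumption about the general technique, not the package's code: safe_bind() and boom() are hypothetical names, a NULL value is used as the failure marker, and steps are passed as bare functions.

```r
# Hypothetical bind: a NULL value marks a pipeline that has already failed
safe_bind <- function(m, f) {
  label <- deparse(substitute(f))
  if (is.null(m$value)) {
    # an earlier step failed: skip f entirely and flag this step in the log
    return(list(value = NULL, log = c(m$log, paste("[E]", label))))
  }
  # an error (or a NULL result) nullifies the value
  out <- tryCatch(f(m$value), error = function(e) NULL)
  if (is.null(out)) label <- paste("[E]", label)
  list(value = out, log = c(m$log, label))
}

boom <- function(v) stop("oops")

res <- list(value = 1:10, log = "x") |>
  safe_bind(sqrt) |>
  safe_bind(boom) |>
  safe_bind(rev)

res$value
#> NULL
res$log
#> [1] "x"        "sqrt"     "[E] boom" "[E] rev"
```

Because the failure is carried in the context rather than raised, the pipeline always runs to the end and the log shows where things went wrong.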

One downside to the functional sequence approach is chaining these - since the first term must be ., that is always the first entry, and chaining multiple sequences is not clean.

a <- . %>% sqrt()
a
#> Functional sequence with the following components:
#> 
#>  1. sqrt(.)
#> 
#> Use 'functions' to extract the individual functions.

b <- . %>% a %>% add_n(1)
b
#> Functional sequence with the following components:
#> 
#>  1. a(.)
#>  2. add_n(., 1)
#> 
#> Use 'functions' to extract the individual functions.

b(x)
#>  [1] 2.000000 2.414214 2.732051 3.000000 3.236068 3.449490 3.645751 3.828427
#>  [9] 4.000000 4.162278

Because the monad context is recreated at every step, chaining these is not a problem

a <- loggerM(x) %>>=%
  sqrt()

value(a)
#>  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
#>  [9] 3.000000 3.162278
logger_log(a)
#> ✔ Log of 1 operations:
#> 
#>  x %>%
#>    sqrt()

b <- a %>>=%
  add_n(1)

value(b)
#>  [1] 2.000000 2.414214 2.732051 3.000000 3.236068 3.449490 3.645751 3.828427
#>  [9] 4.000000 4.162278
logger_log(b)
#> ✔ Log of 2 operations:
#> 
#>  x %>%
#>    sqrt() %>%
#>    add_n(1)

Timer

In addition to capturing the expressions in a log, the Timer monad also captures the evaluation timing for each step, storing these alongside the expressions themselves in a data.frame

x <- timerM(5) %>>=%
  sleep_for(3) %>>=%
  timestwo() %>>=%
  sleep_for(1.3)

value(x)
#> [1] 10
times(x)
#>             expr  time
#> 1              5 0.000
#> 2   sleep_for(3) 3.003
#> 3     timestwo() 0.000
#> 4 sleep_for(1.3) 1.301

y <- timerM(5) %>>=%
  sleep_for(2) %>>=%
  ret_null() %>>=%
  sleep_for(0.3)

value(y)
#> NULL
times(y)
#>             expr  time
#> 1              5 0.000
#> 2   sleep_for(2) 2.003
#> 3     ret_null() 0.000
#> 4 sleep_for(0.3) 0.300
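A timing context of this kind can be sketched in base R with system.time(). The names timed_bind(), nap() and double_it() are hypothetical, and the layout of the stored data.frame merely mirrors the output above; the package's Timer may record timings differently.

```r
# Hypothetical bind: time each step and append a row to a data.frame
timed_bind <- function(m, f) {
  label <- deparse(substitute(f))
  elapsed <- system.time(out <- f(m$value))[["elapsed"]]
  step <- data.frame(expr = label, time = elapsed)
  list(value = out, times = rbind(m$times, step))
}

# toy steps: one deliberately slow, one fast
nap <- function(v) {
  Sys.sleep(0.2)
  v
}
double_it <- function(v) v * 2

start <- list(value = 5, times = data.frame(expr = "5", time = 0))

res <- start |>
  timed_bind(nap) |>
  timed_bind(double_it)

res$value
#> [1] 10
res$times$expr
#> [1] "5"         "nap"       "double_it"
```

The recorded times depend on the machine, so they are not shown here; the row for nap should be roughly 0.2 seconds.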

Maybe

In some languages it is preferable to return something rather than raise an error, particularly if you want to ensure that errors are handled. The Maybe pattern consists of either a Nothing (which is empty) or a Just containing some value; the result of every function applied to a Maybe will be one of these.

For testing the result, some helpers is_nothing() and is_just() are defined.

x <- maybeM(9) %>>=% 
  sqrt() %>>=%
  timestwo()

value(x)
#> Just:
#> [1] 6
is_just(x)
#> [1] TRUE
is_nothing(x)
#> [1] FALSE

y <- maybeM(Nothing()) %>>=%
  sqrt()

value(y)
#> Nothing
is_just(y)
#> [1] FALSE
is_nothing(y)
#> [1] TRUE

z <- maybeM(10) %>>=%
  timestwo() %>>=%
  add_n(Nothing())

value(z)
#> Nothing
is_just(z)
#> [1] FALSE
is_nothing(z)
#> [1] TRUE

For what is likely a much more robust implementation, see {maybe}.
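To make the pattern itself concrete, here is a small base R sketch of a Just/Nothing pair. All names (toy_just(), toy_nothing(), is_toy_nothing(), maybe_bind()) are hypothetical and chosen so they do not clash with the helpers exported by {monads}; treating an error or a NULL result as Nothing is an assumption of this sketch.

```r
# Hypothetical constructors for the two possible states
toy_just <- function(x) structure(list(value = x), class = "toy_just")
toy_nothing <- function() structure(list(), class = "toy_nothing")
is_toy_nothing <- function(m) inherits(m, "toy_nothing")

# Nothing stays Nothing; a failing step turns a Just into Nothing
maybe_bind <- function(m, f) {
  if (is_toy_nothing(m)) return(m)
  out <- tryCatch(f(m$value), error = function(e) NULL)
  if (is.null(out)) toy_nothing() else toy_just(out)
}

double_it <- function(v) v * 2

x <- toy_just(9) |> maybe_bind(sqrt) |> maybe_bind(double_it)
x$value
#> [1] 6

# sqrt() of a character value raises an error, so the chain ends in Nothing
y <- toy_just("a") |> maybe_bind(sqrt) |> maybe_bind(double_it)
is_toy_nothing(y)
#> [1] TRUE
```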

Result

Similar to a Maybe, a Result can contain either a successful Ok wrapped value or an Err wrapped message, but it will be one of these. This pattern resembles (and internally, uses) the tryCatch() approach where the evaluation will not fail, but requires testing what is produced to determine success, for which is_ok() and is_err() are defined.

x <- resultM(9) %>>=% 
  sqrt() %>>=%
  timestwo()

value(x)
#> OK:
#> [1] 6
is_err(x)
#> [1] FALSE
is_ok(x)
#> [1] TRUE

When the evaluation fails, the error is reported, along with the value prior to the error

y <- resultM(9) %>>=%
  sqrt() %>>=%
  ret_err("this threw an error")

value(y)
#> Error:
#> [1] "this threw an error; previously: 3"
is_err(y)
#> [1] TRUE
is_ok(y)
#> [1] FALSE

z <- resultM(10) %>>=%
  timestwo() %>>=%
  add_n("banana")

value(z)
#> Error:
#> [1] "n should be numeric; previously: 20"
is_err(z)
#> [1] TRUE
is_ok(z)
#> [1] FALSE
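The same idea can be sketched in base R with tryCatch(), keeping the error message instead of discarding it. The names toy_ok(), toy_err(), result_bind() and fail() are hypothetical, and the "; previously:" suffix simply imitates the output above; this is not the package's implementation.

```r
# Hypothetical constructors: a flag plus either a value or a message
toy_ok <- function(x) list(ok = TRUE, value = x)
toy_err <- function(msg) list(ok = FALSE, value = msg)

# An Err passes through untouched; an Ok is updated or converted to an Err
result_bind <- function(m, f) {
  if (!m$ok) return(m)
  tryCatch(
    toy_ok(f(m$value)),
    error = function(e) {
      toy_err(paste0(conditionMessage(e), "; previously: ", toString(m$value)))
    }
  )
}

fail <- function(v) stop("this threw an error")

y <- toy_ok(9) |> result_bind(sqrt) |> result_bind(fail)
y$ok
#> [1] FALSE
y$value
#> [1] "this threw an error; previously: 3"
```

As with the Maybe sketch, the caller has to test the flag before using the value, which is the point of the pattern.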

Extensions

The flatMap/“bind” operator defined here as %>>=% is applicable to any monad which has a bind() method defined. The monads defined in this package are all R6Class objects exposing such a method of the form m$bind(.call, .quo) which expects a function and a quosure. You can add your own extensions to these by defining such a class (and probably a constructor helper and a print() method)

# a Reporter monad which reports unpiped function calls
Reporter <- R6::R6Class(
  c("ReporterMonad"),
  public = list(
    value = NULL,
    initialize = function(value) {
      if (rlang::is_quosure(value)) {
        self$value <- rlang::eval_tidy(value)
      } else {
        self$value <- value
      }
    },
    bind = function(f, expr) {
      ## 'undo' the pipe and inject the lhs as an argument
      result <- unlist(lapply(unlist(self$value), f))
      args <- as.list(c(self$value, rlang::call_args(expr)))
      fnew <- rlang::call2(rlang::call_name(expr), !!!args)
      cat(" ** Calculating:", rlang::quo_text(fnew), "=", result, "\n")
      Reporter$new(result)
    }
  )
)

reporterM <- function(value) {
  v <- rlang::enquo(value)
  Reporter$new(v)
}

# S3 dispatch uses the R6 classname given above, "ReporterMonad"
print.ReporterMonad <- function(x, ...) {
  print(value(x))
}

x <- reporterM(17) %>>=%
  timestwo() %>>=%
  square() %>>=% 
  add_n(2) %>>=%
  `/`(8)
#>  ** Calculating: timestwo(17) = 34 
#>  ** Calculating: square(34) = 1156 
#>  ** Calculating: add_n(1156, 2) = 1158 
#>  ** Calculating: 1158/8 = 144.75

value(x)
#> [1] 144.75

This is just a toy example; attempting to cat() a data.frame result would not go well.