‘Monad’ is a term often best avoided in conversation, and it is often described in overly mathematical terms; the “meme” definition is the category theory version, which states that
“a monad is just a monoid in the category of endofunctors”
which is mostly true, but also unnecessary.
This blog post does a great job of walking through the more practical definition, and it has “translations” into several programming languages including JavaScript and Python.
Basically, map applies some function to some values. flatMap does the same, but first “reaches inside” a context to extract some inner values and, after applying the function, re-wraps the result in the original context.
The enlightening example for me is a list: if we have some values and want to apply some function to them, we can do that with, e.g.
f <- function(x) x^2
Map(f, c(2, 4, 6))
#> [[1]]
#> [1] 4
#>
#> [[2]]
#> [1] 16
#>
#> [[3]]
#> [1] 36
If we have a ‘flat’ list instead, this still works. But what if we have an ‘outer context’ list? In this case, because f is vectorised, Map sends each vector to f and gets a result for each element of the list.
What if we have a list in the inner context? This fails, because f(list(2, 3)) fails (f doesn’t know how to deal with an argument which is a list).
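The three cases described above can be reproduced in base R. The specific inputs below are illustrative choices, and the inner-list error is caught with tryCatch() only so that the sketch runs to completion:

```r
f <- function(x) x^2

# a 'flat' list: Map() applies f to each element in turn
flat_res <- Map(f, list(2, 4, 6))

# an 'outer context' list of vectors: f is vectorised, so each vector is squared as a whole
outer_res <- Map(f, list(c(2, 3), c(4, 5, 6)))

# lists in the inner context: f(list(2, 3)) fails because ^ is not defined for lists,
# so the error is caught here and its message returned instead
inner_res <- tryCatch(
  Map(f, list(list(2, 3), list(4, 5, 6))),
  error = function(e) conditionMessage(e)
)

str(flat_res)
str(outer_res)
inner_res
```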
Instead, we can use a version of ‘map’ that first reaches inside the outer list context, concatenates what’s inside, applies the function, then re-wraps the result in a new, flat list:
fmap <- function(x, f) {
list(f(unlist(x)))
}
fmap(list(list(2, 3), list(4, 5, 6)), f)
#> [[1]]
#> [1] 4 9 16 25 36
This is the essence of a monad: something that supports such an fmap operation, which performs the mapping inside the context. Various patterns benefit from such a context, and this vignette describes an implementation of several of them via the {monads} package.
The fmap operation is so common that it’s typically presented as an infix function, similar to how pipes work in R, and we can go one step further by defining a new pipe which is just a different syntax for it:
x |> fmap(f)
x %>>=% f
This infix function borrows from Haskell’s >>= (pronounced “bind”), which is so fundamental that it forms part of the language’s logo.
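In x %>>=% f the right-hand side can be written as a call such as timestwo(), just as with %>%. As a rough sketch of how an infix operator can support that syntax, the hypothetical %then% below (not part of {monads}, and doing no monadic wrapping at all) captures the unevaluated call with substitute() and splices the left-hand side in as its first argument:

```r
# Hypothetical pipe-like operator: `lhs %then% f(a, b)` evaluates f(lhs, a, b)
`%then%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)  # the unevaluated call, e.g. rep(times = 2)
  # rebuild the call with `lhs` spliced in as the first argument
  new_call <- as.call(c(list(rhs_call[[1]], quote(lhs)), as.list(rhs_call)[-1]))
  # look up `lhs` locally and everything else in the caller's environment
  eval(new_call, envir = list(lhs = lhs), enclos = parent.frame())
}

res <- 16 %then% sqrt() %then% rep(times = 2)
res
```

The package's own bind() instead receives a function and a quosure, as described under Extensions, so this only illustrates the calling convention.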
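Additionally, this package defines some toy helper functions for demonstrating the application of functions, such as timestwo(), square(), add_n(), sleep_for(), ret_null() and ret_err(), which are used in the examples below.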
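The package's definitions of these helpers are not reproduced in this vignette. A plausible sketch, assuming each takes the piped value as its first argument (the bodies are guesses chosen to be consistent with the outputs shown below, including the "n should be numeric" message), is:

```r
# Hypothetical stand-ins for the package's toy helpers; the real definitions may differ.
timestwo <- function(x) x * 2
square <- function(x) x^2

add_n <- function(x, n) {
  # reject non-numeric increments with the message seen in the Result examples
  if (!is.numeric(n)) stop("n should be numeric")
  x + n
}

sleep_for <- function(x, secs) {
  Sys.sleep(secs)  # pause, then pass the value through unchanged
  x
}

ret_null <- function(x) NULL           # discard the value
ret_err <- function(x, msg) stop(msg)  # always fail with the given message
```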
List
As per the example above, the List monad wraps values (which may be additional lists) and, when flatMapped, the results are ‘flattened’ into a single List.
# identical to a regular Map
x <- listM(1, 2, 3) %>>=%
timestwo()
x
#> [[1]]
#> [1] 2 4 6
# only possible with the flatMap approach
y <- listM(list(1, 2), list(3, 4, 5)) %>>=%
timestwo()
y
#> [[1]]
#> [1] 2 4 6 8 10
Note that while x and y print as regular lists, they remain List monads; a print method is defined which essentially extracts value(x).
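{monads} implements List as an R6 class. As a lighter-weight illustration of the same idea, here is a minimal S3 sketch; list_m(), bind_list() and the ListSketch class are hypothetical names rather than the package API, and bind_list() takes a function rather than a call to keep things short:

```r
# Hypothetical constructor: wrap the arguments in a 'ListSketch' context
list_m <- function(...) structure(list(value = list(...)), class = "ListSketch")

# bind: reach inside the context, flatten it, apply f, then re-wrap the result
bind_list <- function(m, f) {
  inner <- unlist(m$value)
  structure(list(value = list(f(inner))), class = "ListSketch")
}

# print only the wrapped value, so the object looks like a regular list
print.ListSketch <- function(x, ...) {
  print(x$value)
  invisible(x)
}

double_it <- function(v) v * 2
lx <- bind_list(list_m(1, 2, 3), double_it)
ly <- bind_list(list_m(list(1, 2), list(3, 4, 5)), double_it)
ly
```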
Logger
A context could include a stored ‘log’ of the expressions used at each application. One example would be performing data transformation in a {dplyr} pipeline (which would usually use %>% or, more recently, |>). All that is required is to wrap the value at the start of the pipeline in a Logger context, which is achieved by calling the new() method of Logger, for which there is a constructor helper, loggerM():
library(dplyr, warn.conflicts = FALSE)
result <- loggerM(mtcars) %>>=%
filter(mpg > 10) %>>=%
select(mpg, cyl, disp) %>>=%
arrange(desc(mpg)) %>>=%
head()
This result is still a Logger instance, not a value. To extract the value from it we can use value(). To extract the log of each step, use logger_log() (named to avoid a conflict with base::log):
value(result)
#> mpg cyl disp
#> Toyota Corolla 33.9 4 71.1
#> Fiat 128 32.4 4 78.7
#> Honda Civic 30.4 4 75.7
#> Lotus Europa 30.4 4 95.1
#> Fiat X1-9 27.3 4 79.0
#> Porsche 914-2 26.0 4 120.3
logger_log(result)
#> ✔ Log of 4 operations:
#>
#> mtcars %>%
#> filter(mpg > 10) %>%
#> select(mpg, cyl, disp) %>%
#> arrange(desc(mpg)) %>%
#> head()
This works with any data value, so we could just as easily use an in-memory (or external) SQLite database:
mem <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(mem, mtcars)
res <- loggerM(mem) %>>=%
tbl("mtcars") %>>=%
filter(mpg > 10) %>>=%
select(mpg, cyl, disp) %>>=%
arrange(desc(mpg)) %>>=%
head()
Again, we can extract the components from this result:
value(res)
#> # Source: SQL [6 x 3]
#> # Database: sqlite 3.46.0 [:memory:]
#> # Ordered by: desc(mpg)
#> mpg cyl disp
#> <dbl> <dbl> <dbl>
#> 1 33.9 4 71.1
#> 2 32.4 4 78.7
#> 3 30.4 4 75.7
#> 4 30.4 4 95.1
#> 5 27.3 4 79
#> 6 26 4 120.
logger_log(res)
#> ✔ Log of 5 operations:
#>
#> mem %>%
#> tbl("mtcars") %>%
#> filter(mpg > 10) %>%
#> select(mpg, cyl, disp) %>%
#> arrange(desc(mpg)) %>%
#> head()
Since the log captures which operations were performed, we can re-run this expression; the rerun() helper does exactly that:
rerun(res)
#> # Source: SQL [6 x 3]
#> # Database: sqlite 3.46.0 [:memory:]
#> # Ordered by: desc(mpg)
#> mpg cyl disp
#> <dbl> <dbl> <dbl>
#> 1 33.9 4 71.1
#> 2 32.4 4 78.7
#> 3 30.4 4 75.7
#> 4 30.4 4 95.1
#> 5 27.3 4 79
#> 6 26 4 120.
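How rerun() is implemented is not shown here. One way a text log like the ones above could be turned back into an evaluable expression is to parse each step and nest the previous expression inside it as the first argument. A base R sketch, using a hypothetical log_lines vector in place of the package's log object:

```r
# Hypothetical log: the starting value followed by one call per step
log_lines <- c("x", "sqrt()", "rev()")
x <- c(1, 4, 9)

# Nest each step around the previous expression: x -> sqrt(x) -> rev(sqrt(x))
rebuilt <- Reduce(
  function(acc, step) {
    step_call <- str2lang(step)
    as.call(c(list(step_call[[1]], acc), as.list(step_call)[-1]))
  },
  log_lines[-1],
  str2lang(log_lines[1])
)

rebuilt        # rev(sqrt(x))
eval(rebuilt)  # [1] 3 2 1
```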
Some similar functionality is present in the {magrittr} package, which provides the ‘classic’ R pipe %>%; a ‘functional sequence’ starts with a . and similarly tracks which functions are to be applied to an arbitrary input once evaluated. In this way, it is similar to defining a new function.
library(magrittr)
# define a functional sequence
fs <- . %>%
tbl("mtcars") %>%
select(cyl, mpg)
# evaluate the functional sequence with some input data
fs(mem)
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.46.0 [:memory:]
#> cyl mpg
#> <dbl> <dbl>
#> 1 6 21
#> 2 6 21
#> 3 4 22.8
#> 4 6 21.4
#> 5 8 18.7
#> 6 6 18.1
#> 7 8 14.3
#> 8 4 24.4
#> 9 4 22.8
#> 10 6 19.2
#> # ℹ more rows
# identify the function calls at each step of the pipeline
magrittr::functions(fs)
#> [[1]]
#> function (.)
#> tbl(., "mtcars")
#>
#> [[2]]
#> function (.)
#> select(., cyl, mpg)
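Stripped of the pipe syntax, a functional sequence is essentially a stored list of one-argument functions, applied left to right only when the result is called. A base R sketch of that idea, where compose_seq() is a hypothetical helper rather than part of {magrittr}:

```r
# Store the functions now; apply them in order only when the result is called
compose_seq <- function(...) {
  fns <- list(...)
  function(x) Reduce(function(acc, fn) fn(acc), fns, x)
}

seq_fn <- compose_seq(sqrt, function(v) v + 1)
seq_fn(c(1, 4, 9))  # [1] 2 3 4

# defining a sequence containing an error does not trigger it
bad_fn <- compose_seq(sqrt, function(v) stop("oops"))
```

Note that bad_fn is created without complaint; the error only surfaces when it is called.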
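Since the functional sequence is unevaluated, it can contain errors that are not triggered until it is called with some input (here stop() also pastes the piped-in values into its message):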
errfs <- . %>%
sqrt() %>%
stop("oops") %>%
add_n(3)
x <- 1:10
errfs(x)
#> Error in function_list[[i]](value): 11.41421356237311.7320508075688822.236067977499792.449489742783182.645751311064592.8284271247461933.16227766016838oops
magrittr::functions(errfs)
#> [[1]]
#> function (.)
#> sqrt(.)
#>
#> [[2]]
#> function (.)
#> stop(., "oops")
#>
#> [[3]]
#> function (.)
#> add_n(., 3)
In the monad context, by contrast, a step which raises an error nullifies the value, and a marker ([E]) is added to the log for that step and every step after it, to prevent re-running the error:
resx <- loggerM(x) %>>=%
sqrt() %>>=%
add_n(4)
value(resx)
#> [1] 5.000000 5.414214 5.732051 6.000000 6.236068 6.449490 6.645751 6.828427
#> [9] 7.000000 7.162278
logger_log(resx)
#> ✔ Log of 2 operations:
#>
#> x %>%
#> sqrt() %>%
#> add_n(4)
err <- loggerM(x) %>>=%
sqrt() %>>=%
stop("oops") %>>=%
add_n(3)
value(err)
#> NULL
logger_log(err)
#> ✖ Log of 3 operations: [ERROR]
#>
#> x %>%
#> sqrt() %>%
#> [E] stop("oops") %>%
#> [E] add_n(3)
Aside from an error destroying the value, returning a NULL result will also produce this effect:
nullify <- loggerM(x) %>>=%
sqrt() %>>=%
ret_null() %>>=%
add_n(7)
value(nullify)
#> NULL
logger_log(nullify)
#> ✖ Log of 3 operations: [ERROR]
#>
#> x %>%
#> sqrt() %>%
#> [E] ret_null() %>%
#> [E] add_n(7)
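The behaviour described above can be approximated in base R. In this sketch logger_sketch() and log_step() are hypothetical names, and each step's label is passed explicitly rather than captured from the expression as {monads} does; an error or a NULL result nullifies the value, and that step and every later one are marked with [E]:

```r
# Hypothetical logger: a value, a character log of step labels, and a failure flag
logger_sketch <- function(value, label) {
  structure(list(value = value, log = label, failed = FALSE), class = "LoggerSketch")
}

log_step <- function(m, f, label) {
  if (m$failed) {
    # an earlier step failed: skip f, but still record this step as errored
    m$log <- c(m$log, paste("[E]", label))
    return(m)
  }
  result <- tryCatch(f(m$value), error = function(e) NULL)
  if (is.null(result)) {
    # an error or a NULL return nullifies the value and marks the log
    m["value"] <- list(NULL)  # keep the element, but set it to NULL
    m$failed <- TRUE
    m$log <- c(m$log, paste("[E]", label))
  } else {
    m$value <- result
    m$log <- c(m$log, label)
  }
  m
}

ok <- log_step(logger_sketch(1:4, "x"), sqrt, "sqrt()")
bad <- log_step(ok, function(v) stop("oops"), 'stop("oops")')
bad <- log_step(bad, function(v) v + 3, "add_n(3)")
bad$log
```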
One downside of the functional sequence approach is chaining sequences together: since the first term must be ., that is always the first entry, so chaining multiple sequences is not clean.
a <- . %>% sqrt()
a
#> Functional sequence with the following components:
#>
#> 1. sqrt(.)
#>
#> Use 'functions' to extract the individual functions.
b <- . %>% a %>% add_n(1)
b
#> Functional sequence with the following components:
#>
#> 1. a(.)
#> 2. add_n(., 1)
#>
#> Use 'functions' to extract the individual functions.
b(x)
#> [1] 2.000000 2.414214 2.732051 3.000000 3.236068 3.449490 3.645751 3.828427
#> [9] 4.000000 4.162278
Because the monad context is recreated at every step, chaining these is not a problem
a <- loggerM(x) %>>=%
sqrt()
value(a)
#> [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
#> [9] 3.000000 3.162278
logger_log(a)
#> ✔ Log of 1 operations:
#>
#> x %>%
#> sqrt()
b <- a %>>=%
add_n(1)
value(b)
#> [1] 2.000000 2.414214 2.732051 3.000000 3.236068 3.449490 3.645751 3.828427
#> [9] 4.000000 4.162278
logger_log(b)
#> ✔ Log of 2 operations:
#>
#> x %>%
#> sqrt() %>%
#> add_n(1)
Timer
In addition to capturing the expressions in a log, the Timer monad also captures the evaluation time of each step, storing these alongside the expressions themselves in a data.frame:
x <- timerM(5) %>>=%
sleep_for(3) %>>=%
timestwo() %>>=%
sleep_for(1.3)
value(x)
#> [1] 10
times(x)
#> expr time
#> 1 5 0.000
#> 2 sleep_for(3) 3.003
#> 3 timestwo() 0.000
#> 4 sleep_for(1.3) 1.301
y <- timerM(5) %>>=%
sleep_for(2) %>>=%
ret_null() %>>=%
sleep_for(0.3)
value(y)
#> NULL
times(y)
#> expr time
#> 1 5 0.000
#> 2 sleep_for(2) 2.003
#> 3 ret_null() 0.000
#> 4 sleep_for(0.3) 0.300
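The per-step timing can be captured with base R's system.time(). A minimal sketch, where time_steps() is a hypothetical helper taking a named list of one-argument functions (the names stand in for the recorded expressions); unlike the output above, it does not record a row for the initial value:

```r
# Hypothetical timer: apply each named step in turn, recording elapsed seconds
time_steps <- function(value, steps) {
  timings <- data.frame(expr = character(0), time = numeric(0))
  for (label in names(steps)) {
    # system.time() evaluates the assignment in this function's frame
    elapsed <- system.time(value <- steps[[label]](value))[["elapsed"]]
    timings <- rbind(timings, data.frame(expr = label, time = elapsed))
  }
  list(value = value, times = timings)
}

res <- time_steps(5, list(
  "sleep_for(0.2)" = function(v) { Sys.sleep(0.2); v },
  "timestwo()" = function(v) v * 2
))
res$value  # [1] 10
res$times
```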
Maybe
In some languages it is preferable to return something rather than raise an error, particularly if you want to ensure that errors are handled. The Maybe pattern consists of either a Nothing (which is empty) or a Just containing some value; the result of applying any function to a Maybe will be one of these. For testing the result, the helpers is_nothing() and is_just() are defined.
x <- maybeM(9) %>>=%
sqrt() %>>=%
timestwo()
value(x)
#> Just:
#> [1] 6
is_just(x)
#> [1] TRUE
is_nothing(x)
#> [1] FALSE
y <- maybeM(Nothing()) %>>=%
sqrt()
value(y)
#> Nothing
is_just(y)
#> [1] FALSE
is_nothing(y)
#> [1] TRUE
z <- maybeM(10) %>>=%
timestwo() %>>=%
add_n(Nothing())
value(z)
#> Nothing
is_just(z)
#> [1] FALSE
is_nothing(z)
#> [1] TRUE
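A minimal S3 sketch of the same pattern follows, with hypothetical just(), nothing() and bind_maybe() functions (not the package's implementation). As an assumption of this sketch, any error or NULL produced by a step becomes Nothing, and a Nothing input short-circuits every later step:

```r
# Hypothetical Maybe: Nothing is an empty marker, Just wraps a value
nothing <- function() structure(list(), class = c("Nothing", "MaybeSketch"))
just <- function(x) structure(list(value = x), class = c("Just", "MaybeSketch"))

bind_maybe <- function(m, f) {
  if (inherits(m, "Nothing")) return(m)  # Nothing passes straight through
  result <- tryCatch(f(m$value), error = function(e) NULL)
  if (is.null(result)) nothing() else just(result)
}

mx <- bind_maybe(bind_maybe(just(9), sqrt), function(v) v * 2)
my <- bind_maybe(nothing(), sqrt)
mz <- bind_maybe(just(10), function(v) stop("not a number"))
c(inherits(mx, "Just"), inherits(my, "Nothing"), inherits(mz, "Nothing"))
```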
For what is likely a much more robust implementation, see {maybe}.
Result
Similar to a Maybe, a Result contains either a successful Ok-wrapped value or an Err-wrapped message, and it will always be exactly one of these. This pattern resembles (and internally uses) the tryCatch() approach, where the evaluation will not fail but what is produced must be tested to determine success; is_ok() and is_err() are defined for this.
x <- resultM(9) %>>=%
sqrt() %>>=%
timestwo()
value(x)
#> OK:
#> [1] 6
is_err(x)
#> [1] FALSE
is_ok(x)
#> [1] TRUE
When the evaluation fails, the error is reported along with the value prior to the error:
y <- resultM(9) %>>=%
sqrt() %>>=%
ret_err("this threw an error")
value(y)
#> Error:
#> [1] "this threw an error; previously: 3"
is_err(y)
#> [1] TRUE
is_ok(y)
#> [1] FALSE
z <- resultM(10) %>>=%
timestwo() %>>=%
add_n("banana")
value(z)
#> Error:
#> [1] "n should be numeric; previously: 20"
is_err(z)
#> [1] TRUE
is_ok(z)
#> [1] FALSE
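The tryCatch() approach can be sketched directly. Here make_ok(), make_err() and bind_result() are hypothetical names, and the message format (the condition message, then "; previously: " and the prior value) is copied from the output above rather than from the package source:

```r
# Hypothetical Result: exactly one of an Ok value or an Err message
make_ok <- function(x) structure(list(value = x), class = c("Ok", "ResultSketch"))
make_err <- function(msg) structure(list(message = msg), class = c("Err", "ResultSketch"))

bind_result <- function(r, f) {
  if (inherits(r, "Err")) return(r)  # an earlier failure propagates unchanged
  tryCatch(
    make_ok(f(r$value)),
    error = function(e) {
      # report the error alongside the value that was passed into the failing step
      make_err(paste0(conditionMessage(e), "; previously: ", format(r$value)))
    }
  )
}

rx <- bind_result(bind_result(make_ok(9), sqrt), function(v) v * 2)
rz <- bind_result(make_ok(10), function(v) v * 2)
rz <- bind_result(rz, function(v) stop("n should be numeric"))
rz$message
```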
Extensions
The flatMap/“bind” operator, defined here as %>>=%, is applicable to any monad which has a bind() method defined. The monads defined in this package are all R6Class objects exposing such a method of the form m$bind(.call, .quo), which expects a function and a quosure. You can add your own extensions by defining such a class (and probably a constructor helper and a print() method):
# a Reporter monad which reports unpiped function calls
Reporter <- R6::R6Class(
c("ReporterMonad"),
public = list(
value = NULL,
initialize = function(value) {
if (rlang::is_quosure(value)) {
self$value <- rlang::eval_tidy(value)
} else {
self$value <- value
}
},
bind = function(f, expr) {
## 'undo' the pipe and inject the lhs as an argument
result <- unlist(lapply(unlist(self$value), f))
args <- as.list(c(self$value, rlang::call_args(expr)))
fnew <- rlang::call2(rlang::call_name(expr), !!!args)
cat(" ** Calculating:", rlang::quo_text(fnew), "=", result, "\n")
Reporter$new(result)
}
)
)
reporterM <- function(value) {
v <- rlang::enquo(value)
Reporter$new(v)
}
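# R6 sets the S3 class from the classname ("ReporterMonad"), so the print method must match it
print.ReporterMonad <- function(x, ...) {
print(value(x))
invisible(x)
}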
x <- reporterM(17) %>>=%
timestwo() %>>=%
square() %>>=%
add_n(2) %>>=%
`/`(8)
#> ** Calculating: timestwo(17) = 34
#> ** Calculating: square(34) = 1156
#> ** Calculating: add_n(1156, 2) = 1158
#> ** Calculating: 1158/8 = 144.75
value(x)
#> [1] 144.75
This is just a toy example; attempting to cat() a data.frame result would not go well.