
With strings encoded as a vector of characters, we can perform vector operations over the individual characters. All {charcuterie} functions aim to return a new object of class “chars”, so that the result still prints as a string and can be passed on to other vector-handling functions.

library(charcuterie)
#> 
#> Attaching package: 'charcuterie'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union

To convert a regular string into a chars object, use chars(). The result prints as a string, but it is actually a vector of single characters:

chars("string")
#> [1] "string"

# but it's a vector
unclass(chars("string"))
#> [1] "s" "t" "r" "i" "n" "g"
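For comparison, base R can build a similar vector by splitting on the empty string. This is only a sketch of the idea. It returns a plain character vector without the “chars” class, and it is not necessarily how chars() is implemented.

```r
# Split a single string into individual characters with base R.
# strsplit() returns a list (one element per input string), so take [[1]].
split_chars <- strsplit("string", "")[[1]]
split_chars
#> [1] "s" "t" "r" "i" "n" "g"
class(split_chars)
#> [1] "character"
```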

Only a single string can be converted this way, so to produce several chars objects, I suggest using lapply():

many_chars <- lapply(c("foo", "bar", "baz"), chars)
many_chars
#> [[1]]
#> [1] "foo"
#> 
#> [[2]]
#> [1] "bar"
#> 
#> [[3]]
#> [1] "baz"
unclass(many_chars[[2]])
#> [1] "b" "a" "r"
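Base R's strsplit() is itself vectorised over its input. It produces a list of plain character vectors (without the chars class) in a single call:

```r
# strsplit() accepts a character vector and returns one list element per string
many_plain <- strsplit(c("foo", "bar", "baz"), "")
length(many_plain)
#> [1] 3
many_plain[[2]]
#> [1] "b" "a" "r"
```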

A regular string can be recovered using string(), which pastes the characters back together:

string(chars("string"))
#> [1] "string"

and it can optionally take a separator via the collapse argument:

string(chars("string"), collapse = "|")
#> [1] "s|t|r|i|n|g"
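The base R counterpart is paste() with its collapse argument. The output above is consistent with string() behaving this way, but this sketch does not claim to be the package's implementation:

```r
split_chars <- strsplit("string", "")[[1]]
# collapse = "" joins the pieces with nothing in between
paste(split_chars, collapse = "")
#> [1] "string"
# any other separator is inserted between each pair of characters
paste(split_chars, collapse = "|")
#> [1] "s|t|r|i|n|g"
```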

Because a chars object is a vector, we can do vector things with it, such as indexing:

"string"[3] # doesn't work
#> [1] NA

chars("string")[3]
#> [1] "r"
chars("banana")[seq(2, 6, 2)]
#> [1] "aaa"
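Without chars, positional access to a plain string goes through substr() or substring(). These functions take start and stop positions rather than an index vector:

```r
# substr() extracts a single contiguous run of characters
substr("string", 3, 3)
#> [1] "r"
# substring() is vectorised over start/stop, so it can pick scattered positions
picked <- substring("banana", seq(2, 6, 2), seq(2, 6, 2))
picked
#> [1] "a" "a" "a"
paste(picked, collapse = "")
#> [1] "aaa"
```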

subsetting

head("string", 3) # doesn't work
#> [1] "string"

head(chars("string"), 3)
#> [1] "str"
tail(chars("string"), 3)
#> [1] "ing"
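In base R, the same first and last few characters can be taken with substr(). The trailing slice requires computing the string's length first:

```r
plain <- "string"
n <- nchar(plain)
# first three characters
substr(plain, 1, 3)
#> [1] "str"
# last three characters
substr(plain, n - 2, n)
#> [1] "ing"
```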

substituting

word <- chars("string")
word[3] <- "R"
word
#> [1] "stRing"
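Base R can also replace characters in a plain string using the `substr<-` replacement function. However, it only handles a contiguous run, and it never changes the length of the string:

```r
plain <- "string"
# overwrite the character at position 3
substr(plain, 3, 3) <- "R"
plain
#> [1] "stRing"
```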

tabulating

table("mississippi") # doesn't work
#> 
#> mississippi 
#>           1

table(chars("mississippi"))
#> 
#> i m p s 
#> 4 1 2 4
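The equivalent base R approach is to split the string first and then tabulate the resulting plain character vector:

```r
# table() counts each distinct element of the split vector
counts <- table(strsplit("mississippi", "")[[1]])
counts
#> 
#> i m p s 
#> 4 1 2 4
```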

sorting

sort("string") # doesn't work
#> [1] "string"

sort(chars("string"))
#> [1] "ginrst"
sort(chars("string"), decreasing = TRUE)
#> [1] "tsrnig"
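In base R, sorting the characters takes three steps: split, sort, and paste back together.

```r
split_chars <- strsplit("string", "")[[1]]
paste(sort(split_chars), collapse = "")
#> [1] "ginrst"
paste(sort(split_chars, decreasing = TRUE), collapse = "")
#> [1] "tsrnig"
```

sort() on character vectors uses the current locale's collation. Mixed-case or accented input can therefore order differently across systems.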

reversing

rev("string") # doesn't work
#> [1] "string"

rev(chars("string"))
#> [1] "gnirts"
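A common base R idiom for reversing a plain string converts it to Unicode code points and back:

```r
# utf8ToInt() gives one code point per character; reverse them and convert back
intToUtf8(rev(utf8ToInt("string")))
#> [1] "gnirts"
```

This idiom reverses code points rather than user-perceived characters, so strings containing combining marks may not reverse as expected.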

Since these are vectors, we no longer need nchar() to determine the length:

length("string") # just the one 'string'
#> [1] 1

length(chars("string")) == nchar("string")
#> [1] TRUE

Membership tests can now determine whether a given character is in the ‘string’:

"i" %in% "rhythm" # doesn't work
#> [1] FALSE
"y" %in% "rhythm" # doesn't work
#> [1] FALSE

"i" %in% chars("rhythm")
#> [1] FALSE
"y" %in% chars("rhythm")
#> [1] TRUE

is.element("y", "rhythm") # doesn't work
#> [1] FALSE

is.element("y", chars("rhythm"))
#> [1] TRUE
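On a plain string, base R's grepl() with fixed = TRUE gives a similar check without splitting:

```r
# fixed = TRUE treats the pattern as a literal string, not a regular expression
grepl("y", "rhythm", fixed = TRUE)
#> [1] TRUE
grepl("i", "rhythm", fixed = TRUE)
#> [1] FALSE
```

This is a substring test rather than an element test. For example, grepl("hy", "rhythm", fixed = TRUE) is also TRUE, whereas `%in%` on a vector of single characters compares only single characters.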

chars objects can be concatenated; combining two of them produces a single, longer string:

c("butter", "fly") # doesn't work in the character sense
#> [1] "butter" "fly"

c(chars("butter"), chars("fly"))
#> [1] "butterfly"
c(chars("butter"), chars("fly"))[c(1, 9)]
#> [1] "by"

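Set operations can be useful (these use the {charcuterie} versions of intersect, setdiff, and union, which mask the base functions):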

setdiff(chars("javascript"), chars("script"))
#> [1] "jav"
union(chars("bunny"), chars("rabbit"))
#> [1] "bunyrait"
intersect(chars("bob"), chars("rob"))
#> [1] "bo"
setequal(chars("stop"), chars("post"))
#> [1] TRUE
setequal(chars("stop"), chars("posit"))
#> [1] FALSE
unique(chars("mississippi"))
#> [1] "misp"
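The same results can be reproduced with base R's set functions on plain split vectors. These return ordinary character vectors rather than a single printed string. The `base::` prefix avoids the masked versions, and split1 is a hypothetical helper defined only for this sketch:

```r
# hypothetical helper for this sketch: split one string into characters
split1 <- function(s) strsplit(s, "")[[1]]

base::setdiff(split1("javascript"), split1("script"))
#> [1] "j" "a" "v"
base::union(split1("bunny"), split1("rabbit"))
#> [1] "b" "u" "n" "y" "r" "a" "i" "t"
base::intersect(split1("bob"), split1("rob"))
#> [1] "b" "o"
```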

Since chars objects are regular vectors, they continue to work with other vectorised operations:

rev(toupper(chars("string")))
#> [1] "GNIRTS"
toString(chars("abc"))
#> [1] "a, b, c"

Filter(\(x) x != "a", "banana")
#> [1] "banana"
Filter(\(x) x != "a", chars("banana"))
#> [1] "bnn"

This last example motivates a way to exclude characters that is not set-wise, keeping the order and any duplicates of the remaining characters. For this, the package introduces a new function, except():

except(chars("javascript"), chars("script"))
#> [1] "java"
except(chars("carpet"), chars("car"))
#> [1] "pet"
except(chars("banana"), "a")
#> [1] "bnn"
except(chars("banana"), chars("a"))
#> [1] "bnn"
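The outputs above are consistent with except() dropping every occurrence of any character found in the second argument. A minimal base R sketch of that behaviour follows. The names except_sketch and split1 are hypothetical, and this is not the package's code:

```r
# hypothetical helper: keep the elements of x that do not appear anywhere in y
except_sketch <- function(x, y) x[!x %in% y]
# hypothetical helper: split one string into characters
split1 <- function(s) strsplit(s, "")[[1]]

paste(except_sketch(split1("javascript"), split1("script")), collapse = "")
#> [1] "java"
paste(except_sketch(split1("banana"), "a"), collapse = "")
#> [1] "bnn"
# compare with base::setdiff(), which also removes duplicates from x
paste(base::setdiff(split1("banana"), "a"), collapse = "")
#> [1] "bn"
```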

Anywhere a vector of individual characters works, a chars object should also work:

data.frame(number = 1:3, letter = chars("abc"))
#>   number letter
#> 1      1      a
#> 2      2      b
#> 3      3      c