Random sampling: replicate v. rep


A short post to describe repeatedly randomly generating objects.

The problem: you want to generate a random collection of letters.

library(magrittr)  # provides the %>% pipe used throughout
letters %>% sample(size = 4) %>% paste0(collapse = "")
## [1] "dlcz"

Great. Now do it 8 times.

letters %>% sample(size = 4) %>% paste0(collapse = "") %>% rep(8)
## [1] "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj"

😖. Why am I like this. Ok, rep copies the object 8 times. It doesn’t draw 8 samples. So just write a function:

sample_letters <- function(size = 4) {
  letters %>% sample(size = size) %>% paste0(collapse = "")
}
rep_ <- purrr::safely(rep)
rep_(sample_letters, 8) %>% .$error %>% .$message
## [1] "attempt to replicate an object of type 'closure'"

This gd thing (although I learned a bit about safely here, 👍). It wants to repeat the function definition, not repeatedly run the function. Ok.

rep(sample_letters(), 8)
## [1] "vftc" "vftc" "vftc" "vftc" "vftc" "vftc" "vftc" "vftc"

Gd-it. Same problem. It runs the function once, then repeats the result 8 times. Ok, surely if I use lapply (or sapply = simplify apply?), which is designed to run functions a million times, this will work.

sapply(1:8, function(x) { sample_letters() }) 
## [1] "eyuh" "jqei" "qnpw" "owqn" "kymu" "cvmx" "vmfy" "krio"

Ugh, finally. But a Pyrrhic victory. The code is ugly and horrible, and it requires a function argument that we throw away like plastic bottles in the ocean, and those arguments will come back and kill us all.

Ok, double-checking the documentation for lapply, there’s finally a function that does what I want. But of course (to Hadley’s dismay, I’m sure), the syntax is halfway between lapply and rep, which makes it impossible to remember.

replicate(8, sample_letters())
## [1] "wivo" "ctrj" "itfz" "iyac" "sbhu" "ofju" "bdpn" "idmb"
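A quick way to convince yourself of the difference is to count how often the function body actually runs. This is only a sketch: `counted_sample()` and the `calls` counter are made-up names for illustration, and the sampling is written in base R instead of with the pipe so the block runs on its own.

```r
# Hypothetical helper: same idea as sample_letters(), plus a call counter.
calls <- 0
counted_sample <- function(size = 4) {
  calls <<- calls + 1  # record every time the function body runs
  paste0(sample(letters, size = size), collapse = "")
}

# rep() receives the value of a single call and copies it.
calls <- 0
from_rep <- rep(counted_sample(), 3)
calls_rep <- calls

# replicate() re-evaluates the expression on every iteration.
calls <- 0
from_replicate <- replicate(3, counted_sample())
calls_replicate <- calls

c(rep = calls_rep, replicate = calls_replicate)
```

The counter should read 1 after `rep` and 3 after `replicate`, which is the whole story: one draw copied versus three separate draws.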

The n argument comes first, in the spot where the apply functions take their data, while in rep the count comes second, but it’s called replicate and not rapply. On top of that, you can hack the regular sapply call to draw fresh samples each time (as long as size is the first argument of sample_letters), but now we’ve dropped the parentheses!

sapply(rep(4, 8), sample_letters)
## [1] "pwcl" "aqrg" "lmqo" "vrnh" "jfpy" "mzes" "omhe" "sfjw"

If we drop the parentheses with replicate, it replicates the function definition. 😡. These are the mental steps I go through every time I want to randomly sample something (usually just for fake data and testing).
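The reason the `sapply(rep(4, 8), sample_letters)` trick works is that sapply hands each element of its first argument to the function as that function's first argument, which here is `size`. A small sketch, restating `sample_letters()` in base R so it runs on its own; the sizes are arbitrary values picked for illustration.

```r
# Base-R restatement of sample_letters() from the post (no pipe needed).
sample_letters <- function(size = 4) {
  paste0(sample(letters, size = size), collapse = "")
}

# Each element of the vector becomes `size` for one call.
sizes <- c(2, 4, 6)
draws <- sapply(sizes, sample_letters)
nchar(draws)  # string lengths follow `sizes`
```

So the 4s in `rep(4, 8)` are not a throwaway index: they are the sample size, passed in 8 separate times.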

Answer, for now

replicate(8, sample_letters())
## [1] "micu" "avot" "tobx" "myfv" "oley" "nlbz" "zsyb" "iyht"

To do: understand why it’s ok to repeat a function definition sometimes (like in replicate without parentheses) but not other times (like in rep without parentheses).
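One possible piece of the answer, offered as a sketch and not the full story: `rep` only works on vectors (atomic vectors and lists), and a function is not a vector, whereas `replicate` is built on `sapply`, which can collect arbitrary objects into a list. Wrapping the function in a list seems to keep `rep` happy too. `sample_letters()` is restated in base R so the block runs on its own, and `fns`, `fns_rep` and `rep_error` are illustrative names.

```r
# Base-R restatement of sample_letters() from the post (no pipe needed).
sample_letters <- function(size = 4) {
  paste0(sample(letters, size = size), collapse = "")
}

# replicate() without parentheses: the "expression" is just the function
# object, so the result is a list holding 8 copies of the function.
fns <- replicate(8, sample_letters)
is.list(fns)
is.function(fns[[1]])

# rep() refuses a bare function; capture the error message with tryCatch().
rep_error <- tryCatch(
  rep(sample_letters, 8),
  error = function(e) conditionMessage(e)
)

# rep() does accept a list that contains the function.
fns_rep <- rep(list(sample_letters), 8)
length(fns_rep)

# Either list can still be called element by element to get fresh draws.
draws <- sapply(fns_rep, function(f) f())
```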