Friends Title Generator, Part 1


This post includes an R script that generates Friends episode titles.

To fulfill my lifelong desire to write Friends scripts, I’ll start by writing a Friends episode title. By that I mean a script that writes a Friends episode title. If you’re like me, you want to be creative, but you’d rather study artists and writers, distill their creativity into a recipe, and then just repeat it over and over. Isn’t that how artistic success works?

Start with the episode data from the first Friends post, which we scraped from the Friends IMDb pages with rvest.

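library(dplyr)     # %>%, as_tibble(), select(), count(), mutate(), ...
library(tidytext)  # unnest_tokens(), stop_words
titles <- werfriends::friends_episodes %>% as_tibble() %>%
  select(-director, -writers)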
titles
## # A tibble: 236 x 5
##    season episode title                                   rating n_ratings
##     <dbl>   <dbl> <chr>                                    <dbl>     <dbl>
##  1     1.      1. The One Where Monica Gets a Roommate      8.50     4317.
##  2     1.      2. The One with the Sonogram at the End      8.20     3107.
##  3     1.      3. The One with the Thumb                    8.30     2900.
##  4     1.      4. The One with George Stephanopoulos        8.30     2810.
##  5     1.      5. The One with the East German Laundry D…   8.60     2768.
##  6     1.      6. The One with the Butt                     8.30     2695.
##  7     1.      7. The One with the Blackout                 9.00     3516.
##  8     1.      8. The One Where Nana Dies Twice             8.20     2594.
##  9     1.      9. The One Where Underdog Gets Away          8.30     2516.
## 10     1.     10. The One with the Monkey                   8.20     2544.
## # ... with 226 more rows

Check out the words

Apply the easy tidytext function unnest_tokens to split the titles into one word per row, then remove the filler words by applying anti_join with the tidytext dataset stop_words.

title_words <- titles %>% 
  unnest_tokens(word, title) %>% 
  anti_join(stop_words)
## Joining, by = "word"

# replace 'rachel's' with 'rachel'
title_words %>% 
  mutate(word = gsub(pattern = "'s", replacement = "", x = word)) %>% 
  count(word, sort = TRUE)
## # A tibble: 266 x 2
##    word         n
##    <chr>    <int>
##  1 rachel      28
##  2 ross        24
##  3 joey        16
##  4 chandler    11
##  5 phoebe      10
##  6 1            9
##  7 2            9
##  8 monica       9
##  9 wedding      9
## 10 dies         6
## # ... with 256 more rows

Now we know what matters most. Although character names don’t seem to have an effect on episode success, they’re obviously the most popular title words. Beyond the names, I suppose people get married and people die pretty often in a sitcom.

# ok, now we know what's important
# the characters, weddings, people and things dying.
titles %>% filter(grepl("dies", title, ignore.case = TRUE))
## # A tibble: 6 x 5
##   season episode title                           rating n_ratings
##    <dbl>   <dbl> <chr>                            <dbl>     <dbl>
## 1     1.      8. The One Where Nana Dies Twice     8.20     2594.
## 2     2.      3. The One Where Heckles Dies        8.40     2316.
## 3     2.     18. The One Where Dr. Ramoray Dies    8.50     2165.
## 4     2.     20. The One Where Old Yeller Dies     8.30     2084.
## 5     7.     13. The One Where Rosita Dies         8.50     1842.
## 6    10.     15. The One Where Estelle Dies        8.60     1783.

Dang, ok. Three old people, Old Yeller, Joey’s Days of Our Lives character, and JOEY’S CHAIR ROSITA.

A sentence

A sentence is a sequence of words. Given a word, e.g., “One”, there is a probability that the next word in the sequence is “Where” (pretty high actually), and different probabilities for all the other words (“With” is also very high). Thinking this way, we can treat the sentence generator as a network. Each vertex is a word in the set of titles. A directed edge exists if the word pair (word, next) appears in the set of titles. The edge weight is the relative frequency with which next follows word across all the titles, so the weights on the edges leaving any one word sum to 1.

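word_transitions <- titles %>% 
  unnest_tokens(word, title) %>%                    # one row per word, lowercased
  group_by(season, episode) %>%                     # work within each title
  filter(row_number() > 1) %>%                      # drop the leading "the"
  mutate(nxt = lead(word),                          # the word that follows
         nxt = ifelse(is.na(nxt), "EOL", nxt)) %>%  # mark the end of the title
  group_by(word, nxt) %>% 
  count() %>%                                       # n = times nxt follows word
  group_by(word) %>% 
  mutate(weight = n / sum(n))                       # share of this word's transitions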

I find dplyr makes it super easy to get a solution right away; here, you can almost code as fast as you think (take the titles, convert them to words, drop the first word, make a new variable representing the ‘next word’, calculate how often each next word follows each current word). On the other hand, it’s a bit difficult to follow after the fact, maybe because you write it so quickly that you forget to document it. Also, there’s the constant group_by and ungrouping. 😑
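
To make the transition table concrete, here is a toy version built from three of the real titles, using base R only so it runs without dplyr or tidytext. The names toy_titles, pairs, and toy_transitions are made up for this sketch. It mirrors the steps above (drop the leading “the”, pair each word with the next one, mark the end with “EOL”), but it is not the code used for the real table.

```r
# Three real titles from the table above, lowercased by hand
toy_titles <- c("the one with the thumb",
                "the one with the butt",
                "the one where nana dies twice")

# For each title: split into words, drop the leading "the",
# and pair every word with its successor ("EOL" after the last word)
pairs <- do.call(rbind, lapply(strsplit(toy_titles, " "), function(w) {
  w <- w[-1]
  data.frame(word = w, nxt = c(w[-1], "EOL"), stringsAsFactors = FALSE)
}))

# Count each (word, nxt) pair ...
toy_transitions <- aggregate(n ~ word + nxt,
                             data = transform(pairs, n = 1),
                             FUN = sum)

# ... then divide by the total count for that word to get the edge weight
toy_transitions$weight <- toy_transitions$n /
  ave(toy_transitions$n, toy_transitions$word, FUN = sum)

# Look at the edges leaving "one"
toy_transitions[toy_transitions$word == "one", ]
```

In this toy set, “one” is followed by “with” in two titles and by “where” in one, so those two edges get weights of 2/3 and 1/3.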

Now make me a title, R

So, now we have a tidy matrix of word transition probabilities. All we have to do is start with the words “the one”, and calculate the next word in the sequence (by using the great sample_n to pick the next word in the title). Then keep going until you hit the end of the sentence (‘EOL’).

generate_title <- function() {
  # every title starts with "the one"
  new_title <- c("the", "one")
  # keep sampling the next word until the end-of-title marker comes up
  while(new_title[length(new_title)] != "EOL") {
    new_title <- c(new_title, 
                   word_transitions %>% 
                     filter(word == new_title[length(new_title)]) %>% 
                     sample_n(size = 1, weight = weight) %>% 
                     pull(nxt))
  }
  # drop the "EOL" marker, glue the words together, and capitalize
  new_title[-length(new_title)] %>% paste(collapse = " ") %>% stringr::str_to_title()
}
generate_title()
## [1] "The One With The Halloween Party"

Yeaaa! Here are a few more:

replicate(5, generate_title())
## [1] "The One With Monica Sings"        "The One With All Night"          
## [3] "The One Where They're Going Away" "The One With The Tea Leaves"     
## [5] "The One Where Ross Grant"

Problem: since this is based on only 236 titles, it tends to generate titles that already exist. E.g., the word ‘George’ shows up in only one title, so any time the title generator hits ‘The One With George’, the next word will always be ‘Stephanopoulos’, and then the title will always end.

We can try to fix this in the next installment, by using the tidytext::parts_of_speech dataset to generate sentence structures and then populate them with words, e.g., "the one where [noun] [verb]".
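
In the meantime, a cruder stopgap is to throw away any generated title that already exists and try again. Below is a minimal, self-contained sketch of that idea in base R. The names existing, toy_generate, and toy_generate_new are hypothetical, and the four titles stand in for the full set of 236.

```r
# Assumption: four real titles, lowercased by hand, standing in for all 236
existing <- c("the one with the thumb",
              "the one with the butt",
              "the one where nana dies twice",
              "the one where heckles dies")

# Build (word, nxt) pairs, skipping the leading "the" and marking title ends with "EOL"
pairs <- do.call(rbind, lapply(strsplit(existing, " "), function(w) {
  w <- w[-1]
  data.frame(word = w, nxt = c(w[-1], "EOL"), stringsAsFactors = FALSE)
}))

# Random walk over the pairs. Drawing uniformly from the raw pair rows picks
# each successor in proportion to how often it was observed.
toy_generate <- function() {
  out <- c("the", "one")
  while (out[length(out)] != "EOL") {
    candidates <- pairs$nxt[pairs$word == out[length(out)]]
    out <- c(out, candidates[sample.int(length(candidates), 1)])
  }
  paste(out[-length(out)], collapse = " ")
}

# Hypothetical helper: retry until the title is not already in `existing`
toy_generate_new <- function(max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    candidate <- toy_generate()
    if (!candidate %in% existing) return(candidate)
  }
  NA_character_  # nothing new turned up within max_tries
}

set.seed(1)
toy_generate_new()
```

This only filters out exact repeats. It adds no new word pairs, so a word with a single successor, like ‘George’, still leads to the same ending every time.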

Anyway, here’s a title that’s randomly generated every time Netlify rebuilds this post:

The One In A Bath