# Lab 10, November 21

Solutions

This lab describes a few more functions from the `stringr` package and contains additional exercises using Shakespeare’s sonnets. You should complete Lab 9 before attempting the exercises in this lab.

``````library(stringr)
library(tidyverse)
``````

## A few more `stringr` functions

### `str_replace`

`str_replace` replaces the first match of a regular expression in each string with another string. `str_replace_all` replaces every match.

``````x <- "The quick brown fox jumps over the lazy dog."
# replace the first pair of consecutive vowels with ??
str_replace(x, "[aeiou]{2}", "??")
``````
``````## [1] "The q??ck brown fox jumps over the lazy dog."
``````
``````# replace all spaces or periods with two dashes
str_replace_all(x, "[ .]", "--")
``````
``````## [1] "The--quick--brown--fox--jumps--over--the--lazy--dog--"
``````
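To make the difference between the two functions concrete, here is a small sketch that applies the same pattern with each. The variable names `first_only` and `all_matches` are illustrative and not part of the lab.

```r
library(stringr)

x <- "The quick brown fox jumps over the lazy dog."

# str_replace changes only the first "o" it finds
first_only <- str_replace(x, "o", "0")

# str_replace_all changes every "o"
all_matches <- str_replace_all(x, "o", "0")

first_only    # "The quick br0wn fox jumps over the lazy dog."
all_matches   # "The quick br0wn f0x jumps 0ver the lazy d0g."
```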

### `str_c`

`str_c` combines multiple character vectors, element by element, into a single character vector. The result has the same number of elements as the longest input vector.

``````y1 <- c("The", "quick", "fox", "jumps")
y2 <- c("over", "the", "lazy", "dog.")
str_c(y1, y2, sep=" ")
``````
``````## [1] "The over"   "quick the"  "fox lazy"   "jumps dog."
``````
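The "longest input" rule is easiest to see when one argument has length one. In the sketch below, which uses made-up vectors `animals` and `labelled`, the single string is recycled against each element of the longer vector.

```r
library(stringr)

# A length-one input is recycled to match the longer vector
animals <- c("fox", "dog", "cat")
labelled <- str_c("lazy ", animals)

labelled           # "lazy fox" "lazy dog" "lazy cat"
length(labelled)   # 3, the length of the longest input
```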

Specifying the `collapse` argument will reduce the result to a single string (a character vector of length 1).

``````# collapse the elements of y1 into a single string
# each element is separated by a space in the result
str_c(y1, collapse=" ")
``````
``````## [1] "The quick fox jumps"
``````
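`sep` and `collapse` can also be used together. As a rough sketch (the `" | "` separator and the name `paired` are arbitrary choices for illustration), `sep` joins the paired elements first and `collapse` then joins the resulting pairs:

```r
library(stringr)

y1 <- c("The", "quick", "fox", "jumps")
y2 <- c("over", "the", "lazy", "dog.")

# sep is placed between paired elements; collapse is placed between the pairs
paired <- str_c(y1, y2, sep = " ", collapse = " | ")

paired   # "The over | quick the | fox lazy | jumps dog."
```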

### `str_split`

`str_split` splits a vector of strings into pieces delimited by a given regular expression match.

``````x
``````
``````## [1] "The quick brown fox jumps over the lazy dog."
``````
``````str_split(x, "t", simplify=TRUE)
``````
``````##      [,1]                              [,2]
## [1,] "The quick brown fox jumps over " "he lazy dog."
``````
``````str_split(x, "\\s", simplify=TRUE)
``````
``````##      [,1]  [,2]    [,3]    [,4]  [,5]    [,6]   [,7]  [,8]   [,9]
## [1,] "The" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog."
``````
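The `simplify` argument controls the shape of the result. The sketch below uses an illustrative vector `z` to compare the two forms: without `simplify=TRUE` the result is a list with one character vector per input string, and with it the result is a character matrix whose shorter rows are padded with empty strings.

```r
library(stringr)

z <- c("lazy dog", "quick brown fox")

# Default: a list with one character vector per input string
as_list <- str_split(z, "\\s")

# simplify = TRUE: a character matrix with one row per input string
as_matrix <- str_split(z, "\\s", simplify = TRUE)

as_list[[1]]     # "lazy" "dog"
dim(as_matrix)   # 2 3
as_matrix[1, ]   # "lazy" "dog"  ""
```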

## Regular expression backreferences

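Wrapping part of a regular expression in parentheses creates a group that you can refer to later in the pattern with a backreference such as `\1` (written `"\\1"` in an R string).

This expression matches arbitrary-length substrings that begin and end with the same character. `(.)` captures any single character, `.*` matches whatever follows, and `\\1` requires the captured character to appear again: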

``````states <- c("New Hampshire", "Connecticut", "Massachusetts", "Vermont", "Maine", "Rhode Island")
str_extract(states, "(.).*\\1")
``````
``````## [1] "ew Hampshire" "nn"           "assa"         NA
## [5] NA             "de Island"
``````
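A simpler backreference pattern drops the `.*`, so the captured character must be followed immediately by itself. The sketch below is an illustration that reuses `states`, with a made-up name `doubled`, and uses `str_detect` to pick out the states containing a doubled letter.

```r
library(stringr)

states <- c("New Hampshire", "Connecticut", "Massachusetts",
            "Vermont", "Maine", "Rhode Island")

# (.) captures one character; \\1 requires the same character right after it
doubled <- str_detect(states, "(.)\\1")

states[doubled]   # "Connecticut" "Massachusetts"
```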

Backreferences can be used in `str_replace`:

``````# Reverse the order of the first four letters
str_replace(states, "^(.)(.)(.)(.)", "\\4\\3\\2\\1")
``````
``````## [1] " weNHampshire" "nnoCecticut"   "ssaMachusetts" "mreVont"
## [5] "niaMe"         "dohRe Island"
``````
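Groups can capture more than one character. As a further sketch, assuming a small illustrative vector `two_word` of two-word state names, the pattern below captures each word and swaps their order in the replacement.

```r
library(stringr)

two_word <- c("New Hampshire", "Rhode Island")

# \\1 is the first word and \\2 is the second; the replacement swaps them
swapped <- str_replace(two_word, "^(\\w+) (\\w+)$", "\\2, \\1")

swapped   # "Hampshire, New" "Island, Rhode"
```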

## Exercises

We will continue manipulating the text of Shakespeare’s sonnets from Lab 9. To get started, run this code, which incorporates the answers to exercises 1, 2, and 3 from Lab 9. It assumes the character vector `sk` from Lab 9 is already in your workspace.

``````# Required setup from Lab 9
sk <- sk[str_length(sk)>0]      # Remove zero-length lines
sk <- str_trim(sk, side='left') # Remove leading whitespace
sk <-                           # Remove lines containing only a Roman numeral
  sk[!str_detect(sk, "^[ICDMLVX]+$")]
``````

1. Find lines containing a three-letter string that is immediately repeated. For example, “contented” contains the three-letter string `nte` twice in a row.

2. Using `str_replace_all`, remove all `!`, `,`, `'`, `;`, `:`, `?` and `.` characters from `sk`. Store the result in an object called `sk2`.

3. Replace all `--` (two hyphens) in `sk2` with a single space.

4. Now combine the elements of `sk2` into a single string, called `sk2_combined`, using `str_c`. Include the argument `collapse=" "` so that the result is a single string containing all of the words in Shakespeare’s sonnets, separated by spaces.

``````str_sub(sk2_combined, 1, 300)
``````
``````## [1] "From fairest creatures we desire increase That thereby beautys rose might never die But as the riper should by time decease His tender heir might bear his memory But thou contracted to thine own bright eyes Feedst thy lights flame with self-substantial fuel Making a famine where abundance lies Thy s"
``````

5. Use `str_split` to split `sk2_combined` into individual words. The splitting pattern should match one or more whitespace characters (`\s`). Include the argument `simplify=TRUE`.

6. Count the number of letters in each word using `str_length`. Create the following chart, which displays the frequency of word lengths in Shakespeare’s sonnets.
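
The last two steps can be tried out on a single line before running them on the full text. The sketch below is one possible approach, not necessarily the intended solution: it uses a toy string `toy` taken from the sonnet excerpt above and tabulates word lengths with base R's `table`. The names `words`, `word_lengths`, and `length_counts` are illustrative.

```r
library(stringr)

toy <- "But as the riper should by time decease"

# Split on runs of whitespace; simplify = TRUE gives a 1-row matrix of words
words <- str_split(toy, "\\s+", simplify = TRUE)

# Flatten to a plain vector, then count the letters in each word
word_lengths <- str_length(as.vector(words))

# Tabulate how often each word length occurs
length_counts <- table(word_lengths)

word_lengths    # 3 2 3 5 6 2 4 7
length_counts   # lengths 2 and 3 occur twice; 4, 5, 6, and 7 occur once
```

A frequency table like `length_counts` is the kind of summary a bar chart of word lengths would display.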