Lab 11, November 28, Solutions

The car-cyclist crash data set contains the approximate time when each crash occurred.

``````library(tidyverse)  # provides read_csv, mutate, str_replace_all, and %>%
cr <- read_csv('cyclist_crashes.txt')
``````
``````## [1] "7:00 PM - 8:00 PM"   "10:00 AM - 11:00 AM" "3:00 PM - 4:00 PM"
## [4] "8:00 PM - 9:00 PM"   "4:00 PM - 5:00 PM"   "4:00 AM - 5:00 AM"
``````

In Lab 4 we wrote a lengthy command to convert the variable `Time.of.Day` to a numeric hour in 24-hour format.

In this exercise we will use regular expressions to extract the numeric hour from `Time.of.Day` and convert it to 24-hour time.

1. Write a function called `time24` that takes two arguments: `h12`, a vector of integers, and `pm`, a logical vector. The function should convert the integers in `h12` from 12-hour time to 24-hour time. The vector `pm` indicates whether the corresponding element of `h12` is an hour that occurs at or after 12 noon.

Before defining a function, enter these commands into the R console:

``````1:12 %% 12
``````
``````##  [1]  1  2  3  4  5  6  7  8  9 10 11  0
``````
``````1:12 %% 12 + 12
``````
``````##  [1] 13 14 15 16 17 18 19 20 21 22 23 12
``````

This shows that `h12 %% 12` is the desired result for all AM elements of `h12` and `h12 %% 12 + 12` is the desired result for the PM elements of `h12`.
This logic can be implemented using `ifelse`:

``````times <- c(1:12, 1:12)
pm <- c(rep(FALSE,12), rep(TRUE,12))
times %% 12 + ifelse(pm, 12, 0)
``````
``````##  [1]  1  2  3  4  5  6  7  8  9 10 11  0 13 14 15 16 17 18 19 20 21 22 23
## [24] 12
``````

Now define a function:

``````time24 <- function(h12, pm){
  # AM hours: h12 %% 12 (so 12 AM becomes 0); PM hours: add 12
  return(h12 %% 12 + ifelse(pm, 12, 0))
}
``````
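As a quick check of `time24`, the sketch below repeats the definition so that it runs on its own and applies it to the boundary hours. The vectors `h` and `is_pm` are made up for illustration and are not part of the crash data. It also shows that a missing hour stays missing.

```r
# Same definition as above, repeated so this block runs on its own.
time24 <- function(h12, pm) {
  h12 %% 12 + ifelse(pm, 12, 0)
}

# Illustrative inputs: 12 AM, 1 AM, 11 AM, 12 PM, 1 PM, 11 PM
h     <- c(12, 1, 11, 12, 1, 11)
is_pm <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

time24(h, is_pm)
# returns 0 1 11 12 13 23

# A missing hour propagates as NA.
time24(NA, NA)
# returns NA
```

The two edge cases are 12 AM, which maps to 0, and 12 PM, which stays at 12.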
2. What are the unique values of `cr$Time.of.Day`? Replace occurrences of `midnight` with `AM` and replace `noon` with `PM`.

``````unique(cr$Time.of.Day)
``````
``````##  [1] "7:00 PM - 8:00 PM"         "10:00 AM - 11:00 AM"
##  [3] "3:00 PM - 4:00 PM"         "8:00 PM - 9:00 PM"
##  [5] "4:00 PM - 5:00 PM"         "4:00 AM - 5:00 AM"
##  [7] "11:00 AM - 12:00 noon"     "6:00 PM - 7:00 PM"
##  [9] "9:00 PM - 10:00 PM"        "2:00 PM - 3:00 PM"
## [11] "12:00 noon - 1:00 PM"      "5:00 PM - 6:00 PM"
## [13] "Unknown"                   "1:00 PM - 2:00 PM"
## [15] "3:00 AM - 4:00 AM"         "9:00 AM - 10:00 AM"
## [17] "10:00 PM - 11:00 PM"       "8:00 AM - 9:00 AM"
## [19] "6:00 AM - 7:00 AM"         "7:00 AM - 8:00 AM"
## [21] "11:00 PM - 12:00 midnight" "12:00 midnight - 1:00 AM"
## [23] "5:00 AM - 6:00 AM"         "2:00 AM - 3:00 AM"
## [25] "1:00 AM - 2:00 AM"
``````
``````cr <- cr %>%
  mutate(Time.of.Day = str_replace_all(Time.of.Day, "midnight", "AM"),
         Time.of.Day = str_replace_all(Time.of.Day, "noon", "PM"))
# Check the results:
unique(cr$Time.of.Day)
``````
``````##  [1] "7:00 PM - 8:00 PM"   "10:00 AM - 11:00 AM" "3:00 PM - 4:00 PM"
##  [4] "8:00 PM - 9:00 PM"   "4:00 PM - 5:00 PM"   "4:00 AM - 5:00 AM"
##  [7] "11:00 AM - 12:00 PM" "6:00 PM - 7:00 PM"   "9:00 PM - 10:00 PM"
## [10] "2:00 PM - 3:00 PM"   "12:00 PM - 1:00 PM"  "5:00 PM - 6:00 PM"
## [13] "Unknown"             "1:00 PM - 2:00 PM"   "3:00 AM - 4:00 AM"
## [16] "9:00 AM - 10:00 AM"  "10:00 PM - 11:00 PM" "8:00 AM - 9:00 AM"
## [19] "6:00 AM - 7:00 AM"   "7:00 AM - 8:00 AM"   "11:00 PM - 12:00 AM"
## [22] "12:00 AM - 1:00 AM"  "5:00 AM - 6:00 AM"   "2:00 AM - 3:00 AM"
## [25] "1:00 AM - 2:00 AM"
``````
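Both substitutions can also be written as a single call, since `str_replace_all` accepts a named character vector that maps each pattern to its replacement. The sketch below runs on a small made-up vector `x` rather than on `cr`; the names `x` and `x_clean` are only for illustration.

```r
library(stringr)

# Illustrative values copied from the unique values listed above
x <- c("11:00 AM - 12:00 noon", "12:00 midnight - 1:00 AM", "Unknown")

# Names are the patterns, values are the replacements
x_clean <- str_replace_all(x, c("midnight" = "AM", "noon" = "PM"))
x_clean
# returns "11:00 AM - 12:00 PM" "12:00 AM - 1:00 AM" "Unknown"
```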
3. Use `tidyr::extract` to create two new columns: `tstart` and `tend`, containing the beginning and ending hour for the one-hour time window represented by `Time.of.Day`. Your regular expression should match the hour and the PM/AM indicator (e.g. `8:00 AM`). Include the argument `remove=FALSE`.

Note: if you receive an error like this:
“Data source must be a dictionary”
your data frame has two columns with the same name. You may have repeatedly applied `extract` and created duplicate columns.
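One way to check for this situation is to look for repeated column names before calling `extract` again. The sketch below uses a made-up data frame `df` rather than `cr`, and the names `dup_names` and `df_unique` are only for illustration.

```r
# Illustrative data frame with a deliberately duplicated column name
df <- data.frame(a = 1:2, b = 3:4, a = 5:6, check.names = FALSE)

# Names that occur more than once
dup_names <- names(df)[duplicated(names(df))]
dup_names
# returns "a"

# Keep only the first column for each name
df_unique <- df[, !duplicated(names(df)), drop = FALSE]
names(df_unique)
# returns "a" "b"
```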

``````tmatch <- "^(\\d{1,2}:00 [A-Za-z]+) - (\\d{1,2}:00 [A-Za-z]+)$"
cr <-
  cr %>%
  tidyr::extract(Time.of.Day,
                 c('tstart','tend'),
                 tmatch, remove=FALSE)
``````

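The first group of the regular expression matches everything before the space-hyphen-space separating the starting and ending times; the second group matches everything after it. Values that do not match the pattern, such as `Unknown`, receive `NA` in both new columns.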
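To see what each group captures, the pattern can be tried on a few values with `stringr::str_match`, which returns the full match followed by one column per group. This is only a sketch on a made-up vector `x`, not part of the lab's pipeline.

```r
library(stringr)

tmatch <- "^(\\d{1,2}:00 [A-Za-z]+) - (\\d{1,2}:00 [A-Za-z]+)$"

# Illustrative values, including one that does not match
x <- c("7:00 PM - 8:00 PM", "11:00 PM - 12:00 AM", "Unknown")

# Column 1 is the full match; columns 2 and 3 are the two capture groups
m <- str_match(x, tmatch)
m[, 2]  # start times: "7:00 PM" "11:00 PM" NA
m[, 3]  # end times:   "8:00 PM" "12:00 AM" NA
```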

Check the results:

``````cr %>% select(Time.of.Day, tstart, tend) %>%
  count(tstart, tend)
``````
``````## # A tibble: 25 x 3
##      tstart     tend     n
##       <chr>    <chr> <int>
##  1  1:00 AM  2:00 AM   106
##  2  1:00 PM  2:00 PM  1473
##  3 10:00 AM 11:00 AM   821
##  4 10:00 PM 11:00 PM   600
##  5 11:00 AM 12:00 PM  1140
##  6 11:00 PM 12:00 AM   323
##  7 12:00 AM  1:00 AM   203
##  8 12:00 PM  1:00 PM  1451
##  9  2:00 AM  3:00 AM    91
## 10  2:00 PM  3:00 PM  1784
## # ... with 15 more rows
``````
4. Create another two columns, `hnum` and `ampm`, containing, respectively, the numeric hour corresponding to `tstart` and the string `AM` or `PM` as appropriate. Provide the arguments `remove=FALSE` and `convert=TRUE`.
``````cr <-
  cr %>%
  tidyr::extract(tstart, c("hnum","ampm"),
                 "(\\d{1,2}).{4}([AP]M)",
                 remove=FALSE,
                 convert=TRUE)

cr %>% select(tstart, tend, hnum, ampm)
``````
``````## # A tibble: 23,809 x 4
##      tstart     tend  hnum  ampm
##  *    <chr>    <chr> <int> <chr>
##  1  7:00 PM  8:00 PM     7    PM
##  2 10:00 AM 11:00 AM    10    AM
##  3  3:00 PM  4:00 PM     3    PM
##  4  8:00 PM  9:00 PM     8    PM
##  5  4:00 PM  5:00 PM     4    PM
##  6  4:00 AM  5:00 AM     4    AM
##  7 11:00 AM 12:00 PM    11    AM
##  8 10:00 AM 11:00 AM    10    AM
##  9  8:00 PM  9:00 PM     8    PM
## 10  3:00 PM  4:00 PM     3    PM
## # ... with 23,799 more rows
``````
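In the pattern above, `.{4}` skips the four characters `:00 ` between the hour and the AM/PM indicator. A slightly stricter variant spells those characters out, so that only on-the-hour times match. The sketch below applies that variant to a made-up data frame `toy`; the names `toy` and `out` are only for illustration.

```r
library(tidyr)

# Illustrative start times, including a missing value
toy <- data.frame(tstart = c("7:00 PM", "12:00 AM", NA),
                  stringsAsFactors = FALSE)

# Variant of the lab's pattern with ":00 " written out instead of ".{4}"
out <- tidyr::extract(toy, tstart, c("hnum", "ampm"),
                      "(\\d{1,2}):00 ([AP]M)",
                      remove = FALSE, convert = TRUE)

out$hnum  # integer, because convert = TRUE: 7 12 NA
out$ampm  # character: "PM" "AM" NA
```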
5. Use `mutate` to add a column called `hnum24` which contains the 24-hour time corresponding to `tstart`. (Apply your function `time24`.)

``````cr <- cr %>%
  mutate(hnum24 = time24(hnum, ampm=="PM"))
cr %>% select(tstart, hnum, ampm, hnum24)
``````
``````## # A tibble: 23,809 x 4
##      tstart  hnum  ampm hnum24
##       <chr> <int> <chr>  <dbl>
##  1  7:00 PM     7    PM     19
##  2 10:00 AM    10    AM     10
##  3  3:00 PM     3    PM     15
##  4  8:00 PM     8    PM     20
##  5  4:00 PM     4    PM     16
##  6  4:00 AM     4    AM      4
##  7 11:00 AM    11    AM     11
##  8 10:00 AM    10    AM     10
##  9  8:00 PM     8    PM     20
## 10  3:00 PM     3    PM     15
## # ... with 23,799 more rows
``````
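The whole exercise can be checked end to end on a few boundary cases. The sketch below uses a made-up tibble `toy` and, as a simplification that differs from the step-by-step approach above, pulls the starting hour and the AM/PM indicator out of `Time.of.Day` with a single anchored pattern instead of going through `tstart` first.

```r
library(dplyr)
library(tidyr)
library(stringr)

time24 <- function(h12, pm) h12 %% 12 + ifelse(pm, 12, 0)

# Illustrative rows covering midnight, noon, and an unparseable value
toy <- tibble(Time.of.Day = c("12:00 midnight - 1:00 AM",
                              "11:00 AM - 12:00 noon",
                              "12:00 noon - 1:00 PM",
                              "11:00 PM - 12:00 midnight",
                              "Unknown"))

result <- toy %>%
  mutate(Time.of.Day = str_replace_all(Time.of.Day,
                                       c("midnight" = "AM", "noon" = "PM"))) %>%
  # Simplification: capture the starting hour and AM/PM in one step
  tidyr::extract(Time.of.Day, c("hnum", "ampm"),
                 "^(\\d{1,2}):00 ([AP]M)",
                 remove = FALSE, convert = TRUE) %>%
  mutate(hnum24 = time24(hnum, ampm == "PM"))

result$hnum24
# returns 0 11 12 23 NA
```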