In this computing club mini session, we cover the lubridate package and learn how to work with dates and times in R more effectively. Lubridate was developed by Garrett Grolemund and Hadley Wickham, and is maintained by Vitalie Spinu. Investigators often provide date/time data in raw form, which makes these variables difficult to work with, and converting them to a usable form is tricky and time-consuming. The functions in the lubridate package streamline this process. At the time of this session, lubridate is not part of the core tidyverse, so it must be loaded separately; you only need it when working with dates/times.
Three possible date/time formats
Date
tibbles print this as <date>
Time
tibbles print this as <time>
Date-time
instant in time, tibbles print this as <dttm>
(also called POSIXct in R)
## [1] "2019-01-16"
## [1] "2019-01-16"
## [1] "2019-07-01"
## [1] "2019-07-01 17:27:30 EDT"
## [1] 1
## [1] Wed
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
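The code chunk that produced the output above is not shown in this section. Calls like the following are consistent with it; treat this as a minimal sketch. The date assigned to `exdate` is an assumption inferred from the printed output.

```r
library(lubridate)

# Assumed example date (inferred from the printed output, not shown in the source)
exdate <- ymd("2019-01-16")

today()   # current date, class Date
now()     # current date-time, class POSIXct

# Accessor functions pull out individual components
year(exdate)                  # 2019
month(exdate)                 # 1 (January)
mday(exdate)                  # 16 (day of the month)
wday(exdate)                  # 4 (weeks start on Sunday by default)
wday(exdate, label = TRUE)    # Wed in an English locale
```

The same accessor functions can also be used on the left-hand side of an assignment to change a component.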
# Change the year of the example date, and get the new day of the week
year(exdate) <- 2016
wday(exdate, label = TRUE)
## [1] Sat
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
Let’s use the nycflights13 data
Three scenarios of creating a date/time variable:
When parsing from a string, you must choose the function that matches the order of the components in the input; parsing then converts it to the standard date format
## [1] "2018-04-08"
## [1] "2018-04-08"
## [1] "2018-04-08"
## [1] "2018-04-08"
## [1] "2017-01-31 20:11:59 UTC"
## [1] "2017-01-31 08:01:00 UTC"
## [1] "2017-01-31 UTC"
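The parsing calls behind this output are not shown here. As a hedged sketch, lubridate's parsing functions are named after the order of the components in the input (`ymd()`, `mdy()`, `dmy()`, plus `_h`, `_hm`, `_hms` variants for date-times). The input strings below are assumptions chosen to match the printed dates.

```r
library(lubridate)

# The function name gives the order of year, month, and day in the input
d1 <- ymd("2018-04-08")
d2 <- mdy("April 8th, 2018")
d3 <- dmy("08-Apr-2018")
d4 <- ymd(20180408)          # unquoted numbers also work

# Add _hms or _hm to parse a date-time (UTC unless tz is given)
dt1 <- ymd_hms("2017-01-31 20:11:59")
dt2 <- mdy_hm("01/31/2017 08:01")

# Supplying a time zone to a date parser returns a date-time
dt3 <- ymd(20170131, tz = "UTC")
```

All four date calls return the same `Date`, regardless of how the input was written.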
# Create a new variable that combines the five columns using make_datetime
flights %>%
  select(year, month, day, hour, minute) %>%
  mutate(departure = make_datetime(year, month, day, hour, minute))
# Combine and create variables for departure, arrival, scheduled departure, scheduled arrival
# Times are stored as HHMM integers, so split them into hours and minutes
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}
depart <- flights %>%
  filter(!is.na(dep_time), !is.na(arr_time)) %>%
  mutate(
    dep_time = make_datetime_100(year, month, day, dep_time),
    arr_time = make_datetime_100(year, month, day, arr_time),
    sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
    sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)
  ) %>%
  select(origin, dest, ends_with("delay"), ends_with("time"))
depart
Switch between a date-time and a date --> as_datetime() and as_date()
## [1] "2019-07-01 UTC"
## [1] "2019-07-01"
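A minimal sketch of the two conversions, using fixed values instead of `today()` and `now()` so the result is reproducible. The specific date and time are assumptions taken from the output printed earlier in this session.

```r
library(lubridate)

d  <- ymd("2019-07-01")                 # a Date
dt <- ymd_hms("2019-07-01 17:27:30")    # a date-time (POSIXct, UTC)

as_datetime(d)   # Date -> date-time at midnight UTC
as_date(dt)      # date-time -> Date, dropping the time of day
```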
Dates/times can also be given as numeric offsets from the Unix epoch, 1970-01-01: seconds for as_datetime(), days for as_date()
## [1] "1970-01-01 10:00:00 UTC"
## [1] "1980-01-01"
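The following sketch shows numeric offsets that reproduce the two values above. The exact expressions are assumptions, since the original calls are not shown.

```r
library(lubridate)

# 10 hours, expressed in seconds since 1970-01-01 00:00:00 UTC
as_datetime(60 * 60 * 10)   # "1970-01-01 10:00:00 UTC"

# 10 years of 365 days plus 2 leap days (1972, 1976), in days since 1970-01-01
as_date(365 * 10 + 2)       # "1980-01-01"
```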
Let's revisit the wday() function and the nycflights13 dataset
And the minute() function
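The original plots are not reproduced here. As a rough sketch of the kind of summary these two accessors allow, the code below counts departures by day of the week and averages departure delay by minute within the hour. The helper names `dep`, `dep_dt`, `by_wday`, and `by_minute` are hypothetical and introduced only for illustration.

```r
library(nycflights13)
library(dplyr)
library(lubridate)

# Build a departure date-time from the HHMM-style dep_time column
dep <- flights %>%
  filter(!is.na(dep_time)) %>%
  mutate(dep_dt = make_datetime(year, month, day, dep_time %/% 100, dep_time %% 100))

# Number of departures on each day of the week
by_wday <- dep %>%
  mutate(wday = wday(dep_dt, label = TRUE)) %>%
  count(wday)

# Average departure delay for each minute within the hour
by_minute <- dep %>%
  mutate(minute = minute(dep_dt)) %>%
  group_by(minute) %>%
  summarise(avg_delay = mean(dep_delay, na.rm = TRUE), n = n())
```

Either summary table can then be passed to a plotting function.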
floor_date(): takes a date-time object and rounds it down to the nearest boundary of the specified time unit
round_date(): takes a date-time object and rounds it to the nearest value of the specified time unit. Exactly halfway --> round up
ceiling_date(): takes a date-time object and rounds it up to the nearest boundary of the specified time unit
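In lubridate, these three rounding behaviours correspond to `floor_date()`, `round_date()`, and `ceiling_date()`. A short sketch follows; the example date-time and the units chosen are assumptions for illustration.

```r
library(lubridate)

x <- ymd_hms("2019-06-20 11:44:39")

floor_date(x, "hour")     # 2019-06-20 11:00:00 UTC
round_date(x, "hour")     # 2019-06-20 12:00:00 UTC (11:44 is past the half hour)
ceiling_date(x, "hour")   # 2019-06-20 12:00:00 UTC
floor_date(x, "month")    # 2019-06-01 UTC
```

Rounding to a common unit is a convenient way to group observations, for example by week or month, before counting them.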
Set multiple components at once with update()
datetime <- ymd_hms("2019-06-20 11:44:39")
update(datetime, year = 2018, month = 8, mday = 10, hour = 5)
## [1] "2018-08-10 05:44:39 UTC"
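One behaviour worth knowing, shown here as a small sketch with assumed example values: when a value passed to `update()` is too large for its component, it rolls over into the next larger unit instead of raising an error.

```r
library(lubridate)

# February 2015 has 28 days, so day 30 rolls over into March
rolled_day <- update(ymd("2015-02-01"), mday = 30)    # "2015-03-02"

# 400 hours is 16 days and 16 hours after midnight on 1 February
rolled_hour <- update(ymd("2015-02-01"), hour = 400)  # "2015-02-17 16:00:00 UTC"
```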
durations:
measure the exact amount of time between two points
periods:
track clock times despite leap years, leap seconds, and daylight saving time
intervals:
protean summary of the time information between two points
# Subtracting two dates gives a difftime object (in seconds, minutes, hours, days, or weeks)
vicky_age <- today() - ymd(19940819)
vicky_age
## Time difference of 9082 days
Let’s convert vicky_age to a duration using the lubridate package
## [1] "784684800s (~24.87 years)"
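A minimal sketch of this conversion with `as.duration()`. A fixed end date replaces `today()` so the result is reproducible; that date is an assumption inferred from the 9082-day difference printed above, and `age` is a hypothetical name.

```r
library(lubridate)

# difftime of 9082 days
age <- ymd("2019-07-01") - ymd("1994-08-19")

# A duration always stores the span in seconds: 9082 * 86400 = 784684800
age_dur <- as.duration(age)
age_dur
```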
We can use the duration constructor functions to build the time spans that we need:
## [1] "25s"
## [1] "3000s (~50 minutes)"
## [1] "82800s (~23 hours)"
## [1] "864000s (~1.43 weeks)"
## [1] "20563200s (~34 weeks)"
## [1] "378432000s (~11.99 years)"
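The constructor calls are not shown in this section; the sketch below uses arguments inferred from the printed output. One caveat: the `dyears()` value printed above corresponds to a 365-day year, while more recent lubridate releases define a year as 365.25 days, so that line may print a different number of seconds on a current installation.

```r
library(lubridate)

dseconds(25)
dminutes(50)   # 50 * 60 = 3000 seconds
dhours(23)     # 23 * 3600 = 82800 seconds
ddays(10)      # 10 * 86400 = 864000 seconds
dweeks(34)     # 34 * 7 * 86400 = 20563200 seconds
dyears(12)     # number of seconds depends on the lubridate version
```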
Durations always record the time span in seconds
Adding, subtracting, and multiplying
## [1] "1594980780s (~50.54 years)"
## [1] "1576800000s (~49.97 years)"
## [1] "2019-07-02"
## [1] "2018-07-01"
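As a hedged sketch of duration arithmetic, with expressions that are assumptions and not the original calls: durations can be added together, multiplied by a number, and added to or subtracted from dates. `ddays()` is used in place of `dyears()` because the length of a duration year differs between lubridate versions.

```r
library(lubridate)

# Multiply and add durations
one_day <- 2 * dhours(12)
mixed   <- dweeks(30) + dhours(10) + dminutes(13)

# Add a duration to, or subtract it from, a date (fixed date for reproducibility)
start     <- ymd("2019-07-01")
tomorrow  <- start + ddays(1)     # "2019-07-02"
last_year <- start - ddays(365)   # "2018-07-01"
```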
Motivation for periods: adding a duration across a daylight saving time change shifts the clock time
## [1] "2017-02-10 14:00:00 EST"
## [1] "2017-03-22 15:00:00 EDT"
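The two values above are consistent with the sketch below, in which a 40-day duration is added to a date-time in the America/New_York time zone. The exact call is an assumption. Because daylight saving time began on 12 March 2017, adding exactly 40 * 86400 seconds lands one clock hour later than expected.

```r
library(lubridate)

x <- ymd_hms("2017-02-10 14:00:00", tz = "America/New_York")
x                           # 14:00 EST

shifted <- x + ddays(40)    # exactly 3456000 seconds later
shifted                     # 15:00 EDT on 2017-03-22
```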
Periods are time spans like durations, but without a fixed length in seconds; they work in human units such as days and months
## [1] "25S"
## [1] "45M 0S"
## [1] "16H 0M 0S"
## [1] "40d 0H 0M 0S"
## [1] "7m 0d 0H 0M 0S"
## [1] "161d 0H 0M 0S"
## [1] "30y 0m 0d 0H 0M 0S"
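Periods have their own constructor functions, named like the duration constructors but without the leading `d`. The sketch below uses arguments inferred from the printed output; the original calls are not shown.

```r
library(lubridate)

seconds(25)
minutes(45)
hours(16)
days(40)
months(7)
weeks(23)    # stored as 23 * 7 = 161 days
years(30)
```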
Adding, subtracting, and multiplying
## [1] "200m 0d 0H 0M 0S"
## [1] "30d 4H 10M 0S"
## [1] "2016-12-31"
## [1] "2017-01-01"
## [1] "2017-03-22 15:00:00 EDT"
## [1] "2017-03-22 14:00:00 EDT"
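A hedged sketch of period arithmetic and of how periods differ from durations around leap years and daylight saving time. The expressions are assumptions that match the printed output; `ddays(365)` stands in for a one-year duration because the length of `dyears(1)` depends on the lubridate version.

```r
library(lubridate)

# Multiply and add periods
20 * months(10)                      # 200 months
days(30) + hours(4) + minutes(10)

# Leap year: 2016 has 366 days
leap_dur <- ymd("2016-01-01") + ddays(365)   # "2016-12-31"
leap_per <- ymd("2016-01-01") + years(1)     # "2017-01-01"

# Daylight saving time: the period keeps the clock time, the duration does not
x <- ymd_hms("2017-02-10 14:00:00", tz = "America/New_York")
dst_dur <- x + ddays(40)   # 15:00 EDT
dst_per <- x + days(40)    # 14:00 EDT
```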
Application to the nycflights13 dataset
# Extract overnight flights that appear to have arrived before they departed
depart %>%
  filter(arr_time < dep_time)
# Fix them by adding one day to the arrival times of overnight flights
depart %>%
  mutate(
    overnight = arr_time < dep_time,
    arr_time = arr_time + days(overnight * 1),
    sched_arr_time = sched_arr_time + days(overnight * 1)
  )
Basic examples
## [1] 365.25
## [1] 366
## [1] 91
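The calls behind these numbers are not shown. As a sketch under that caveat: dividing a period by a period can only give an estimate, while an interval (created with `%--%`) has a definite start and end and therefore an exact length. The fixed start date below is an assumption, chosen so that the one-year span includes 29 February 2020.

```r
library(lubridate)

# Period divided by period: an estimate, because years vary in length
years(1) / days(1)

# Interval divided by a duration: exact for this specific span
start <- ymd("2019-07-01")
end   <- start + years(1)
span  <- start %--% end

n_days <- span / ddays(1)   # 366
```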
Simplest = ideal
Duration --> physical time
Period --> human times
Interval --> length of time span in human units
Time zones!