September 13, 2021
Show the flow and change in frequencies as a process moves between states (in our case, variables)
Can visualize the relationship between many categorical variables at once, including the status variable
Investigators love them, but they are often interpreted incorrectly!
We’ll be using the ggalluvial package, a ggplot2 extension for alluvial plots
Axis: variable(s) from your dataset; data are grouped/stacked vertically at fixed horizontal positions along the usual x-axis (period and cancer type in the plot above)
Strata: the groupings of each axis variable (think factor levels of categorical variable; pre-covid and covid for period)
Alluvia: horizontal splines that are distributed across the plot; identified by vertical position on axis and fill color (treatment)
Flows: segments of alluvia between adjacent axes
Lodes: where the alluvia intersect the strata; they are not drawn in the plot above, but can be imagined as rectangular boxes that continue the flows through the strata
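The terms above can be tied to layers in ggalluvial. The following is a minimal sketch, assuming the built-in Titanic table as stand-in data (it is not the dataset behind the plot above). It adds geom_lode() so that the lodes, usually left implicit, are drawn as filled boxes inside the strata.

```r
library(ggplot2)
library(ggalluvial)

# Stand-in data: the built-in Titanic contingency table as a data frame
titanic <- as.data.frame(Titanic)

p_lodes <- ggplot(titanic,
                  aes(y = Freq, axis1 = Class, axis2 = Survived)) +
  # Axes: Class and Survived, placed at two positions on the x-axis
  scale_x_discrete(limits = c("Class", "Survived"), expand = c(.2, .05)) +
  # Alluvia: one spline per row of the data, colored by Sex
  geom_alluvium(aes(fill = Sex)) +
  # Lodes: the boxes where each alluvium passes through a stratum
  geom_lode(aes(fill = Sex)) +
  # Strata: outlined only (alpha = 0) so the lodes stay visible
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))
```

Printing p_lodes renders the plot. Dropping the geom_lode() layer gives the usual picture, in which the lodes have to be inferred.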
Three major types (wide, long, and tabular/array)
We will focus on wide and long (like the arrangement for repeated measures data)
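To make the wide/long distinction concrete, here is a small sketch using a made-up data frame (the values and the name toy_wide are hypothetical, chosen only for illustration). In wide format each row is one alluvium and each axis is a column; in long format each row is one lode.

```r
library(ggalluvial)

# Hypothetical wide-format data: one row per alluvium, one column per axis
toy_wide <- data.frame(
  period    = c("pre-covid", "pre-covid", "covid", "covid"),
  treatment = c("surgery", "chemo", "surgery", "chemo"),
  n         = c(10, 5, 6, 9)
)

# Check that the two axis columns are usable in wide (alluvia) form
is_alluvia_form(toy_wide, axes = 1:2, silent = TRUE)

# Convert to long (lodes) form: one row per alluvium per axis
toy_long <- to_lodes_form(toy_wide, axes = 1:2,
                          key = "x", value = "stratum", id = "alluvium")

# Check the long form, naming the key, value, and id columns
is_lodes_form(toy_long, key = x, value = stratum, id = alluvium,
              silent = TRUE)
```

The long version has one row for every combination of alluvium and axis, so here it has twice as many rows as the wide version.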
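Arthritis data from the vcd package
library(dplyr)      # group_by(), tally(), %>%
library(ggplot2)
library(ggalluvial) # alluvial geoms and stats for ggplot2
data(Arthritis, package = "vcd")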
Arthritis %>% group_by(Treatment, Sex, Improved) %>% tally()
Arthritis_grp <- Arthritis %>% group_by(Treatment, Sex, Improved) %>% tally()
Arthritis_grp %>%
  ggplot(
    aes(y = n, axis1 = Treatment, axis2 = Sex)) +
  scale_x_discrete(limits = c("Treatment", "Sex"), expand = c(.2, .05)) +
  geom_alluvium(aes(fill = Improved)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), min.y = 5) +
  ggtitle("Improvement pathways",
          "By treatment and gender") +
  ylab("Frequency") +
  scale_fill_brewer(type = "qual", palette = "Set1")
to_lodes_form() function
Arthritis_lodes <- to_lodes_form(as.data.frame(Arthritis_grp),
                                 axes = 1:3,
                                 id = "Cohort")
Arthritis_lodes
is_lodes_form(Arthritis_lodes, key = x, value = stratum, id = Cohort, silent = TRUE)
[1] TRUE
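In the transformed data above we have the following:
n: the frequency carried by each cohort
Cohort: the id column; identifies the alluvium (one per row of the grouped data)
x: the key column; names the axis (Treatment, Sex, or Improved)
stratum: the value column; the level of that axis variable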
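The structure of the lodes data can also be checked directly. The sketch below rebuilds the objects so that it runs on its own, lists the columns, and converts back to wide form with to_alluvia_form(); the name Arthritis_wide is introduced here for illustration.

```r
library(dplyr)
library(ggalluvial)

data(Arthritis, package = "vcd")

# Rebuild the grouped counts and the lodes (long) form used above
Arthritis_grp <- Arthritis %>%
  group_by(Treatment, Sex, Improved) %>%
  tally()
Arthritis_lodes <- to_lodes_form(as.data.frame(Arthritis_grp),
                                 axes = 1:3,
                                 id = "Cohort")

# Column names of the long form
names(Arthritis_lodes)

# Each cohort appears once per axis, so there are three rows per cohort
table(table(Arthritis_lodes$Cohort))

# Round trip back to wide form (one row per cohort)
Arthritis_wide <- to_alluvia_form(Arthritis_lodes,
                                  key = "x", value = "stratum", id = "Cohort")
```

Arthritis_wide should again have one row per cohort, with Treatment, Sex, and Improved as separate columns.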
data(vaccinations)
# Put the response levels in a meaningful order
vaccinations <- vaccinations %>%
  mutate(response = forcats::fct_relevel(response, "Always", "Sometimes",
                                         "Never", "Missing"))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ylab("Frequency") +
  xlab("Survey") +
  ggtitle("Vaccination survey responses (one question per survey)")
Notice in the above plot there is one stratum where the text does not fit well
To fix this, we can pass min.y = 8 to the label layer, geom_text(stat = "stratum", aes(label = after_stat(stratum)), min.y = 8), which restricts labeling to strata of at least that height
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), min.y = 8) +
  theme(legend.position = "none") +
  ylab("Frequency") +
  xlab("Survey") +
  ggtitle("Vaccination survey responses (one question per survey)")
Notes
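geom_alluvium() differs from geom_flow(): geom_alluvium() draws each alluvium as one continuous spline across all axes, whereas geom_flow() draws separate flows between adjacent axes. Which to use depends on the type of dataset and the purpose of the plot!
Regarding missing values: removing them from the plot above would leave gaps, whereas in the earlier plots it would not (this depends on the type of data)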
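Both notes can be tried out on the vaccinations data. The sketch below is one possible setup, not the only one: it assumes geom_alluvium() should get a fill that is constant along each alluvium, so it adds a helper column start (a name introduced here) holding each subject's first-survey response. It also builds a version with the "Missing" responses removed.

```r
library(dplyr)
library(ggplot2)
library(ggalluvial)

data(vaccinations, package = "ggalluvial")

# Helper column (hypothetical name): each subject's response at the first survey
vacc <- vaccinations %>%
  arrange(subject, survey) %>%
  group_by(subject) %>%
  mutate(start = first(response)) %>%
  ungroup()

base <- ggplot(vacc,
               aes(x = survey, stratum = response, alluvium = subject,
                   y = freq))

# Flows: drawn between adjacent surveys, colored by the response they leave
p_flow <- base +
  geom_flow(aes(fill = response)) +
  geom_stratum(alpha = .5)

# Alluvia: one spline per subject across all surveys, one color per subject
p_alluvium <- base +
  geom_alluvium(aes(fill = start)) +
  geom_stratum(alpha = .5)

# Missing values: dropping the "Missing" rows removes those lodes entirely
vacc_obs <- filter(vacc, response != "Missing")
p_gaps <- ggplot(vacc_obs,
                 aes(x = survey, stratum = response, alluvium = subject,
                     y = freq)) +
  geom_flow(aes(fill = response)) +
  geom_stratum(alpha = .5)
```

Printing p_flow and p_alluvium side by side shows the difference in how color follows the data; p_gaps shows the effect of dropping the missing responses.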
# Same plot with straight-line flows (curve_type = "linear")
Arthritis_grp %>%
  ggplot(
    aes(y = n, axis1 = Treatment, axis2 = Sex)) +
  scale_x_discrete(limits = c("Treatment", "Sex"), expand = c(.2, .05)) +
  geom_alluvium(aes(fill = Improved), curve_type = "linear") +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), min.y = 5) +
  ggtitle("Improvement pathways",
          "By treatment and gender") +
  ylab("Frequency") +
  scale_fill_brewer(type = "qual", palette = "Set1")
# Same plot with curve_type = "cubic" flows
Arthritis_grp %>%
  ggplot(
    aes(y = n, axis1 = Treatment, axis2 = Sex)) +
  scale_x_discrete(limits = c("Treatment", "Sex"), expand = c(.2, .05)) +
  geom_alluvium(aes(fill = Improved), curve_type = "cubic") +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), min.y = 5) +
  ggtitle("Improvement pathways",
          "By treatment and gender") +
  ylab("Frequency") +
  scale_fill_brewer(type = "qual", palette = "Set1")
# Same plot with the default curve type and the "Dark2" palette
Arthritis_grp %>%
  ggplot(
    aes(y = n, axis1 = Treatment, axis2 = Sex)) +
  scale_x_discrete(limits = c("Treatment", "Sex"), expand = c(.2, .05)) +
  geom_alluvium(aes(fill = Improved)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), min.y = 5) +
  ggtitle("Improvement pathways",
          "By treatment and gender") +
  ylab("Frequency") +
  scale_fill_brewer(type = "qual", palette = "Dark2")