17 Plotting with ggplot2 and purrr
Stat 545 Part VII; blog posts 16-plotsfrompurrr, 38-tableplacementrmarkdown; zzlongplot for longitudinal examples (Chapter 25).
17.1 Prerequisites
Answer the following questions to see if you can bypass this chapter. You can find the answers at the end of the chapter in Section 17.15.
- How do you produce one plot per level of a categorical variable, returning the results as a named list of ggplot objects?
- What does patchwork::wrap_plots() provide that facet_wrap() does not?
- Why might you save a figure as PDF rather than PNG for a journal submission, and when would you choose PNG instead?
17.2 Learning objectives
By the end of this chapter you should be able to:
- Build publication-quality ggplot2 figures for continuous, categorical, and time-varying outcomes.
- Generate many plots programmatically with purrr::map() over a list of groups.
- Compose multi-panel figures with patchwork and arrange shared legends cleanly.
- Produce PDF and PNG outputs at appropriate DPIs for journal and Word submissions.
- Build a house theme via a theme_*() function and reuse it across all figures in a project.
- Recognise when to reach for zzlongplot and zztable1 rather than rolling your own.
17.3 Orientation
A figure that survives peer review is a figure whose data, encoding, and aesthetic choices all serve the single claim the figure exists to make. This chapter covers the mechanical skills; the principles of effective visualisation are covered in the companion textbook (chapters 15–16 of Statistical Computing in the Age of AI).
The combination ggplot2 + purrr + patchwork is the modern stack for going from a single exploratory plot to a polished multi-panel figure: ggplot2 for the plot, purrr for iteration over groups, patchwork for composition.
17.4 The statistician’s contribution
Software handles the mechanics. The judgements:
One plot, one message. A figure trying to convey three relationships succeeds at none. Three figures each conveying one are usually clearer. Resist the ‘pack everything into one figure to save space’ impulse.
Encode the claim, not the data. A scatter plot with 5,000 points obscures any pattern. The same relationship as a hex-binned plot or a 2D density contour shows the structure without the visual noise. Choose encoding for legibility, not faithfulness to the raw data.
Consistency across the project. All figures in a paper should share fonts, colour palette, and theme defaults. A house theme function applied via theme_set() makes consistency the default. Without it, ggplot2 produces nine variants of grey.
Display uncertainty. A point estimate without a CI invites over-trust. A regression line without a confidence band does the same. The tools are simple (geom_errorbar, se = TRUE on geom_smooth); the discipline is the analyst’s.
These judgements are what make figures publishable.
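Two of these judgements can be sketched in a few lines. The example below uses simulated data (the variable names, the 40-bin grid, and the three predictor bands are all assumptions made for illustration): it bins 5,000 points so that density carries the pattern, keeps the confidence band on the fitted line, and draws group means with normal-approximation 95% intervals.

```r
library(ggplot2)

set.seed(1)
# Simulated data (hypothetical): 5,000 noisy points with a linear trend
sim <- data.frame(x = rnorm(5000))
sim$y <- 0.5 * sim$x + rnorm(5000)

# Encode the claim: bin the points so counts, not individual dots, show the
# structure; se = TRUE keeps the confidence band around the fitted line
p_binned <- ggplot(sim, aes(x, y)) +
  geom_bin_2d(bins = 40) +
  geom_smooth(method = "lm", se = TRUE, colour = "white") +
  labs(x = "Predictor", y = "Outcome", fill = "Count")

# Display uncertainty: group means with 95% intervals (mean +/- 1.96 SE)
sim$band <- cut(sim$x, breaks = c(-Inf, -0.5, 0.5, Inf),
                labels = c("low", "mid", "high"))
ci <- do.call(rbind, lapply(split(sim, sim$band), function(d) {
  se <- sd(d$y) / sqrt(nrow(d))
  data.frame(band = d$band[1], mean = mean(d$y),
             lower = mean(d$y) - 1.96 * se,
             upper = mean(d$y) + 1.96 * se)
}))

p_ci <- ggplot(ci, aes(band, mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.15) +
  labs(x = "Predictor band", y = "Mean outcome (95% CI)")
```

Printing p_binned or p_ci renders the figure. The 1.96 multiplier assumes approximately normal sampling distributions, which is reasonable at these group sizes but should be revisited for small groups.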
17.5 One plot per group with purrr
The pattern: split data by a grouping variable, apply a plotting function to each group, collect into a named list.
library(tidyverse)
library(palmerpenguins)
penguins_clean <- na.omit(penguins)
# split by species
plots <- penguins_clean |>
  split(penguins_clean$species) |>
  imap(\(d, name) {
    ggplot(d, aes(flipper_length_mm, body_mass_g)) +
      geom_point() +
      geom_smooth(method = "lm") +
      labs(title = name) +
      theme_minimal()
  })
# inspect one
plots$Adelie
# save all (the output directory must exist before ggsave() writes to it)
dir.create("figures", showWarnings = FALSE)
walk2(plots,
      paste0("figures/penguin-", names(plots), ".pdf"),
      \(p, path) ggsave(path, p, width = 6, height = 4))
Three things to internalise:
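- split() returns a named list. The names propagate through imap() and walk2().
- imap() passes both the value and its name to the function, which is useful for titles and for identifying which plot is which. walk2() iterates over two parallel inputs, here the plots and the file paths built from their names.
- walk2() is for side effects. It returns its first input invisibly; map2() would instead build a list of ggsave()’s return values, which nobody needs.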
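The three points can be checked directly. The sketch below repeats the pattern but writes to a temporary directory (an assumption made so the example leaves the project untouched) and then inspects the names and the return value of walk2().

```r
library(ggplot2)
library(purrr)
library(palmerpenguins)

penguins_clean <- na.omit(penguins)

# split() names each element after a level of the grouping factor
by_species <- split(penguins_clean, penguins_clean$species)

# imap() hands the function each element and its name; names are kept
plots <- imap(by_species, \(d, name) {
  ggplot(d, aes(flipper_length_mm, body_mass_g)) +
    geom_point() +
    labs(title = name)
})

# Temporary output directory for this sketch; use "figures" in a project
out_dir <- file.path(tempdir(), "figures")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
paths <- file.path(out_dir, paste0("penguin-", names(plots), ".pdf"))

# walk2() calls ggsave() once per plot/path pair and returns `plots` invisibly
returned <- walk2(plots, paths, \(p, path) {
  ggsave(path, p, width = 6, height = 4)
})

names(plots)                                # species names from split()
identical(names(returned), names(plots))    # walk2() passed its input through
all(file.exists(paths))                     # one file per species
```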
For a dplyr-native version:
penguins_clean |>
  group_by(species) |>
  group_map(\(d, k) {
    ggplot(d, aes(flipper_length_mm, body_mass_g)) +
      geom_point() +
      labs(title = k$species)
  })
# returns an unnamed list of plots; the group keys arrive in `k`
17.6 Multi-panel figures with patchwork
patchwork composes distinct plots into one figure. Where facet_wrap produces small multiples of the same plot, wrap_plots (or the + and / operators) combines independent plots that may differ in everything.
library(patchwork)
p1 <- ggplot(penguins_clean,
             aes(flipper_length_mm, body_mass_g, colour = species)) +
  geom_point() +
  labs(title = "A. Body mass vs flipper length")
p2 <- ggplot(penguins_clean,
             aes(species, body_mass_g, fill = species)) +
  geom_boxplot() +
  labs(title = "B. Body mass by species") +
  guides(fill = "none") # hide redundant fill legend
p3 <- ggplot(penguins_clean,
             aes(bill_length_mm, bill_depth_mm, colour = species)) +
  geom_point() +
  labs(title = "C. Bill morphology")
# 2x2 grid with shared legend at bottom
(p1 + p2) / (p3 + plot_spacer()) +
  plot_layout(guides = "collect") &
  theme(legend.position = "bottom")
Operators:
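- + combines plots into a grid, filled row by row; two plots end up side by side.
- | always places plots side by side in a single row; with two plots it behaves like +.
- / stacks vertically.
- & applies a theme to every plot in the composition.
- plot_layout(guides = "collect") collects duplicate legends into one.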
For more control, plot_layout takes widths, heights, nrow, ncol, byrow, and other arguments.
facet_wrap for small multiples of the same plot; patchwork for distinct plots in one figure.
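The contrast can be made concrete with a short sketch. The panel choices below are illustrative assumptions; the point is that the faceted plot repeats one mapping and geom, while wrap_plots() takes a list of unrelated plots and accepts layout arguments such as ncol and widths.

```r
library(ggplot2)
library(patchwork)
library(palmerpenguins)

penguins_clean <- na.omit(penguins)

# Small multiples: one mapping and one geom, repeated per species
p_facet <- ggplot(penguins_clean, aes(flipper_length_mm, body_mass_g)) +
  geom_point() +
  facet_wrap(vars(species))

# Distinct plots: different geoms and variables, held in a plain list
panels <- list(
  ggplot(penguins_clean, aes(flipper_length_mm, body_mass_g)) +
    geom_point(),
  ggplot(penguins_clean, aes(species, body_mass_g)) +
    geom_boxplot(),
  ggplot(penguins_clean, aes(bill_length_mm)) +
    geom_histogram(bins = 30)
)

# wrap_plots() accepts the list; widths gives the first column twice the space
p_combined <- wrap_plots(panels, ncol = 3, widths = c(2, 1, 1))
```

Because wrap_plots() takes a list, it also composes the output of the split-and-imap pattern from Section 17.5 without any further reshaping.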
17.7 House themes
A function that returns a theme() object, applied once per session:
theme_practicum <- function(base_size = 11) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      plot.subtitle = element_text(colour = "grey40"),
      axis.title = element_text(face = "bold"),
      axis.text = element_text(colour = "grey20"),
      panel.grid.minor = element_blank(),
      legend.position = "bottom",
      strip.background = element_rect(fill = "grey95", colour = NA),
      strip.text = element_text(face = "bold")
    )
}
# apply globally
theme_set(theme_practicum())
# update geom defaults to match
update_geom_defaults("point", list(size = 1.5, alpha = 0.7))
update_geom_defaults("line", list(linewidth = 0.6))
For project consistency, define theme_practicum() in a setup chunk at the top of every analysis file, or in a project-level helper that gets sourced. Every plot then matches without per-plot theme calls.
For palettes, define and reuse:
practicum_palette <- c(
  "Adelie" = "#1f4e79",
  "Chinstrap" = "#9d2235",
  "Gentoo" = "#2e8b57"
)
scale_colour_practicum <- function(...)
  scale_colour_manual(values = practicum_palette, ...)
# in plots
ggplot(...) + ... + scale_colour_practicum()
For colour-blind safety, verify the palette with the Coblis simulator (colorbrewer2.org/learnmore/colorblind-simulator.html) or colorBlindness::cvdPlot().
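A complete use of the palette might look like the sketch below. The scale_fill_practicum() companion and the missing-level check are hypothetical additions for illustration; they are not defined elsewhere in this chapter.

```r
library(ggplot2)
library(palmerpenguins)

practicum_palette <- c(
  "Adelie"    = "#1f4e79",
  "Chinstrap" = "#9d2235",
  "Gentoo"    = "#2e8b57"
)

scale_colour_practicum <- function(...) {
  scale_colour_manual(values = practicum_palette, ...)
}

# Hypothetical companion for fill aesthetics, using the same named values
scale_fill_practicum <- function(...) {
  scale_fill_manual(values = practicum_palette, ...)
}

p <- ggplot(na.omit(penguins),
            aes(flipper_length_mm, body_mass_g, colour = species)) +
  geom_point() +
  scale_colour_practicum(name = "Species")

# Named values are matched to factor levels by name, not by position, so a
# level absent from the palette is worth catching before plotting
missing_levels <- setdiff(levels(penguins$species), names(practicum_palette))
length(missing_levels) == 0
```

Naming the palette entries means the colours stay attached to the same species even when a subset of the data drops a level or reorders the factor.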
17.8 Export formats
ggsave() writes to many formats, picked by extension:
# vector for journal submission
ggsave("figure1.pdf", plot = p, width = 6, height = 4,
       device = cairo_pdf)
# raster for Word, slides, web
ggsave("figure1.png", plot = p, width = 6, height = 4,
       dpi = 300)
# vector for the web (preserves quality on zoom; needs the svglite package)
ggsave("figure1.svg", plot = p, width = 6, height = 4)
DPI considerations:
- 72 DPI: web display only.
- 96 DPI: monitor default.
- 300 DPI: print quality (the standard for journal submission as PNG).
- 600 DPI: high-quality scientific figures.
For LaTeX submission, prefer PDF (vector). For Word, prefer PNG at 300 DPI. For web, SVG (vector).
device = cairo_pdf ensures non-default fonts are embedded so the PDF renders correctly on a system that lacks the font.
For journals, check size requirements (typically column widths of 85–90 mm or 170–180 mm) and produce figures at exactly the target size, not larger.
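Sizing to a journal specification is easier when the dimensions are given in millimetres. The sketch below assumes a hypothetical specification of 85 mm for single-column and 175 mm for double-column figures, and writes to a temporary directory; substitute the widths from the target journal's author guidelines.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lb)", y = "Fuel economy (mpg)")

# Hypothetical journal specification (check the author guidelines)
single_col_mm <- 85
double_col_mm <- 175

out_dir <- tempdir()  # use the project's figures directory in practice
single_path <- file.path(out_dir, "figure1-single.pdf")
double_path <- file.path(out_dir, "figure1-double.pdf")

# units = "mm" lets the journal's numbers go straight into the call
ggsave(single_path, plot = p,
       width = single_col_mm, height = 60, units = "mm")
ggsave(double_path, plot = p,
       width = double_col_mm, height = 90, units = "mm")

# Pixel width of the single-column figure if exported as PNG at 300 DPI
px_single <- round(single_col_mm / 25.4 * 300)
px_single  # 1004
```

Text in the saved file keeps its point size regardless of figure dimensions, so a base_size that reads well at 175 mm may crowd an 85 mm panel; adjusting base_size in the theme per target width is one way to handle that.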
17.9 Integrating with zzlongplot and zztable1
The in-house zzlongplot package provides opinionated longitudinal-study plots: spaghetti plots with confidence ribbons, mean trajectories with group-specific overlays, ICC-aware error bars. The defaults match this practicum’s conventions.
library(zzlongplot)
spaghetti(adni_long, time = "VISIT", value = "ADAS",
          subject = "RID", group = "DX")
When to reach for zzlongplot:
- Longitudinal data with a standard time variable.
- You want consistent styling with other figures in this practicum.
- You do not want to write fifty lines of ggplot2 every time.
When to write your own ggplot2:
- Custom encoding the package does not support.
- Cross-sectional or non-longitudinal data.
- You need a non-standard layout that fights the package’s defaults.
The zztable1 package similarly handles Table 1 generation; covered in the analysis-plan chapter (Chapter 19).
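For the cases where writing your own ggplot2 is the better route, a hand-rolled spaghetti plot is short. The sketch below is plain ggplot2 on simulated data and is not a description of how zzlongplot works internally; the subject count, visit schedule, and effect sizes are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)
# Simulated longitudinal data (hypothetical): 30 subjects, 4 visits, 2 groups
n_subj <- 30
long <- expand.grid(id = seq_len(n_subj), visit = 0:3)
long$group <- ifelse(long$id <= n_subj / 2, "control", "treated")
subj_intercept <- rnorm(n_subj, mean = 20, sd = 3)
slope <- ifelse(long$group == "treated", 0.5, 1.5)
long$score <- subj_intercept[long$id] + slope * long$visit +
  rnorm(nrow(long), sd = 1)

p_spaghetti <- ggplot(long, aes(visit, score, colour = group)) +
  # one faint line per subject
  geom_line(aes(group = id), alpha = 0.3) +
  # group mean with a ribbon of mean +/- 1.96 standard errors at each visit
  stat_summary(aes(fill = group), fun.data = mean_se,
               fun.args = list(mult = 1.96),
               geom = "ribbon", alpha = 0.2, colour = NA) +
  # group mean trajectory on top
  stat_summary(fun = mean, geom = "line", linewidth = 1) +
  labs(x = "Visit", y = "Score", colour = "Group", fill = "Group")
```

The ribbon here treats each visit independently and ignores within-subject correlation, so it is a descriptive band rather than a model-based interval.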
17.10 Worked example: regression diagnostic figure
library(tidyverse)
library(palmerpenguins)
library(patchwork)
library(broom)
# assumes theme_practicum() from Section 17.7 is already defined
theme_set(theme_practicum())
fit <- lm(body_mass_g ~ flipper_length_mm + species,
          data = na.omit(penguins))
diag <- augment(fit)
p_resid <- ggplot(diag, aes(.fitted, .resid)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", se = FALSE,
              colour = "red", linewidth = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "A. Residuals vs. fitted",
       x = "Fitted (g)", y = "Residual (g)")
p_qq <- ggplot(diag, aes(sample = .std.resid)) +
  geom_qq(alpha = 0.5) +
  geom_qq_line(colour = "red") +
  labs(title = "B. Normal Q-Q",
       x = "Theoretical", y = "Standardised residual")
p_cook <- ggplot(diag, aes(seq_len(nrow(diag)), .cooksd)) +
  geom_col(width = 0.6) +
  geom_hline(yintercept = 4 / nrow(diag),
             linetype = "dashed", colour = "red") +
  labs(title = "C. Cook's distance",
       x = "Observation", y = "Cook's distance")
# keep the composition in an object so ggsave() saves exactly this figure
p_diag <- (p_resid + p_qq) / p_cook +
  plot_layout(heights = c(1, 0.7))
p_diag
dir.create("figures", showWarnings = FALSE)
ggsave("figures/diagnostics.pdf", plot = p_diag, width = 7, height = 6,
       device = cairo_pdf)
The composition reads naturally: the top row holds two related diagnostics, the bottom row a single observation-level diagnostic. The house theme is applied via theme_set(theme_practicum()), and the result is saved as a publication-quality PDF.
17.11 Collaborating with an LLM on graphics
LLMs handle ggplot well; the trap is busy plots that encode too much.
Prompt 1: drafting a plot. Describe the data and the question, ask: ‘write a ggplot that addresses the question. Use a colour-blind-safe palette and clear axis labels with units.’
What to watch for. The default LLM plot tends to be busy: too many aesthetics encoded, default ggplot theme. Push for clarity. Multiple iterations of ‘simpler’ tend to improve.
Verification. Render the plot. Ask whether a reader who has never seen the data could state the message in one sentence.
Prompt 2: combining plots. Describe four plots, ask: ‘combine these into a 2x2 grid with shared legend, panel labels A through D.’
What to watch for. The shared legend needs patchwork::plot_layout(guides = "collect"). Panel labels can go in labs(title = "A. ...") as in this chapter, or be generated with plot_annotation(tag_levels = "A"). The LLM may use either; both work.
Verification. Render the combined plot. Are legends shared? Are panels labelled in the right order?
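For reference when checking an LLM's answer to Prompt 2, the sketch below shows the automatic-labelling route on the built-in mtcars data (the four panels are arbitrary illustrative choices). wrap_plots() builds a flat 2x2 grid, guides = "collect" merges the identical legends, and plot_annotation(tag_levels = "A") tags the panels in order.

```r
library(ggplot2)
library(patchwork)

cars <- transform(mtcars, cyl = factor(cyl))

# Three scatter panels share an identical colour legend
scatter <- function(xvar, xlab) {
  ggplot(cars, aes(.data[[xvar]], mpg, colour = cyl)) +
    geom_point() +
    labs(x = xlab, y = "Fuel economy (mpg)", colour = "Cylinders")
}
pa <- scatter("wt", "Weight (1000 lb)")
pb <- scatter("hp", "Horsepower")
pc <- scatter("disp", "Displacement (cu. in.)")

# The boxplot legend would not match the point legends, so it is hidden
pd <- ggplot(cars, aes(cyl, mpg, colour = cyl)) +
  geom_boxplot() +
  guides(colour = "none") +
  labs(x = "Cylinders", y = "Fuel economy (mpg)")

grid_2x2 <- wrap_plots(pa, pb, pc, pd, ncol = 2, guides = "collect") +
  plot_annotation(tag_levels = "A") &
  theme(legend.position = "bottom")
```

Legends are only merged when they are identical, which is why the boxplot's legend is suppressed rather than collected.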
Prompt 3: theme function. Ask: ‘write a theme_practicum() function with serif body text, sans-serif axis labels, and a colour-blind-safe default palette.’
What to watch for. The output is a starting point. Test with a few plots; iterate.
Verification. Apply to several plots; ensure consistency. Verify palette via Coblis or colorBlindness::cvdPlot().
17.12 Principle in use
Three habits define defensible plotting:
- One plot, one message. Resist combining relationships into one figure.
- Set the theme once. theme_set(theme_practicum()) at the top of every analysis script.
- Export for the destination. PDF for LaTeX, PNG at 300 DPI for Word, SVG for web. Embed fonts in PDFs.
17.13 Exercises
- Using the palmerpenguins data, build a three-panel patchwork: (a) scatter of body mass vs flipper length coloured by species; (b) residuals from a linear fit of (a); (c) QQ plot of residuals. Share the species legend across all three.
- Write a function plot_per_site(data, site_col, outcome) that returns a named list of ggplot objects (one per site level) and a helper that saves each to figures/<site>.pdf.
- Define theme_practicum() and apply it to three plots from any prior exercise. Verify the plots render consistently in HTML, PDF, and Word output.
- Replicate one published figure from a recent biomedical paper using ggplot2. Compare your replication to the original; identify what is the same and what is different.
- Verify your colour palette with the Coblis simulator. Adjust if any pair of categories becomes indistinguishable under deuteranopia.
17.14 Further reading
- Wickham (2016), ggplot2: Elegant Graphics for Data Analysis, the canonical reference; the 3rd edition is available online at ggplot2-book.org.
- Wilke (2019), Fundamentals of Data Visualization at clauswilke.com/dataviz, effective-visualisation principles.
- The patchwork, cowplot, and gganimate package vignettes.
17.15 Prerequisites answers
- Use split(data, data$group) |> map(\(d) ggplot(d, aes(x, y)) + geom_point()) to produce a named list of plots, one per group. Alternatively, in a tidyverse pipeline: data |> group_by(group) |> group_map(\(d, k) ggplot(d, aes(x, y)) + geom_point()). Note that group_map() returns an unnamed list; to get names, set them afterwards from the group keys, for example with group_keys().
- facet_wrap() splits a single plot into panels by a factor. Every panel shares the same aes mapping and the same geoms, and by default the same scales. patchwork::wrap_plots() composes distinct plots that can differ in every respect (different geoms, different data, different scales). Use facet_wrap() for small multiples of the same plot; wrap_plots() for multi-panel figures showing different views.
- PDF is a vector format: it scales cleanly, looks crisp at any zoom level, embeds fonts, and is what journal production systems prefer for typeset PDFs. PNG is raster: appropriate for embedding in Word documents, PowerPoint slides, or web pages where vector support is limited. Save PDF for LaTeX submissions, PNG at 300 DPI for Word and web.