12 Reproducible Reports with Quarto
Stat 545 Chapter 4 (Jenny Bryan, UBC); blog posts 29-setupquarto, 07-multilanguagequartodemo, 17-rapidconversionRtoRmd.
12.1 Prerequisites
Answer the following questions to see if you can bypass this chapter. You can find the answers at the end of the chapter in Section 12.18.
- What is ‘literate programming’, and how does a `.qmd` file embody it?
- Given a Quarto document that contains R code and prose, what single command renders it to an HTML file, and how does Quarto decide which output format to produce?
- What is the purpose of `execute: freeze: auto` in a Quarto document, and what problem does it solve?
12.2 Learning objectives
By the end of this chapter you should be able to:
- Explain the relationship among R scripts, R Markdown, Jupyter notebooks, and Quarto.
- Create a `.qmd` file that produces HTML, PDF, and Word outputs from the same source.
- Control code chunks with chunk options (`echo`, `eval`, `include`, `warning`, `message`, `fig-cap`, `fig-width`, `cache`).
- Add cross-references to figures, tables, sections, and equations that resolve in both HTML and PDF output.
- Use `bibliography` and `csl` to cite published work in journal-specific format.
- Parameterise a report with YAML `params:` and render multiple variants.
- Cache computation with `freeze` to avoid re-running expensive code on every render.
12.3 Orientation
An analysis is not finished when the code runs. It is finished when a collaborator can reproduce the plots, tables, and paragraphs of your paper from your source files with one command. Quarto is the current standard tool for doing that in R.
Quarto is the successor to R Markdown, sharing most of its syntax but with broader language support (R, Python, Julia, Observable JS), better cross-format fidelity, and a more polished publishing pipeline. For new projects, Quarto is the default; for legacy projects, R Markdown still works fine and the conversion path is straightforward.
12.4 The statistician’s contribution
Quarto's mechanics are routine. The judgements that matter:
Authorial decisions about what to compute inline. A manuscript with \(p < 0.05\) hardcoded is fragile. A manuscript with the p-value computed inline from the fitted model auto-updates when you re-run with new data. The same goes for sample sizes, effect estimates, and any number that depends on the analysis. Inline computation is the literate-programming payoff; hardcoding numbers wastes it.
When to cache, when to re-run. A 10-second analysis can be re-run on every render. A 10-minute MCMC cannot. freeze and the cache chunk option are the right tools, but using them carelessly produces stale results. The discipline: cache when the computation is expensive and deterministic; re-run when it is cheap or could change.
One source, multiple outputs. A single .qmd can produce HTML for the website, PDF for the journal, and Word for the collaborator who insists. The trade-off: cross-format compatibility constrains what you can do (no inline HTML widgets in PDF; no LaTeX math in Word without conversion). Worth it for the single-source-of-truth benefit.
Citation and bibliography hygiene. A paper with twenty undocumented references cited inline is a papier-mâché methods section. Quarto with a references.bib and csl styling is the infrastructure for tracking and reformatting citations. Use it from the start; backfilling citations for an existing paper is tedious.
These judgements are what make a Quarto document a research artefact rather than a glorified Word document.
12.5 Literate programming
Donald Knuth’s original idea: source code and prose should live together in one document, with the prose explaining the code as a human would explain it to another human. A ‘tangle’ step extracts the code for the compiler; a ‘weave’ step produces the human-readable document.
Quarto implements this with code chunks and prose interleaved in a .qmd file:
# Methods
We analysed the readmissions cohort using a multivariable
logistic regression.
```{r}
fit <- glm(readmit ~ home_health + age + sex,
           family = binomial, data = d)
```
The adjusted odds ratio for home-health-visit receipt
was `{r} round(exp(coef(fit)["home_health"]), 2)`.
Rendering produces a paper with the OR computed at render time. Re-running with new data re-computes the OR; the prose is automatically consistent with the analysis.
This contrasts with the workflow of running an analysis in R, copy-pasting numbers into Word, and updating manually if the analysis changes. The latter is where ‘I forgot to update Table 2’ bugs come from.
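A minimal sketch of the same idea outside a document, assuming simulated data in place of the readmissions cohort. The data, the seed, and the helper name `fmt_p` are all hypothetical. It computes the values once and assembles the sentence that inline R would otherwise produce.

```r
# Simulated stand-in for the readmissions cohort (hypothetical data).
set.seed(1)
n <- 500
d <- data.frame(
  home_health = rbinom(n, 1, 0.4),
  age         = round(rnorm(n, 70, 10)),
  sex         = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Simulated log-odds: home-health visits lower the odds of readmission.
lp <- -1 - 0.5 * d$home_health + 0.02 * (d$age - 70)
d$readmit <- rbinom(n, 1, plogis(lp))

fit <- glm(readmit ~ home_health + age + sex,
           family = binomial, data = d)

# Hypothetical helper: format a p-value for prose.
fmt_p <- function(p) {
  if (p < 0.001) "p < 0.001" else sprintf("p = %.3f", p)
}

or_hh <- round(exp(coef(fit)[["home_health"]]), 2)
p_hh  <- summary(fit)$coefficients["home_health", "Pr(>|z|)"]

# The sentence inline R would render; no number is typed by hand.
sentence <- sprintf(
  "The adjusted odds ratio for home-health-visit receipt was %.2f (%s).",
  or_hh, fmt_p(p_hh)
)
cat(sentence, "\n")
```

Changing the seed or the simulated effect changes the sentence without any manual edit, which is the property the copy-paste workflow lacks.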
12.6 The minimum Quarto document
---
title: "My analysis"
format: html
---
# Introduction
Some prose here.
```{r}
1 + 1
```
Save as analysis.qmd. Render with:
quarto render analysis.qmd
This produces analysis.html. Open it in a browser. Done.
For PDF output:
quarto render analysis.qmd --to pdf
Or list the formats in the YAML, in which case quarto render without --to produces every format listed:
---
format:
  pdf: default
  html: default
---
12.7 Code chunks and chunk options
A code chunk in a .qmd:
```{r}
#| label: fig-readmissions
#| fig-cap: "30-day readmission rates by home-health visit"
#| fig-width: 6
#| fig-height: 4
#| echo: false
#| warning: false
# factor() makes readmit discrete so each bar splits by outcome
ggplot(d, aes(home_health, fill = factor(readmit))) +
  geom_bar(position = "fill") +
  labs(x = "Home health visit", y = "Proportion", fill = "Readmitted")
```
Chunk options:
- `echo: false` hides the code from the rendered output (useful for papers; keep `true` for teaching).
- `include: false` runs the code but shows neither code nor output (useful for setup chunks).
- `eval: false` shows the code but does not run it (useful for examples that should not actually execute).
- `warning: false`, `message: false` suppress R’s warnings and messages.
- `fig-cap`, `fig-width`, `fig-height` control figure rendering.
- `label: fig-foo` gives the chunk a name and enables cross-referencing.
- `cache: true` caches the chunk’s output between renders (chunk-level alternative to `freeze`).
Set defaults globally in YAML:
execute:
  echo: false
  warning: false
  message: false
  cache: false
This hides code and suppresses warnings and messages by default across all chunks; override per chunk as needed.
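The override rule can be pictured as a list merge. The sketch below only illustrates the precedence described above; it is not how Quarto or knitr resolves options internally, and the function name `resolve_options` is hypothetical.

```r
# Document-wide defaults, mirroring the execute: block above.
defaults <- list(echo = FALSE, warning = FALSE, message = FALSE, cache = FALSE)

# Hypothetical helper: chunk-level options win over the defaults.
resolve_options <- function(defaults, chunk = list()) {
  modifyList(defaults, chunk)
}

# A teaching chunk that shows its code and caches its result.
teaching <- resolve_options(defaults, list(echo = TRUE, cache = TRUE))
str(teaching)
```

Options the chunk does not mention (here `warning` and `message`) keep their document-wide values.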
12.8 Cross-references
Quarto’s cross-reference syntax:
See @fig-readmissions for the bar plot.
```{r}
#| label: fig-readmissions
#| fig-cap: "30-day readmission rates"
ggplot(d, aes(...)) + ...
```
The reference resolves to a clickable link in HTML and a formatted reference in PDF. Prefixes:
- `fig-` for figures.
- `tbl-` for tables.
- `eq-` for equations.
- `sec-` for sections (when section headers are labelled, e.g., `# Methods {#sec-methods}`).
Cross-references work across formats: HTML produces hyperlinks, PDF produces ‘Figure 3’ references, Word produces field codes.
12.9 Bibliographies and citations
Add a .bib file:
---
bibliography: references.bib
csl: apa.csl
---
Cite inline:
This builds on prior work [@bryan2019happygit; @marwick2018rrtools].
@wickham2019advr discusses copy-on-modify in detail.
[@key] produces a parenthetical citation; @key produces a textual one. Multiple keys are separated by semicolons. Quarto resolves the keys against the .bib file; the rendered output formats them per the csl style.
Common CSL files: apa.csl, vancouver.csl, nature.csl. Get from github.com/citation-style-language/styles and place in the project directory or reference by URL:
csl: https://www.zotero.org/styles/jama
12.10 YAML front matter in depth
---
title: "Effect of home-health visits on readmissions"
author:
  - name: A. Author
    affiliation: University of X
    orcid: 0000-0000-0000-0000
  - name: B. Coauthor
date: today
date-format: "YYYY-MM-DD"
format:
  html:
    theme: cosmo
    toc: true
    code-fold: true
    fig-width: 7
    fig-height: 5
  pdf:
    documentclass: scrartcl
    fig-width: 6
    fig-height: 4
  docx: default
bibliography: references.bib
csl: nature.csl
execute:
  echo: false
  warning: false
  freeze: auto
params:
  dataset: "data/readmissions.csv"
  outcome: "readmit"
---
The format: block sets format-specific options. The execute: block sets defaults for all code chunks. The params: block declares parameters accessible inside the document as params$dataset.
12.11 Caching with freeze
For expensive computations:
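execute:
  freeze: auto
On the first project render, Quarto executes the document and stores the results in _freeze/. On subsequent project renders, Quarto checks whether the document's source file has changed; if not, it reuses the stored output without re-running any code. freeze takes effect only inside a Quarto project (a directory with a _quarto.yml) and only during a full project render; rendering a single file directly always executes its code.
freeze: auto is the sensible setting for projects with expensive analyses (a long bootstrap, an MCMC, a model fit). The stored output is invalidated when the source file changes; otherwise renders are fast. For caching individual chunks within a document, use the cache chunk option instead.
freeze: true never re-executes during a project render, even if sources changed; freeze: false, which is Quarto's built-in default, always re-runs. Both are occasionally useful, but auto is the better choice for most analysis projects.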
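The "reuse unless the source changed" rule can be illustrated with a file hash. This is a sketch of the idea only, not Quarto's actual implementation; `needs_rerun`, `record_run`, and the stamp file are hypothetical.

```r
# Hypothetical helper: TRUE when `source` differs from the hash stored at `stamp`.
needs_rerun <- function(source, stamp) {
  if (!file.exists(stamp)) return(TRUE)
  current <- unname(tools::md5sum(source))
  !identical(current, readLines(stamp))
}

# Hypothetical helper: record the hash after a successful run.
record_run <- function(source, stamp) {
  writeLines(unname(tools::md5sum(source)), stamp)
}

src   <- tempfile(fileext = ".qmd")
stamp <- tempfile(fileext = ".md5")

writeLines("x <- 1", src)
first <- needs_rerun(src, stamp)    # TRUE: nothing has been recorded yet
record_run(src, stamp)
second <- needs_rerun(src, stamp)   # FALSE: source unchanged since the run
writeLines("x <- 2", src)
third <- needs_rerun(src, stamp)    # TRUE: source was edited

print(c(first = first, second = second, third = third))
```

A hash of the source says nothing about inputs the code reads, such as a data file that changed on disk, which is one way stored results can go stale.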
_freeze/ should be committed to git: it ensures collaborators and CI builds get the stored output without re-running expensive analyses. The stored output is simply whatever the last execution produced, so chunks that use random numbers should set a seed; a later re-run then reproduces the committed results.
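A minimal sketch of why the seed matters for stored results, assuming a small bootstrap as the stand-in for an expensive randomised chunk. The function `boot_mean_ci`, the sample, and the seed values are hypothetical.

```r
# Hypothetical expensive step: bootstrap a percentile interval for the mean.
boot_mean_ci <- function(x, reps = 2000, seed = 2024) {
  set.seed(seed)  # fixing the seed makes the resampling repeatable
  means <- replicate(reps, mean(sample(x, replace = TRUE)))
  quantile(means, c(0.025, 0.975))
}

set.seed(1)
x <- rexp(100)  # simulated skewed sample

run1 <- boot_mean_ci(x)
run2 <- boot_mean_ci(x)             # same seed: identical interval
run3 <- boot_mean_ci(x, seed = 99)  # different seed: different resamples

print(identical(run1, run2))
```

With the seed fixed, a collaborator who re-executes the chunk gets the same interval as the one stored in _freeze/; without it, the committed output and a fresh run would disagree in the trailing digits.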
12.12 Parameterised reports
Run the same .qmd against different data:
params:
  dataset: "data/cohort1.csv"
  outcome: "readmit"
Inside a code chunk, read the file named by the parameter:
d <- read.csv(params$dataset)
Render variants from the command line, passing each parameter as key:value:
quarto render analysis.qmd \
  -P dataset:data/cohort2.csv \
  -P outcome:mortality
Or programmatically in R:
quarto::quarto_render(
  "analysis.qmd",
  execute_params = list(dataset = "data/cohort2.csv",
                        outcome = "mortality"),
  output_file = "cohort2-report.html"
)
Useful for: analysis reports per subgroup, sensitivity analyses with different inclusion criteria, reports per institution in multi-site studies.
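Inside the document, params behaves like an ordinary named list. The sketch below assumes a stand-in list in place of the one Quarto supplies at render time and simulated data in place of the CSV file (both hypothetical), and shows one way the outcome parameter can drive the model formula.

```r
# Stand-in for the params list Quarto makes available during rendering.
params <- list(dataset = "data/cohort1.csv", outcome = "readmit")

# Simulated data in place of read.csv(params$dataset) (hypothetical).
set.seed(42)
n <- 200
d <- data.frame(
  home_health = rbinom(n, 1, 0.4),
  age         = round(rnorm(n, 70, 10)),
  readmit     = rbinom(n, 1, 0.2),
  mortality   = rbinom(n, 1, 0.1)
)

# Build the formula from the parameter instead of hardcoding the outcome.
f <- reformulate(c("home_health", "age"), response = params$outcome)
fit <- glm(f, family = binomial, data = d)
print(f)

# Rendering with outcome = "mortality" would change only the parameter.
f2 <- reformulate(c("home_health", "age"), response = "mortality")
print(f2)
```

Keeping the outcome out of the model code means the same chunk serves every variant; only the parameter values differ between renders.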
12.13 Worked example: a one-page report
---
title: "30-day readmission rates"
author: A. Statistician
date: today
format:
html: default
pdf: default
bibliography: references.bib
execute:
echo: false
warning: false
freeze: auto
---
```{r}
#| include: false
library(tidyverse)
library(broom)
d <- read_csv("data/readmissions.csv")
fit <- glm(readmit ~ home_health + age + sex,
           family = binomial, data = d)
```
# Summary
Among `{r} nrow(d)` patients, the 30-day readmission rate
was `{r} round(100 * mean(d$readmit), 1)`%
(@tbl-summary). After adjustment for age and sex,
receipt of a home-health visit was associated with
`{r} if_else(coef(fit)["home_health"] < 0, "lower", "higher")`
odds of readmission (OR
`{r} round(exp(coef(fit)["home_health"]), 2)`,
95% CI `{r} ...`; @fig-or).
```{r}
#| label: tbl-summary
#| tbl-cap: "Patient characteristics"
gtsummary::tbl_summary(d, by = home_health)
```

```{r}
#| label: fig-or
#| fig-cap: "Adjusted odds ratios"
broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
  ggplot(aes(estimate, term, xmin = conf.low, xmax = conf.high)) +
  geom_pointrange() +
  geom_vline(xintercept = 1, linetype = "dashed")
```
# References
This produces an HTML report and a PDF, with numbers computed from the data, a labelled table and figure, and proper bibliography. Re-running with new data updates everything automatically.
12.14 Collaborating with an LLM on Quarto
LLMs handle Quarto well; the judgement about what to compute inline is human.
Prompt 1: drafting a methods section. Describe the analysis and ask: ‘draft the methods section as Quarto, with code chunks producing the key results and inline R for any numbers cited in the prose.’
What to watch for. Hardcoded numbers in the prose that should be inline R. The LLM may quote ‘p = 0.034’ as a literal when it should compute it. Push for inline.
Verification. Render the document; change the data; re-render. Numbers cited inline should change with the data; numbers hardcoded as literals will not.
Prompt 2: setting up cross-references. Paste the draft and ask: ‘add appropriate fig-, tbl-, and eq- labels, fig-cap/tbl-cap captions, and cross-references throughout.’
What to watch for. Reference labels follow the fig-, tbl-, eq- prefix conventions. Each label should be unique. The LLM occasionally produces duplicate labels.
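Verification. Render and check that every reference resolves; an unresolved reference is left in the output with a leading question mark (for example ?@fig-readmissions), and Quarto prints a warning.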
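Both checks can also be run on the source before rendering. The sketch below is a rough pre-render lint under stated assumptions: the draft text is hypothetical and inlined, and only labels declared with the `#| label:` chunk option are detected, so section and equation labels written as `{#sec-...}` or `{#eq-...}` would need an extra pattern.

```r
# Hypothetical draft, inlined so the sketch is self-contained.
draft <- c(
  "See @fig-readmissions and @tbl-summary; details in @fig-missing.",
  "#| label: fig-readmissions",
  "#| label: tbl-summary",
  "#| label: fig-readmissions"
)

# Labels declared with the `#| label:` chunk option.
label_lines <- grep("^#\\| *label:", draft, value = TRUE)
labels <- trimws(sub("^#\\| *label: *", "", label_lines))

# Cross-references of the form @fig-..., @tbl-..., @eq-..., @sec-...
ref_pattern <- "@(fig|tbl|eq|sec)-[A-Za-z0-9_-]+"
refs <- unlist(regmatches(draft, gregexpr(ref_pattern, draft)))
refs <- unique(sub("^@", "", refs))

duplicated_labels <- unique(labels[duplicated(labels)])
unresolved_refs   <- setdiff(refs, labels)

print(duplicated_labels)  # "fig-readmissions"
print(unresolved_refs)    # "fig-missing"
```

For a real file, `draft` would come from `readLines()` on the .qmd; rendering remains the authoritative check.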
Prompt 3: parameterisation. Describe the report and the parameters and ask: ‘parameterise this report so I can render it once per cohort.’
What to watch for. The LLM should use Quarto’s params: syntax, not custom variable handling.
Verification. Render with two different parameter sets; verify the outputs differ as expected.
12.15 Principle in use
Three habits define defensible Quarto use:
- Inline computation, not hardcoded numbers. Every number in the prose comes from the analysis.
- Cross-reference everything. Figures, tables, equations, sections. Hand-typing ‘see Figure 3’ produces broken references when figures move.
- Use `freeze: auto`. Expensive analyses are not re-executed until their source changes, so renders stay fast without results going stale.
12.16 Exercises
- Convert a short `.R` script into a `.qmd` that renders to both HTML and PDF, with one figure caption and one cross-reference.
- Add `bibliography: references.bib` and cite at least one paper. Verify the reference appears in both HTML and PDF outputs.
- Parameterise the document to take a `data_file` parameter from YAML; render three versions from three different CSV files using `quarto render --to html -P data_file:foo.csv`.
- Add `execute: freeze: auto` and an expensive code chunk to a document inside a Quarto project. Render the project twice; confirm the second render is fast.
- Replace every hardcoded number in a draft of yours with inline R. Re-render with intentionally different data; confirm everything updates.
12.17 Further reading
- Quarto documentation at `quarto.org`, canonical reference.
- Xie, Dervieux, and Riederer (2020), R Markdown Cookbook, recipes that largely translate to Quarto.
- Stat 545 Chapter 4 (Jenny Bryan, UBC), excellent applied introduction.
12.18 Prerequisites answers
- Literate programming (coined by Knuth) interleaves code with the prose that explains it, so the same source file is the authoritative record of both the computation and its interpretation. A Quarto `.qmd` file embodies this by letting you write narrative Markdown and executable R (or Python, Julia) code chunks in one document. The render process produces a human-readable artefact (HTML, PDF, Word) with code, output, and prose interleaved.
- `quarto render file.qmd` produces the format (or formats) declared under the document’s YAML `format:` key (HTML if no `format:` is present). To force a format, use `quarto render file.qmd --to html` (or `--to pdf`, `--to docx`).
- `execute: freeze: auto` stores the executed output of a document in `_freeze/` on the first project render. Subsequent project renders reuse the stored output if the document’s source file is unchanged, avoiding re-running expensive code during every render and making CI deployments fast. Commit `_freeze/` to git so collaborators benefit from the stored results.