26 Course Synthesis and Review
Adapted from the author’s lecture notes and supporting materials for a graduate practicum in biostatistics.
26.1 Prerequisites
There is no prerequisites quiz for this chapter. Section 26.8 offers self-assessment prompts instead, in case you want to check what you have taken away from the book.
26.2 What the Practicum covered
This book took you through six arcs:
Reproducibility as a professional norm. Why it matters (Chapter 2), what federal funders require (Chapter 3), and what it means to practise it in a research team (Chapter 4).
A reproducible workstation. Setting up R, an editor, Git, and the shell (Chapter 5), Git for solo work (Chapter 6), and remote compute (Chapter 7).
Reproducible infrastructure. Research compendia (Chapter 8), package version pinning with renv (Chapter 9), environment reproducibility with Docker (Chapter 10), and the all-in-one zzcollab framework (Chapter 11).
Reproducible reporting and wrangling. Quarto for literate documents (Chapter 12), Rmd workflow (Chapter 13), tidyverse wrangling (Chapter 14), type handling (Chapter 15), joins (Chapter 16), and publication graphics (Chapter 17).
Analysis practice. Missing data (Chapter 18), statistical analysis plans (Chapter 19), CDISC data standards (Chapter 20), testing data analysis workflows (Chapter 21), and AI-assisted coding (Chapter 22); plus the SAS bridge for cross-language regulatory work.
Case studies. Palmer Penguins for an end-to-end small analysis, and ADNI MCI prediction for a longitudinal modelling exercise.
The arcs are progressive: each builds on the previous. Reproducibility is impossible without infrastructure; infrastructure is wasted without disciplined reporting; reporting without an analysis plan is performative.
26.3 The habits you should take forward
A concise list of practices that every chapter pointed at:
- Every analysis lives in a compendium. A research-compendium structure with R/, analysis/, and a Dockerfile is the unit of reproducibility.
- Every compendium is Dockerised before it leaves your machine. renv pins R packages; Docker pins everything else.
- Every chunk of analysis code is under version control. Git from day one; tag at submission.
- Every clinically meaningful analysis has a pre-specified SAP, tagged in Git before data lock. Pre-specification is what distinguishes confirmatory from exploratory work.
- Every non-trivial function has at least one unit test. Tests catch regressions when you refactor and document expected behaviour.
- Every figure in a paper is regenerable by make render (or its equivalent). Manual steps in figure generation are reproducibility holes.
- Every LLM suggestion is treated as a hypothesis until verified. An AI-amplified workflow demands amplified verification.
If you take only one habit from this book, take the first: every analysis lives in a compendium. Everything else flows from it.
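As a concrete reminder of what the first habit means on disk, here is a minimal shell sketch that lays out the skeleton named above: an R/ directory, an analysis/ directory, and a Dockerfile, placed under Git from the start. The project name myanalysis is a hypothetical placeholder, the roles given to R/ and analysis/ follow a common convention rather than a rule stated here, and the Dockerfile is left empty. No renv.lock is created, because renv::snapshot() writes that file from R once packages are installed. Treat this as an illustration, not the book's compendium template.

```shell
#!/bin/sh
# Minimal sketch of a research-compendium skeleton.
# "myanalysis" is a hypothetical default name; pass another name as $1.
set -eu

proj="${1:-myanalysis}"

# R/ holds reusable functions; analysis/ holds the scripts and reports
# that call them (a common convention, assumed for this sketch).
mkdir -p "$proj/R" "$proj/analysis"

# Placeholder Dockerfile: in a real compendium it pins the system
# environment; its contents are deliberately left empty here.
touch "$proj/Dockerfile"

# renv.lock is not created here; renv::snapshot() writes it from R.

# Version control from day one, if git is installed.
if command -v git >/dev/null 2>&1; then
  git -C "$proj" init -q
fi

echo "Created compendium skeleton in $proj"
```

The script only creates empty structure; the habits that make it reproducible (pinning packages, Dockerising, tagging at submission) still happen inside it.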
26.4 What this book did not cover
The scope of biostatistical computing exceeds any one book. Topic areas explicitly out of scope here but worth pursuing:
- Statistical theory and inference. Covered in the companion book Statistical Computing in the Age of AI. Linear models, GLMs, mixed-effects models, survival analysis, Bayesian computation, simulation, and the bootstrap each get a chapter there.
- Bayesian computation. A topic area of comparable weight to this entire book. Covered briefly in the companion’s Bayesian chapter.
- Modern machine learning. tidymodels, deep learning, MLOps, and the broader ML pipeline are each book-length subjects in their own right. The Practicum’s reproducibility habits transfer; the methods do not.
- Causal inference. Pre-specification (see the statistical analysis plans chapter) is necessary but not sufficient. The potential-outcomes framework, instrumental variables, sensitivity analysis for unmeasured confounding, and modern econometric methods belong elsewhere.
- Collaboration at industrial scale. Multi-sponsor pharma collaborations and contract research organisations (CROs) operate under CDISC conventions beyond what Chapter 20 introduces.
The omissions are deliberate. The book aimed to cover what every biostatistician needs in their first two years; it cannot cover what they may need over a career.
26.5 What to read next
A curated reading list, roughly two years of material:
Year one (consolidation).
- R Packages, 2nd ed. (Wickham & Bryan, 2023) — package development. Read end to end; build a package with each chapter.
- R for Data Science, 2nd ed. (Wickham et al., 2023) for tidyverse depth. The Practicum touched on many of its chapters; the full book goes further.
- Advanced R, 2nd ed. (Wickham, 2019) — the R language. Memory model, functional programming, environments, S4. Read when you hit a corner case.
- Happy Git with R (Bryan, 2019) — Git for R users in depth.
- Bayesian Data Analysis, 3rd ed. (Gelman et al., 2013) or Statistical Rethinking, 2nd ed. (McElreath, 2020) — Bayesian computation. Pick one based on preferred level.
Year two (specialisation).
- Modeling Survival Data (Therneau & Grambsch, 2000) for survival analysis depth.
- Mixed-Effects Models in S and S-PLUS (Pinheiro & Bates, 2000) for the canonical mixed-models reference (still relevant in 2026).
- Tidy Modeling with R (Kuhn & Silge, 2022) for the tidymodels framework.
- Reproducible Research with R and RStudio, 3rd ed. (Gandrud, 2020) for the publishing pipeline.
- Whatever else your domain demands. Genomics: Modern Statistics for Modern Biology (Holmes & Huber, 2019). Imaging: the various Bioconductor and FSL/AFNI tutorials. Time series: Forecasting: Principles and Practice (Hyndman & Athanasopoulos, 2021).
The plan: in year one, consolidate the Practicum’s coverage into deeper proficiency. In year two, specialise toward your applied area.
26.6 A note on judgement
The Practicum has emphasised tools and habits. What it could not teach directly is judgement — the small decisions about which test, which covariate, which transformation, which audience. Judgement comes from practice: working on real problems, being wrong sometimes, learning the patterns of when each tool fits.
The ‘statistician’s contribution’ callouts in each chapter were an attempt to articulate the judgements explicitly. Re-read them after you have practised the tools for six months; they will read differently.
26.7 Sign-off
You have finished a book that took its infrastructure as seriously as its statistics. The arc was deliberate: the biostatistician of 2026 spends a substantial fraction of professional time on workflow, environment, and collaboration. A graduate course that taught only methods would leave the rest as on-the-job learning. This book covered the rest.
The habits in this book pay for themselves at about the second project. By the fifth, the investment is invisible: it is just how you work. The hardest part is the first three projects, where the new infrastructure feels like overhead. The reward is permanence: a research record that ages well, an analysis you can re-run a decade later, a paper a sceptical reviewer cannot break.
Good luck with the next analysis.
26.8 Prerequisites answers
(No quiz for this chapter; this section exists to keep cross-reference anchors consistent across the book. If you would like a self-assessment, write a two-sentence answer to each of the following.)
- If a former collaborator emails in three years asking you to reproduce one of your figures, what three files do you need to have kept?
- What is one practice from this book you will carry into the next project you start?
- What is the next book on your reading list?