Preface

Why this book?

Most biostatistics graduate curricula teach statistical theory thoroughly and practical data-analysis craft casually. Students learn likelihood theory and the fine points of hypothesis testing but arrive at their first collaborative project unsure how to set up a project directory, commit changes with Git, parameterise a report, or write a defensible statistical analysis plan.

This book aims to close that gap. It teaches the practical skills needed to do reproducible biostatistical work in 2026: the tools, the workflows, and the professional norms. It assumes the reader is also taking a theory-heavy statistical computing course alongside this book, and treats modelling as that course’s territory.

What is different here

  • Reproducibility is the organising principle. Each chapter is framed around producing analyses that a collaborator can rerun and extend. This is the animating concern of modern biostatistics.
  • AI-assisted workflows are taught explicitly. Students in 2026 will use large language models in their daily work. Rather than pretend otherwise, the book dedicates a chapter to using LLMs responsibly in R coding, and ends every chapter with adversarial prompts designed to expose model limits.
  • Real case studies. Two full analyses (Palmer Penguins and ADNI MCI prediction) are carried end to end, from project scaffolding to final report.
  • Federal context. Biostatisticians working on federally funded research must meet NIH/NSF reproducibility requirements. A dedicated chapter covers what those requirements are and how to satisfy them.

Prerequisites

Readers should have:

  • An undergraduate statistics background.
  • Basic R familiarity (subsetting, the pipe, writing functions). Readers who lack this should first work through the early chapters of the companion book or R for Data Science (Wickham et al., 2023).
  • Access to R 4.4+, RStudio (optional), and Git.

No prior Docker, renv, or Quarto experience is assumed.

Conventions

See the Conventions page for the visual cues used throughout the book.

Acknowledgements

This book draws heavily on Jenny Bryan’s STAT 545 (Bryan & Stephens, 2019) and Happy Git with R (Bryan, 2019), Ben Marwick and colleagues’ rrtools (Marwick, 2018; Marwick et al., 2018), and Hadley Wickham’s book family (Wickham et al., 2023; Wickham & Bryan, 2023).

I thank the graduate students whose questions shaped the material substantially.

Ronald “Ryy” G. Thomas
La Jolla, California
Spring 2026