25  Reproducibility

This chapter describes how to regenerate the processed data and render this book.

25.1 Prerequisites

  • Raw Excel exports live in data/ (usage logs, assignment logs, internal consultation reports).
  • Name canonicalization lives in data/config/name_mapping.csv. If the file is missing or stale, regenerate it with the mapping scripts listed in Section 25.2.
  • Required R packages: dplyr, ggplot2, lubridate, readxl, stringr, knitr, kableExtra, plus optional analytics packages (igraph, ggraph, tidygraph, DescTools, forecast, pROC, Kendall, moments). Install with install.packages() or pin via renv.
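The package list above can be checked before attempting a rebuild. The following is a minimal sketch, not part of the project's scripts: the helper name missing_packages() is hypothetical, and the two vectors only restate the packages named above.

```r
# Hypothetical helper (not part of the project's scripts): return the
# packages in `pkgs` whose namespace cannot be loaded from the current library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package names copied from the prerequisites list above.
required <- c("dplyr", "ggplot2", "lubridate", "readxl", "stringr",
              "knitr", "kableExtra")
optional <- c("igraph", "ggraph", "tidygraph", "DescTools", "forecast",
              "pROC", "Kendall", "moments")

missing_required <- missing_packages(required)
missing_optional <- missing_packages(optional)

# Report only; installation is left to install.packages() or renv.
if (length(missing_required) > 0) {
  message("Missing required packages: ",
          paste(missing_required, collapse = ", "))
}
if (length(missing_optional) > 0) {
  message("Missing optional packages: ",
          paste(missing_optional, collapse = ", "))
}
```

The sketch only reports what is missing, so that version pinning stays under the control of whichever installation route is used.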

25.2 Steps to Rebuild

  1. Regenerate processed data:
#| label: load-script
#| eval: false
source("R/process_data.R")
  2. Refresh the name mapping (only when new names appear):
#| eval: false
source("R/generate_mapping.R")   # builds/updates data/config/name_mapping.csv
source("R/update_mapping.R")     # optional helper to review changes
  3. Render the book:
quarto render .
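The steps above could be chained in a small driver script. This is a sketch under stated assumptions, not an existing project file: the function names rebuild_steps() and rebuild() are hypothetical, the working directory is assumed to be the project root, and the quarto command is assumed to be on the PATH. The script order follows the list above.

```r
# Hypothetical driver (not part of the project): list the scripts to run,
# in the order given in the steps above.
rebuild_steps <- function(refresh_mapping = FALSE) {
  steps <- "R/process_data.R"
  if (refresh_mapping) {
    steps <- c(steps, "R/generate_mapping.R")
  }
  steps
}

# Run each script, then optionally render the book with the Quarto CLI.
rebuild <- function(refresh_mapping = FALSE, render = TRUE) {
  for (script in rebuild_steps(refresh_mapping)) {
    if (!file.exists(script)) {
      stop("Missing script: ", script)
    }
    message("Running ", script)
    source(script)
  }
  if (render) {
    # system2() returns the exit status of the command.
    status <- system2("quarto", c("render", "."))
    if (status != 0) {
      stop("quarto render failed with exit status ", status)
    }
  }
  invisible(TRUE)
}
```

Calling rebuild() would cover the routine case; rebuild(refresh_mapping = TRUE) would also regenerate the name mapping. The optional review helper R/update_mapping.R is left out so that it stays a manual step.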

25.3 Environment Management

  • Prefer reproducible environments (e.g., renv::init() followed by renv::snapshot(), or pak::lockfile_create()) so package versions are pinned.
  • If rendered output must be identical across days, replace date: today in _quarto.yml with a fixed date string.
  • Store secrets outside the repo; only anonymized pathologist codes are written to data/processed/pathologist_codes.rds.
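As an illustration of the renv route mentioned above, a one-time setup might look like the following. This is a sketch, assuming renv is installed and the working directory is the project root; the wrapper name pin_environment() is hypothetical and is defined as a function so that nothing runs until it is called.

```r
# Hypothetical one-time setup (not part of the project's scripts).
pin_environment <- function() {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages(\"renv\") first.")
  }
  # Create a project-local library and an initial renv.lock.
  # restart = FALSE keeps the current session so the next call still runs.
  renv::init(restart = FALSE)
  # Record the package versions currently in use in renv.lock.
  renv::snapshot()
}
```

After the lockfile is committed, a collaborator would typically run renv::restore() in a fresh clone to install the recorded versions.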