17  Predictive Models and Forecasting

This chapter develops predictive models for consultation turnaround time (TAT) and forecasts future consultation volumes. TAT prediction is clinically relevant: the CAP Q-Probes studies have identified key factors influencing turnaround times for complex specimens (Volmar et al. 2015), and recent evaluations have confirmed the utility of TAT as a quality metric in surgical pathology (Sharma et al. 2025).


17.1 Predicting Turnaround Time

17.1.1 Feature Engineering

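Features are created for modeling: a weekend indicator, time of day, month, and case complexity, as used in the models below. The resulting dataset is summarized here: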

Model Dataset Summary

| N_Observations | N_Responders | N_Askers | Mean_TAT (hours) | SD_TAT (hours) |
| --- | --- | --- | --- | --- |
| 5882 | 32 | 33 | 10.3 | 16.1 |
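The code that built these features is not shown in this chapter. As a minimal sketch in Python, features with the names that appear in the model tables (IsWeekend, TimeOfDay, month) and the log(TAT + 1) response could be derived from a request timestamp as follows. The function names, the `LogTAT` key, and especially the time-of-day cut points are assumptions made for illustration; the bins actually used are not stated.

```python
import math
from datetime import datetime

def time_of_day(hour: int) -> str:
    # Hypothetical cut points; the chapter does not state the bins actually used.
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 22:
        return "Evening"
    return "Night"

def make_features(requested_at: datetime, tat_hours: float) -> dict:
    """Derive model features from one consultation request (illustrative only)."""
    return {
        "IsWeekend": int(requested_at.weekday() >= 5),  # Saturday = 5, Sunday = 6
        "TimeOfDay": time_of_day(requested_at.hour),
        "Month": requested_at.month,
        "LogTAT": math.log(tat_hours + 1.0),            # response on the log scale
    }

# A Saturday-afternoon request with a 10.3-hour turnaround.
example = make_features(datetime(2025, 11, 15, 14, 30), tat_hours=10.3)
```

Adding 1 before taking the logarithm keeps a turnaround of zero hours defined on the log scale.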

17.1.2 Linear Regression Model

Linear Regression Coefficients

| term | estimate | std.error | statistic | p.value |
| --- | --- | --- | --- | --- |
| (Intercept) | 1.2521 | 0.0318 | 39.4162 | 0.0000 |
| IsWeekend | 0.5829 | 0.0425 | 13.7260 | 0.0000 |
| TimeOfDayAfternoon | 0.2227 | 0.0326 | 6.8281 | 0.0000 |
| TimeOfDayEvening | 0.9208 | 0.0440 | 20.9292 | 0.0000 |
| TimeOfDayNight | 0.9010 | 0.0819 | 10.9951 | 0.0000 |
| Month_Fac.L | 0.1298 | 0.0492 | 2.6382 | 0.0084 |
| Month_Fac.Q | 0.0905 | 0.0507 | 1.7855 | 0.0742 |
| Month_Fac.C | -0.0856 | 0.0505 | -1.6950 | 0.0901 |
| Month_Fac^4 | -0.0939 | 0.0499 | -1.8836 | 0.0597 |
| Month_Fac^5 | 0.0303 | 0.0500 | 0.6063 | 0.5443 |
| Month_Fac^6 | 0.0413 | 0.0513 | 0.8058 | 0.4204 |
| Month_Fac^7 | -0.0315 | 0.0496 | -0.6347 | 0.5256 |
| Month_Fac^8 | -0.1126 | 0.0501 | -2.2475 | 0.0246 |
| Month_Fac^9 | -0.0313 | 0.0524 | -0.5978 | 0.5500 |
| Month_Fac^10 | -0.0437 | 0.0522 | -0.8366 | 0.4028 |
| Month_Fac^11 | 0.0558 | 0.0529 | 1.0538 | 0.2920 |
| Case_Complexity | 0.0396 | 0.0119 | 3.3366 | 0.0009 |
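Because the response is log(TAT + 1), each coefficient acts multiplicatively on TAT + 1 rather than additively on hours. The short Python sketch below converts a few of the estimates above into percent changes; the helper name `pct_change` is introduced here for illustration only.

```python
import math

# Estimates copied from the linear regression coefficient table.
coefs = {
    "IsWeekend": 0.5829,
    "TimeOfDayEvening": 0.9208,
    "Case_Complexity": 0.0396,
}

def pct_change(beta: float) -> float:
    """Percent change in (TAT + 1) for a one-unit increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0

for term, beta in coefs.items():
    print(f"{term}: {pct_change(beta):+.1f}% change in TAT + 1")
```

On this reading, a weekend request is associated with roughly a 79% larger TAT + 1 than a comparable weekday request, holding the other terms fixed.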

17.1.3 Model Performance and Cross-Validation

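In-sample metrics are optimistic because the model is evaluated on the same data used to fit it. To estimate out-of-sample performance, we perform 10-fold cross-validation on the linear regression model: the data are split into ten folds, and each fold is predicted by a model fitted to the other nine.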
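The cross-validation loop can be sketched in a few lines of Python. This is an illustration of the procedure under simplifying assumptions, not the code used for the results in this chapter: the data are toy values, the "model" is just the training mean, and the function names are hypothetical.

```python
import random
import statistics

def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(y, fit, predict, k=10, seed=0):
    """Return the RMSE of each held-out fold, fitting on the other k - 1 folds."""
    rmses = []
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit(train)
        sq_errors = [(y[i] - predict(model, i)) ** 2 for i in held_out]
        rmses.append((sum(sq_errors) / len(sq_errors)) ** 0.5)
    return rmses

# Toy response and a mean-only model, purely to exercise the loop.
y = [float(v % 7) for v in range(50)]
fold_rmse = cross_validate(
    y,
    fit=lambda train: statistics.fmean([y[i] for i in train]),
    predict=lambda model, i: model,
)
# Mean and SD across folds, the same form as the "mean +/- SD" entries reported.
summary = (statistics.fmean(fold_rmse), statistics.stdev(fold_rmse))
```

Reporting the spread across folds alongside the mean indicates how much the estimate depends on which observations were held out.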

Model Performance Metrics: In-Sample and 10-Fold Cross-Validation

| Metric | Value | Interpretation |
| --- | --- | --- |
| R-squared (in-sample) | 0.1248 | 12.5% of variance in log(TAT+1) explained |
| Adjusted R-squared | 0.1224 | Adjusted for number of predictors (log scale) |
| RMSE (in-sample, hours) | 15.84 | Root-mean-square prediction error on original scale (training) |
| MAE (in-sample, hours) | 10.3 | Mean absolute error on original scale (training) |
| R-squared (10-fold CV) | 0.1199 +/- 0.0253 | Out-of-sample estimate (mean +/- SD across folds) |
| RMSE (10-fold CV, hours) | 15.86 +/- 0.9 | Out-of-sample prediction error |
| MAE (10-fold CV, hours) | 10.33 +/- 0.48 | Out-of-sample absolute error |

Note: Original-scale predictions use Duan's smearing estimator (factor = 1.9951) to correct for back-transformation bias (Jensen's inequality).
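As a minimal sketch of the back-transformation, assuming the smearing factor is computed as the mean of the exponentiated log-scale residuals (the usual form of Duan's estimator), a fitted value on the log(TAT + 1) scale can be converted to hours as below. The residuals are made-up numbers; only the intercept (1.2521) and the factor (1.9951) come from this chapter, and the function names are hypothetical.

```python
import math

def smearing_factor(log_residuals):
    """Duan's smearing factor: the mean of exp(residual) on the log scale."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)

def predict_tat_hours(log_fitted, smear):
    """Back-transform a fitted log(TAT + 1) value to hours."""
    return math.exp(log_fitted) * smear - 1.0

# Hypothetical residuals with mean zero, for illustration only.
residuals = [-1.2, -0.4, 0.0, 0.5, 1.1]
smear = smearing_factor(residuals)  # exceeds 1 whenever residuals vary

# Intercept-only prediction, using the estimates reported in this chapter.
naive = math.exp(1.2521) - 1.0                        # no bias correction
corrected = predict_tat_hours(1.2521, smear=1.9951)   # with the reported factor
```

Because the exponential is convex, exponentiating a fitted log value without the factor underestimates the mean on the original scale; a factor near 2 roughly doubles the predicted TAT + 1.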

17.1.4 Linear Model Assumption Diagnostics

Linear Regression Assumption Diagnostics

| Test | Statistic | P-value | Interpretation |
| --- | --- | --- | --- |
| Shapiro-Wilk normality (subsample of 5,000 from 5,882) | 0.9703 | <2e-16 | Residuals non-normal (expected for large N; inference is asymptotically valid) |
| Breusch-Pagan heteroscedasticity | 153.4220 | <2e-16 | Heteroscedasticity detected; consider robust standard errors |
| Max VIF (multicollinearity) | 1.0100 | n/a | Acceptable (< 5); no multicollinearity concern |
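To illustrate what "robust standard errors" means in the presence of heteroscedasticity, the sketch below compares the classical standard error of a slope with the White (HC0) heteroscedasticity-consistent version for a one-predictor regression. It is a simplified illustration with hypothetical data and a hypothetical function name; the model in this chapter has many predictors and would need the matrix form of the same estimator.

```python
def slope_standard_errors(x, y):
    """OLS slope with its classical and HC0 (White) standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: assumes a single error variance shared by all observations.
    classical_se = (sum(e * e for e in resid) / (n - 2) / sxx) ** 0.5
    # HC0: weights each squared residual by that point's leverage on the slope.
    hc0_se = sum(((xi - xbar) * e) ** 2 for xi, e in zip(x, resid)) ** 0.5 / sxx
    return slope, classical_se, hc0_se

# Hypothetical data whose scatter grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.1, 8.4, 6.5]
slope, classical_se, hc0_se = slope_standard_errors(x, y)
```

The coefficient estimate is unchanged; only the standard error, and therefore the test statistic and p-value, differ between the two approaches.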

17.1.5 Residual Analysis

17.2 Classification Model: Fast vs Slow Response

A logistic regression predicts whether a consultation will be completed within 24 hours (fast response = 1, otherwise 0). The 24-hour threshold is a commonly adopted benchmark for intradepartmental consultation responsiveness, consistent with targets described in CAP laboratory quality literature (Volmar et al. 2015).

Logistic Regression Coefficients (Fast Response Prediction)

| term | estimate | std.error | statistic | p.value |
| --- | --- | --- | --- | --- |
| (Intercept) | 2.4048 | 0.0977 | 24.6036 | 0.0000 |
| IsWeekend | -1.3695 | 0.0954 | -14.3563 | 0.0000 |
| TimeOfDayAfternoon | -0.1197 | 0.0975 | -1.2273 | 0.2197 |
| TimeOfDayEvening | -0.1882 | 0.1239 | -1.5186 | 0.1289 |
| TimeOfDayNight | 0.6736 | 0.3124 | 2.1565 | 0.0310 |
| Month_Fac.L | 0.0698 | 0.1466 | 0.4765 | 0.6337 |
| Month_Fac.Q | 0.1750 | 0.1546 | 1.1320 | 0.2576 |
| Month_Fac.C | -0.0324 | 0.1505 | -0.2152 | 0.8296 |
| Month_Fac^4 | 0.3084 | 0.1479 | 2.0853 | 0.0370 |
| Month_Fac^5 | -0.1182 | 0.1467 | -0.8054 | 0.4206 |
| Month_Fac^6 | -0.0618 | 0.1475 | -0.4191 | 0.6751 |
| Month_Fac^7 | 0.0125 | 0.1437 | 0.0872 | 0.9305 |
| Month_Fac^8 | 0.0447 | 0.1442 | 0.3099 | 0.7566 |
| Month_Fac^9 | -0.0699 | 0.1490 | -0.4689 | 0.6392 |
| Month_Fac^10 | 0.1353 | 0.1515 | 0.8935 | 0.3716 |
| Month_Fac^11 | -0.2553 | 0.1572 | -1.6242 | 0.1043 |
| Case_Complexity | 0.0149 | 0.0347 | 0.4305 | 0.6668 |
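Logistic coefficients are on the log-odds scale, so exponentiating them gives odds ratios. The Python sketch below does this for the weekend effect and converts two linear predictors to probabilities. The "reference" profile sets every other term to zero (reference time of day, Case_Complexity of 0, month contrasts contributing nothing); that profile is an assumption chosen for illustration, not a case described in this chapter.

```python
import math

def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds of a fast response per unit increase."""
    return math.exp(beta)

def inv_logit(z: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates copied from the logistic regression coefficient table.
intercept, weekend = 2.4048, -1.3695

or_weekend = odds_ratio(weekend)              # about 0.25
p_reference = inv_logit(intercept)            # weekday, all other terms at zero
p_weekend = inv_logit(intercept + weekend)    # same profile, on a weekend
```

Under these assumptions the odds of a response within 24 hours on a weekend are about a quarter of the weekday odds, yet the predicted probability stays above 0.5 in both cases.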

17.2.1 Classification Performance

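Note: As with the linear model above, the metrics in this table are in-sample estimates. At the classification threshold used here, the model labels every consultation as fast (sensitivity 100%, specificity 0%), so its accuracy of 88.86% equals the share of fast consultations (5,227 of 5,882) and reflects no ability to separate the two classes at that threshold.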

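Classification Model Performance

| Metric | Value | Percentage |
| --- | --- | --- |
| Accuracy | 0.8886 | 88.86% |
| Sensitivity (Recall) | 1.0000 | 100% |
| Specificity | 0.0000 | 0% |
| Precision | 0.8886 | 88.86% |

Confusion Matrix

| Actual | Predicted 0 | Predicted 1 |
| --- | --- | --- |
| 0 | 0 | 655 |
| 1 | 0 | 5227 |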
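The reported metrics follow directly from the four cells of the confusion matrix. The sketch below recomputes them in Python and adds two quantities that are not in the table, the base rate and balanced accuracy, as an illustrative check; the function name is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard metrics from confusion-matrix counts (positive class = 1)."""
    total = tn + fp + fn + tp
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "base_rate": (tp + fn) / total,  # share of actual positives
    }

# Counts from the confusion matrix: rows are actual, columns are predicted.
m = classification_metrics(tn=0, fp=655, fn=0, tp=5227)
```

Here accuracy and the base rate coincide and balanced accuracy is 0.5, which is what a rule that always predicts the majority class would produce.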

17.2.2 Logistic Model Diagnostics

Logistic Regression Model Diagnostics and Cross-Validation

| Metric | Value | Interpretation |
| --- | --- | --- |
| McFadden's pseudo-R2 | 0.052 | Poor fit (< 0.1) |
| Hosmer-Lemeshow goodness-of-fit | chi2 = 7.05, p = 0.531 | Adequate calibration (fail to reject H0: good fit) |
| AIC | 3929.9 | Lower is better; penalizes model complexity |
| Accuracy (10-fold CV) | 88.9% +/- 1.3% | Out-of-sample classification accuracy (mean +/- SD) |
| AUC (10-fold CV) | 0.609 +/- 0.024 | Out-of-sample discriminative ability |
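AUC measures ranking rather than thresholded accuracy: it is the probability that a randomly chosen fast consultation receives a higher predicted score than a randomly chosen slow one. A minimal pairwise sketch in Python, with hypothetical scores and a hypothetical function name, is:

```python
def auc(scores_pos, scores_neg):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. This is quadratic in the sample
    size, which is fine for illustration but slow for large data.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities for fast (1) and slow (0) cases.
fast_scores = [0.95, 0.91, 0.88, 0.93, 0.74]
slow_scores = [0.90, 0.76, 0.92]
example_auc = auc(fast_scores, slow_scores)

# A model that gives every case the same score cannot rank at all.
constant_auc = auc([0.9, 0.9], [0.9, 0.9, 0.9])
```

An AUC of 0.5 corresponds to no ranking ability and 1.0 to perfect ranking, so a value near 0.61 indicates only modest discrimination even when accuracy is high.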

17.2.3 ROC Curve

17.3 Time Series Forecasting

17.3.1 Forecasting Consultation Volume

6-Month Consultation Volume Forecast

| Period | Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | Month |
| --- | --- | --- | --- | --- | --- | --- |
| Dec 2025 | 268.49 | 215.87 | 321.11 | 188.02 | 348.96 | 2025-12-01 |
| Jan 2026 | 275.54 | 213.11 | 337.97 | 180.05 | 371.02 | 2026-01-01 |
| Feb 2026 | 272.99 | 198.59 | 347.40 | 159.20 | 386.78 | 2026-02-01 |
| Mar 2026 | 273.91 | 190.37 | 357.46 | 146.14 | 401.68 | 2026-03-01 |
| Apr 2026 | 273.58 | 181.43 | 365.73 | 132.64 | 414.51 | 2026-04-01 |
| May 2026 | 273.70 | 173.80 | 373.60 | 120.92 | 426.48 | 2026-05-01 |

17.3.2 ARIMA Model Diagnostics

ARIMA Model Diagnostics

| Diagnostic | Value | Interpretation |
| --- | --- | --- |
| ARIMA order | ARIMA(1,1,0) | Selected by auto.arima via AICc |
| Residual autocorrelation (Ljung-Box) | Q = 8.06, p = 0.623 | No significant residual autocorrelation; model adequately captures temporal structure |
| Residual normality (Shapiro-Wilk) | W = 0.9728, p = 0.44 | Residuals approximately normal; prediction intervals valid |
| Residual mean | 8.341 | Should be near zero for unbiased forecasts |
| Residual SD | 39.64 | Forecast uncertainty scale |

17.3.3 Exponential Smoothing

Forecasting Model Comparison (In-Sample Accuracy and Residual Diagnostics)

| Model | Type | RMSE | MAE | MAPE (%) | Ljung-Box p |
| --- | --- | --- | --- | --- | --- |
| ARIMA | ARIMA(1,1,0) | 40.02 | 30.44 | 22.81 | 0.623 |
| Exponential Smoothing (ETS) | ETS(M,A,N) | 43.55 | 32.86 | 25.25 | 0.934 |
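The three accuracy measures in this comparison are simple functions of the fitted-versus-actual errors. A minimal Python sketch, using made-up monthly counts and a hypothetical function name, is:

```python
def forecast_accuracy(actual, fitted):
    """RMSE, MAE, and MAPE (in percent) for paired actual and fitted values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so it is undefined if any actual is 0.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}

# Hypothetical monthly consultation counts and fitted values.
actual = [100.0, 200.0]
fitted = [110.0, 180.0]
acc = forecast_accuracy(actual, fitted)
```

RMSE penalizes large misses more heavily than MAE, and MAPE expresses the error relative to the monthly volume, which is why the three can rank models differently.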

17.4 Feature Importance Analysis

17.4.1 Variable Importance in TAT Prediction

17.5 Model Recommendations

Predictive Modeling Recommendations

| Finding | Recommendation |
| --- | --- |
| Weekend consultations have longer TAT | Consider weekend-specific resource allocation or expectations |
| Model explains only 12.5% of variance | Consider additional features: responder workload, case type, subspecialty |

17.6 Model Summary

Summary of Predictive Models

| Model | Target Variable | Key Metric | Use Case |
| --- | --- | --- | --- |
| Linear Regression (TAT Prediction) | Log(TAT + 1) | R² = 0.125 | Estimate expected turnaround time |
| Logistic Regression (Fast Response) | Within 24 hours (Binary) | Accuracy = 88.9% | Identify consultations at risk of delay |
| ARIMA (Volume Forecast) | Monthly consultation count | RMSE = 40.02 | Forecast future consultation demand |
| Exponential Smoothing | Monthly consultation count | RMSE = 43.55 | Alternative forecasting approach |