Evaluating clinical prediction models requires assessing discrimination, calibration, and clinical utility. This vignette demonstrates a comprehensive evaluation workflow:
We’ll use the NCCTG Lung Cancer dataset and fit three logistic regression models of increasing complexity:
data(cancer, package = "survival")
cancer$dead <- cancer$status == 2
cancer$event <- ifelse(cancer$status == 2, 1, 0)
cancer_clean <- cancer %>%
mutate(sex = factor(sex, labels = c("Male", "Female"))) %>%
na.omit()
# Fit three models with increasing complexity
cancer_clean$pred_model1 <- predict(
glm(dead ~ age + sex + ph.karno, data = cancer_clean, family = binomial),
type = "response"
)
cancer_clean$pred_model2 <- predict(
glm(dead ~ age + sex + ph.karno + pat.karno, data = cancer_clean, family = binomial),
type = "response"
)
cancer_clean$pred_model3 <- predict(
glm(dead ~ age + sex + ph.karno + wt.loss, data = cancer_clean, family = binomial),
type = "response"
)
knitr::kable(head(cancer_clean[, c("dead", "pred_model1", "pred_model2", "pred_model3")]),
caption = "Model Predictions vs Actual Outcomes (First 6 Rows)"
)| dead | pred_model1 | pred_model2 | pred_model3 | |
|---|---|---|---|---|
| 2 | TRUE | 0.7868731 | 0.7661243 | 0.7858287 |
| 4 | TRUE | 0.7436147 | 0.8215218 | 0.7434448 |
| 6 | FALSE | 0.9260244 | 0.8954047 | 0.9273628 |
| 7 | TRUE | 0.7052554 | 0.7566565 | 0.7049340 |
| 8 | TRUE | 0.7704063 | 0.7279677 | 0.7724185 |
| 9 | TRUE | 0.8207969 | 0.8021612 | 0.8203195 |
classif_model_compare() provides a comprehensive
comparison using multiple metrics (AUC, Accuracy, Sensitivity,
Specificity, PPV, NPV, PRAUC, Brier Score) and generates ROC, PR,
calibration, and DCA plots:
model_comparison <- classif_model_compare(
data = cancer_clean,
target_var = "dead",
model_names = c("pred_model1", "pred_model2", "pred_model3"),
save_output = FALSE
)
knitr::kable(model_comparison$metric_table, caption = "Model Performance Comparison")| Model | AUC | PRAUC | Accuracy | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | F1 | Kappa | Brier | cutoff | Youden | HosLem | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | pred_model3 | 0.684 (0.594, 0.774) | 0.820 | 0.581 | 0.50 | 0.787 | 0.857 | 0.381 | 0.632 | 0.217 | 0.185 | 0.777 | 0.287 | 0.985 |
| 1 | pred_model1 | 0.684 (0.594, 0.774) | 0.817 | 0.557 | 0.45 | 0.830 | 0.871 | 0.371 | 0.593 | 0.203 | 0.185 | 0.786 | 0.280 | 0.967 |
| 2 | pred_model2 | 0.684 (0.590, 0.777) | 0.821 | 0.605 | 0.55 | 0.745 | 0.846 | 0.393 | 0.667 | 0.232 | 0.182 | 0.765 | 0.295 | 0.353 |
For survival outcomes, time-dependent ROC accounts for the timing of events. First, we fit a Cox proportional hazards model and obtain risk scores:
# Fit Cox model for survival prediction
cox_model <- coxph(Surv(time, event) ~ age + sex + ph.karno, data = cancer_clean)
cancer_clean$risk_score <- predict(cox_model, type = "risk")Now we can evaluate the model using time-dependent ROC at specific time points:
time_roc_result <- time_roc_plot(
data = cancer_clean,
event_var = "event",
time_var = "time",
marker_var = "risk_score",
times = c(200, 365),
time_unit = "days",
save_plot = FALSE
)
time_roc_result
#> $time_roc
#> Time-dependent-Roc curve estimated using IPCW (n=167, without competing risks).
#> Cases Survivors Censored AUC (%)
#> t=200 48 111 8 67.32
#> t=365 87 49 31 60.85
#>
#> Method used for estimating IPCW:marginal
#>
#> Total computation time : 0 secs.
#>
#> $plotThe C-index (concordance index) measures discrimination for survival models. Interpretation: 0.5 = no discrimination, 0.7-0.8 = acceptable, >0.8 = excellent.
The variable importance list could be obtained from the different models like random forest, gradient boosting machine, etc. Here, we assume we have the variable importance scores from some random model.
importance_scores <- c(
Age = 0.25, Sex = 0.15, Karnofsky = 0.30,
WeightLoss = 0.12, Calories = 0.08, PatKarnofsky = 0.10
)
importance_plot(x = importance_scores, x_lab = "Variable Importance")Show only top variables with custom styling:
importance_plot(
x = importance_scores,
x_lab = "Relative Importance",
top_n = 4,
color = "steelblue"
)classif_model_compare(): Comprehensive
model comparison with ROC, PR, calibration, and DCA plotstime_roc_plot(): Time-dependent ROC
analysis for survival outcomescalc_cindex(): Calculate concordance
index for survival modelsimportance_plot(): Visualize variable
importance