Survcompare_application

Package background

The package checks whether there are considerable non-linear and interaction terms in the data, and quantifies their contributions to the models’ performance. Using repeated nested cross-validation (that is, a system of random data splits into the training and testing sets to evaluate prediction accuracy), the package:

Predictive performance is evaluated by averaging different measures across all train-test splits. The following measures are computed:

Performance metrics’ description and definitions can be found, for example, in Steyerberg & Vergouwe (2014).

References:

Therneau T (2023). A Package for Survival Analysis in R. R package version 3.5-7, https://CRAN.R-project.org/package=survival.

Simon N, Friedman J, Tibshirani R, Hastie T (2011). “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software, 39(5), 1–13. doi:10.18637/jss.v039.i05.

Ishwaran H, Kogalur U (2023). Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 3.2.2, https://cran.r-project.org/package=randomForestSRC

Steyerberg EW, Vergouwe Y. (2014). Towards better clinical prediction models: seven steps for development and an ABCD for validation. European heart journal, 35(29), 1925-1931 https://doi.org/10.1093/eurheartj/ehu207

What can be inferred from the survcompare results?

There are two possible outcomes: 1) “Survival Random Forest ensemble has NOT outperformed CoxPH”, or 2) “Survival Random Forest ensemble has outperformed CoxPH by … in C-index”.

  1. If there is no outperformance, the test can justify the employment of CoxPH model and indicate a negligible advantage of using a more flexible model such as Survival Random Forest.

  2. In the case of the ensemble’s outperformance of the Cox model, a researcher can:

    • Employ a more complex or flexible model.
    • Look for the interaction and non-linear terms that could be added to the CoxPH and re-run the test.
    • Consider using the CoxPH model despite its underperformance if the difference is not large enough to sacrifice model’s interpretability, or negligible in the context of a performed task.

Before interpreting the results, ensure that a sufficient number of repetitions (repeat_cv) has been used. Parameter repeat_cv should be at least 5, but for a heterogeneous data 20-50 may be needed. Otherwise, the results may unreliable and subject to chance.

Why the CoxPH-SRF ensemble and not just SRF?

The ensemble of CoxPH and SRF takes CoxPH’s predictions and adds them to the list of predictors to train SRF. This way, we make sure that linearity is captured by the ensembled SRF at least as good as in the CoxPH, and hence its outperformance can be fully attributed to the qualities of SRF that CoxPH does not have, that is, handling data non-linearity.

Package installation

# install.packages("devtools")
# devtools::install_github("dianashams/survcompare")
library(survcompare)
#> Loading required package: survival

Examples

Example 1. Linear data

The first example takes simulated data that does not contain any complex terms, and CoxPH is expected to perform as good as Survival Random Forest.


mydata <- simulate_linear()
mypredictors <- names(mydata)[1:4]

compare_models <- survcompare(mydata, mypredictors)
#> [1] "Cross-validating Cox PH model using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |======================================================================| 100%
#> Time difference of 1.191734 secs
#> [1] "Cross-validating CoxPH and SRF Ensemble using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |======================================================================| 100%
#> Time difference of 7.5005 secs
#> 
#> Internally validated test performance of CoxPH     and Survival Random Forest ensemble:
#>                   T C_score  AUCROC     BS BS_scaled Calib_slope Calib_alpha
#> CoxPH        8.5678  0.7240  0.7638 0.1475    0.1497      1.1592      0.1614
#> SRF_Ensemble 8.5678  0.6972  0.7290 0.1488    0.1403      1.1451      0.0798
#> Diff         0.0000 -0.0269 -0.0348 0.0013   -0.0094     -0.0141     -0.0816
#> pvalue          NaN  0.8910  0.9292 0.5304    0.6168      0.5289      0.9675
#>               sec
#> CoxPH        1.19
#> SRF_Ensemble 7.50
#> Diff         6.31
#> pvalue        NaN
#> 
#> Survival Random Forest ensemble has NOT outperformed CoxPH    with mean c-index difference of-0.0269.
#> The difference is not statistically significant with the p-value = 0.891. 
#> The data may NOT contain considerable non-linear or cross-term dependencies, 
#> that could be captured by the Survival Random Forest.
#> C-score: 
#>   CoxPH      0.724(95CI=0.685-0.7697;SD=0.0369)
#>   SRF_Ensemble 0.6972(95CI=0.6527-0.7312;SD=0.0292)
#> AUCROC:
#>   CoxPH      0.7638(95CI=0.7137-0.805;SD=0.038)
#>   SRF_Ensemble 0.729(95CI=0.6728-0.7642;SD=0.037)
summary(compare_models)
#> 
#> Internally validated test performance of CoxPH     and Survival Random Forest ensemble:
#>                   T C_score  AUCROC     BS BS_scaled Calib_slope Calib_alpha
#> CoxPH        8.5678  0.7240  0.7638 0.1475    0.1497      1.1592      0.1614
#> SRF_Ensemble 8.5678  0.6972  0.7290 0.1488    0.1403      1.1451      0.0798
#> Diff         0.0000 -0.0269 -0.0348 0.0013   -0.0094     -0.0141     -0.0816
#> pvalue          NaN  0.8910  0.9292 0.5304    0.6168      0.5289      0.9675
#>               sec
#> CoxPH        1.19
#> SRF_Ensemble 7.50
#> Diff         6.31
#> pvalue        NaN
#> 
#> Survival Random Forest ensemble has NOT outperformed CoxPH    with mean c-index difference of-0.0269.
#> The difference is not statistically significant with the p-value = 0.891. 
#> The data may NOT contain considerable non-linear or cross-term dependencies, 
#> that could be captured by the Survival Random Forest.
#> C-score: 
#>   CoxPH      0.724(95CI=0.685-0.7697;SD=0.0369)
#>   SRF_Ensemble 0.6972(95CI=0.6527-0.7312;SD=0.0292)
#> AUCROC:
#>   CoxPH      0.7638(95CI=0.7137-0.805;SD=0.038)
#>   SRF_Ensemble 0.729(95CI=0.6728-0.7642;SD=0.037)

# Other objects in the output object
names(compare_models)
#> [1] "results_mean"       "results_mean_train" "return_models"     
#> [4] "test"               "train"              "difftest"          
#> [7] "main_stats"         "randomseed"         "useCoxLasso"

Example 2. Non-linear data with interaction terms

The second example takes simulated data that contains non-linear and cross-terms, and an outperformance of the tree model is expected. We will increase the default number of cross-validation repetitions to get a robust estimate, and choose CoxLasso as our baseline linear model.


mydata2 <- simulate_crossterms()
mypredictors2 <- names(mydata)[1:4]

compare_models2 <- survcompare(mydata2,mypredictors2)
#> [1] "Cross-validating Cox PH model using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |======================================================================| 100%
#> Time difference of 0.2443171 secs
#> [1] "Cross-validating CoxPH and SRF Ensemble using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |======================================================================| 100%
#> Time difference of 8.165306 secs
#> 
#> Internally validated test performance of CoxPH     and Survival Random Forest ensemble:
#>                   T C_score AUCROC      BS BS_scaled Calib_slope Calib_alpha
#> CoxPH        8.0774  0.6103 0.6303  0.1667    0.0105      0.8752      0.2008
#> SRF_Ensemble 8.0774  0.6687 0.6717  0.1576    0.0640      0.7725      0.2045
#> Diff         0.0000  0.0584 0.0415 -0.0091    0.0535     -0.1026      0.0037
#> pvalue          NaN  0.0072 0.0107  0.3152    0.0109      0.8000      0.4706
#>               sec
#> CoxPH        0.24
#> SRF_Ensemble 8.17
#> Diff         7.93
#> pvalue        NaN
#> 
#> Survival Random Forest ensemble has outperformed CoxPH    by 0.0584 in C-index.
#> The difference is statistically significant with the p-value 0.0072**.
#> The supplied data may contain non-linear or cross-term dependencies, 
#> better captured by the Survival Random Forest.
#> C-score: 
#>   CoxPH      0.6103(95CI=0.5584-0.6862;SD=0.0495)
#>   SRF_Ensemble 0.6687(95CI=0.6094-0.7177;SD=0.0436)
#> AUCROC:
#>   CoxPH      0.6303(95CI=0.5669-0.7032;SD=0.054)
#>   SRF_Ensemble 0.6717(95CI=0.6006-0.7422;SD=0.0522)
summary(compare_models2)
#> 
#> Internally validated test performance of CoxPH     and Survival Random Forest ensemble:
#>                   T C_score AUCROC      BS BS_scaled Calib_slope Calib_alpha
#> CoxPH        8.0774  0.6103 0.6303  0.1667    0.0105      0.8752      0.2008
#> SRF_Ensemble 8.0774  0.6687 0.6717  0.1576    0.0640      0.7725      0.2045
#> Diff         0.0000  0.0584 0.0415 -0.0091    0.0535     -0.1026      0.0037
#> pvalue          NaN  0.0072 0.0107  0.3152    0.0109      0.8000      0.4706
#>               sec
#> CoxPH        0.24
#> SRF_Ensemble 8.17
#> Diff         7.93
#> pvalue        NaN
#> 
#> Survival Random Forest ensemble has outperformed CoxPH    by 0.0584 in C-index.
#> The difference is statistically significant with the p-value 0.0072**.
#> The supplied data may contain non-linear or cross-term dependencies, 
#> better captured by the Survival Random Forest.
#> C-score: 
#>   CoxPH      0.6103(95CI=0.5584-0.6862;SD=0.0495)
#>   SRF_Ensemble 0.6687(95CI=0.6094-0.7177;SD=0.0436)
#> AUCROC:
#>   CoxPH      0.6303(95CI=0.5669-0.7032;SD=0.054)
#>   SRF_Ensemble 0.6717(95CI=0.6006-0.7422;SD=0.0522)

Detailed results can be extracted from the output object. For example, test performance for each data split (across all cross-validations and repetitions).

# Test performance of the ensemble:
print(round(compare_models2$test$SRF_ensemble,4))
#>        T AUCROC     BS BS_scaled C_score Calib_slope Calib_alpha test
#> 1 8.0774 0.5931 0.1860    0.0007  0.6053      0.3219      0.3983    1
#> 2 8.0774 0.7473 0.1575    0.1583  0.7188      1.1482      0.4151    1
#> 3 8.0774 0.7065 0.1213    0.0818  0.7100      1.2563     -0.2601    1
#> 4 8.0774 0.6526 0.1390    0.0618  0.6831      0.6430      0.1879    1
#> 5 8.0774 0.6592 0.1847    0.0346  0.6386      0.6200      0.5382    1
#> 6 8.0774 0.6716 0.1570    0.0467  0.6562      0.6458     -0.0524    1

# Mean test performance of the Cox-PH and SRF ensemble:
print(round(compare_models2$results_mean,4))
#>                   T C_score AUCROC      BS BS_scaled Calib_slope Calib_alpha
#> CoxPH        8.0774  0.6103 0.6303  0.1667    0.0105      0.8752      0.2008
#> SRF_Ensemble 8.0774  0.6687 0.6717  0.1576    0.0640      0.7725      0.2045
#> Diff         0.0000  0.0584 0.0415 -0.0091    0.0535     -0.1026      0.0037
#> pvalue          NaN  0.0072 0.0107  0.3152    0.0109      0.8000      0.4706
#>               sec
#> CoxPH        0.24
#> SRF_Ensemble 8.17
#> Diff         7.93
#> pvalue        NaN

# Main stats 
 round(compare_models2$main_stats,4)
#>                        mean     sd 95CILow 95CIHigh
#> C_score_CoxPH        0.6103 0.0495  0.5584   0.6862
#> C_score_SRF_Ensemble 0.6687 0.0436  0.6094   0.7177
#> AUCROC_CoxPH         0.6303 0.0540  0.5669   0.7032
#> AUCROC_SRF_Ensemble  0.6717 0.0522  0.6006   0.7422

Example 3. Applying survcompare to GBSG data

Now, lets apply the package to a real life data. We will use GBSG data from the survival package (https://rdrr.io/cran/survival/man/gbsg.html).

library(survival)

#prepare the data
mygbsg <- gbsg
mygbsg$time <- gbsg$rfstime / 365
mygbsg$event <- gbsg$status
myfactors <-
  c("age", "meno", "size", "grade", "nodes", "pgr", "er", "hormon")
mygbsg <- mygbsg[c("time", "event", myfactors)]
sum(is.na(mygbsg[myfactors])) #check if any missing data

# run survcompare 
survcompare(mygbsg, myfactors, randomseed = 42, repeat_cv = 5)

# [1] "Cross-validating Cox PH model using 5 repeat(s), 3 outer, 3 inner loops)"
#   |===============================================================| 100%
# Time difference of 1.142473 secs
# [1] "Cross-validating CoxPH and SRF Ensemble using 5 repeat(s), 3 outer, 3 inner loops)"
#   |===============================================================| 100%
# Time difference of 49.34405 secs
# 
# Internally validated test performance of CoxPH     and Survival Random Forest ensemble:
#                   T C_score AUCROC      BS BS_scaled Calib_slope
# CoxPH        4.1797  0.6751 0.7203  0.2182    0.1489      1.2529
# SRF_Ensemble 4.1797  0.6938 0.7279  0.2162    0.1567      1.3961
# Diff         0.0000  0.0187 0.0076 -0.0020    0.0078      0.1432
# pvalue          NaN  0.0000 0.0744  0.4534    0.0919      0.0229
#              Calib_alpha   sec
# CoxPH             0.4193  1.14
# SRF_Ensemble      0.4186 49.34
# Diff             -0.0007 48.20
# pvalue            0.5410   NaN

We got the following results (depending on a random seed, it may slightly differ):

Survival Random Forest ensemble has outperformed CoxPH    by 0.0187 in C-index. The difference is statistically significant with the p-value 1.347e-05***. The supplied data may contain non-linear or cross-term dependencies, better captured by the Survival Random Forest.

C-score: 
  
  CoxPH      0.6751(95CI=0.6417-0.7137;SD=0.0231)
  
  SRF_Ensemble 0.6938(95CI=0.6596-0.7346;SD=0.0212)

AUCROC:
  
  CoxPH      0.7203(95CI=0.6693-0.7708;SD=0.0356)
  
  SRF_Ensemble 0.7279(95CI=0.6797-0.7781;SD=0.0311)
 
 See other items as x$item. Items available: results_mean, results_mean_train, return_models, test, train, difftest, main_stats, randomseed, useCoxLasso

This example illustrates a situation when outperformance of the non-linear model is statistically significant, but not large in absolute terms. The ultimate model choice will depend on how critical such an improvement is for the model stakeholders.