Nested Cross-Validation for Survival Models with Optional Custom Fold Construction

Performs nested cross-validation to evaluate and tune multiple survival models using the tidymodels ecosystem. Supports both standard event-stratified cross-validation and Leave-One-Domain-Out (LODO) setups, enabling cohort-balanced model evaluation. Hyperparameter grids are automatically constructed for each model type.

Usage

compute_k_fold_CV_survival(
  df_features,
  df_outcome,
  outcome_col,
  event_col,
  k_folds,
  n_rep,
  ncores,
  LODO = FALSE,
  batch_id = NULL,
  file_name = NULL,
  fold_construction_fun = NULL,
  fold_construction_args_fixed = NULL,
  fold_construction_args_tunable = NULL
)

Arguments

df_features: A data frame of predictor variables (features).
df_outcome: A data frame containing the survival outcomes, typically a survival time column and an event indicator column.
outcome_col: Character string giving the name of the survival time column.
event_col: Character string giving the name of the event indicator column (0 = censored, 1 = event).
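k_folds: Integer. Number of folds for K-fold cross-validation (e.g., 5). The signature sets no default, so a value must be supplied.
n_rep: Integer. Number of times the K-fold procedure is repeated (e.g., 1). The signature sets no default, so a value must be supplied.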
ncores: Integer. Number of CPU cores to use for parallelization.
LODO: Logical; if TRUE, performs Leave-One-Domain-Out cross-validation using batch_id to stratify samples by cohort.
batch_id: Optional character string naming the column representing cohort or batch identifiers. Required if LODO = TRUE.
file_name: Optional string specifying the suffix for the generated C-index summary PDF saved in the "Results/" directory.
fold_construction_fun: Optional custom function for constructing data folds. Used to interface with external preprocessing workflows (e.g., CellTFusion).
fold_construction_args_fixed: Optional list of fixed arguments passed to fold_construction_fun().
fold_construction_args_tunable: Optional list of tunable arguments passed to fold_construction_fun() during hyperparameter tuning.
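
As a rough illustration of how these arguments fit together, the sketch below builds small simulated inputs and passes them using the argument names from the usage above. The simulated data, the feature names, the column names "time" and "status", and the chosen values are assumptions made for this example only. The call is wrapped in a guard so the snippet also runs in a session where the function has not been loaded.

```r
set.seed(42)
n <- 120

# Simulated predictors (hypothetical feature names)
df_features <- data.frame(
  gene_a = rnorm(n),
  gene_b = rnorm(n),
  gene_c = rnorm(n)
)

# Simulated outcomes: survival time and event indicator (0 = censored, 1 = event)
df_outcome <- data.frame(
  time   = rexp(n, rate = 0.1 * exp(0.5 * df_features$gene_a)),
  status = rbinom(n, size = 1, prob = 0.7)
)

# Guarded call: only evaluated if the function is available in the session
if (exists("compute_k_fold_CV_survival")) {
  res <- compute_k_fold_CV_survival(
    df_features = df_features,
    df_outcome  = df_outcome,
    outcome_col = "time",     # name of the survival time column
    event_col   = "status",   # name of the event indicator column
    k_folds     = 5,
    n_rep       = 1,
    ncores      = 2,
    file_name   = "example"   # suffix of the C-index summary PDF
  )
  res$C_index_median
}
```

The example uses the standard event-stratified mode, so LODO and batch_id keep their defaults.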

Value

A named list with the following elements:

Model: The best-performing survival model retrained on the full dataset.
ML_Models: All evaluated survival models with aggregated C-index results.
C_index_median: Median C-index of the top-performing model.
Custom_output: Optional list of custom outputs from fold construction.

Details

Depending on the inputs, the function can:

Build folds internally or accept custom folds from a user-defined function.
Train survival models with or without hyperparameter tuning.
Compute and aggregate the Concordance Index (C-index) across folds.
Identify and retrain the top-performing model using optimal parameters.

Internally, the function:

Merges predictors and outcomes into a single dataset.
Creates stratified folds using rsample, stratifying either by event status or by cohort × event combinations (if LODO = TRUE).
Evaluates a predefined set of survival models: Cox PH, penalized Cox (glmnet), AFT (flexsurv), decision trees, bagged trees, and random forests.
Summarizes the C-index across resamples by its median and median absolute deviation (MAD).
Retrains the best-performing model with its optimal hyperparameters.
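
The outer resampling loop described in these steps can be sketched as follows. This is a minimal sketch and not the function's actual implementation: it assumes the survival and rsample packages are installed, fits a single Cox PH model as a stand-in for the full model set, and omits the inner tuning loop. The simulated data, the column names (x1, x2, cohort, time, status, cv_strata) and the LODO switch are all illustrative.

```r
library(survival)
library(rsample)

set.seed(1)
n <- 300

# Simulated inputs (hypothetical column names)
df_features <- data.frame(
  x1     = rnorm(n),
  x2     = rnorm(n),
  cohort = sample(c("A", "B", "C"), n, replace = TRUE)
)
df_outcome <- data.frame(
  time   = rexp(n, rate = 0.1 * exp(0.8 * df_features$x1)),
  status = rbinom(n, size = 1, prob = 0.6)
)

# Step 1: merge predictors and outcomes into one dataset
dat <- cbind(df_features, df_outcome)

# Step 2: stratification variable, by event status or by cohort x event
LODO <- FALSE
dat$cv_strata <- if (LODO) {
  interaction(dat$cohort, dat$status, drop = TRUE)
} else {
  factor(dat$status)
}
folds <- vfold_cv(dat, v = 5, repeats = 2, strata = cv_strata)

# Step 3: fit on the analysis set, score the assessment set with the C-index
fold_c_index <- vapply(folds$splits, function(split) {
  train <- analysis(split)
  test  <- assessment(split)
  fit   <- coxph(Surv(time, status) ~ x1 + x2, data = train)
  test$risk <- predict(fit, newdata = test, type = "lp")
  # reverse = TRUE: a higher risk score should go with shorter survival
  concordance(Surv(time, status) ~ risk, data = test, reverse = TRUE)$concordance
}, numeric(1))

# Step 4: summarize across resamples (mad() applies R's default scaling
# constant of 1.4826)
c(median = median(fold_c_index), mad = mad(fold_c_index))
```

With several candidate models, the same summary would be computed per model and the one with the highest median C-index retrained on the full dataset.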

When a custom fold construction function is supplied via fold_construction_fun, the function processes the folds in parallel, saves intermediate results under "Results/", and returns the fold-construction outputs in Custom_output.
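
The exact contract expected of fold_construction_fun is not specified here, so the following is only one plausible way that fixed and tunable argument lists could be combined. Everything in it is hypothetical: the build_fold() function, its train/test/center/n_top arguments, and the shape of its return value. It uses base R only, with lapply() standing in for a parallel backend.

```r
# Hypothetical fold constructor: centers features using training-fold means
# and keeps the `n_top` most variable training features.
build_fold <- function(train, test, center = TRUE, n_top = 2) {
  vars <- sort(vapply(train, var, numeric(1)), decreasing = TRUE)
  keep <- names(vars)[seq_len(min(n_top, length(vars)))]
  mu <- if (center) {
    colMeans(train[, keep, drop = FALSE])
  } else {
    rep(0, length(keep))
  }
  list(
    train = sweep(as.matrix(train[, keep, drop = FALSE]), 2, mu),
    test  = sweep(as.matrix(test[, keep, drop = FALSE]), 2, mu),
    kept  = keep
  )
}

# Fixed arguments are reused as-is; tunable ones are expanded into a grid
fixed_args   <- list(center = TRUE)
tunable_args <- list(n_top = c(1, 2, 3))
grid <- expand.grid(tunable_args, stringsAsFactors = FALSE)

# Toy data split into one training and one test fold
set.seed(7)
x <- data.frame(a = rnorm(20, sd = 1), b = rnorm(20, sd = 2), c = rnorm(20, sd = 3))
train <- x[1:15, ]
test  <- x[16:20, ]

# One fold construction per row of the tunable grid
results <- lapply(seq_len(nrow(grid)), function(i) {
  args <- c(
    list(train = train, test = test),
    fixed_args,
    as.list(grid[i, , drop = FALSE])
  )
  do.call(build_fold, args)
})

# Number of features kept for each tunable setting
vapply(results, function(r) length(r$kept), integer(1))
```

Keeping fixed and tunable arguments in separate lists means the tuning grid only grows with the parameters that are actually varied.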