Train and evaluate machine learning models on previously constructed k folds

This function performs k-fold cross-validation on custom folds built by user-defined functions, as required for cohort-dependent algorithms (see the vignette for details). It supports hyperparameter tuning over a grid and returns a model object that mimics caret's training output, including performance metrics and predictions.

Usage

compute_custom_k_fold_CV(
  processed_folds,
  ml_method,
  tuneGrid = NULL,
  training_set_all
)

Arguments

processed_folds: A list of folds. Each fold contains processed training and test data with features.
ml_method: A character string indicating the machine learning model to use, as supported by the caret package (e.g., "rf", "svmRadial", "glmnet").
tuneGrid: Optional. A data frame specifying the grid of hyperparameters to evaluate. If NULL, a default grid of length 3 is generated using caret's getModelInfo().
training_set_all: A data frame containing the full training set (i.e., all folds combined) with features and a target column.
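
As an illustration of the tuneGrid argument, a grid can be built with base R's expand.grid(). The sketch below assumes ml_method = "glmnet", for which caret tunes alpha and lambda; the values themselves are arbitrary placeholders, not recommendations.

```r
# Hypothetical grid for ml_method = "glmnet".
# Column names must match the tuning parameters caret defines for the method.
tune_grid <- expand.grid(
  alpha  = c(0, 0.5, 1),        # mixing between ridge (0) and lasso (1)
  lambda = c(0.001, 0.01, 0.1)  # regularization strength
)

# One row per hyperparameter combination: 3 x 3 = 9 rows.
nrow(tune_grid)
```

Such a data frame would be passed as tuneGrid = tune_grid. Other methods need other columns, for example mtry for "rf" or sigma and C for "svmRadial".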

Value

A list (caret-style object) with the following components:

fit.train: The final model trained on the full training set using the best hyperparameters.
results: A data frame summarizing average cross-validated Accuracy, Kappa, and their standard deviations for each hyperparameter combination.
pred: A data frame of predictions from each fold, including class probabilities, observed and predicted labels, and hyperparameter values.
resample: A data frame summarizing Accuracy and Kappa per fold for the best-tuned model.
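
To show how a per-fold summary such as resample relates to the fold-level predictions in pred, here is a minimal sketch on a made-up prediction table. The column names obs, pred, and Resample follow the convention of caret's train() output and are an assumption; the columns actually returned by this function may differ.

```r
# Hypothetical prediction table with two folds of three observations each.
pred_df <- data.frame(
  obs      = factor(c("a", "b", "a", "b", "a", "b"), levels = c("a", "b")),
  pred     = factor(c("a", "b", "b", "b", "a", "a"), levels = c("a", "b")),
  Resample = rep(c("Fold1", "Fold2"), each = 3)
)

# Accuracy per fold: share of predictions that match the observed label.
fold_acc <- tapply(pred_df$pred == pred_df$obs, pred_df$Resample, mean)
fold_acc
```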

Details

This function performs the following:

Trains models for each fold and hyperparameter combination.
Predicts on the held-out test data of each fold.
Aggregates prediction results and evaluates Accuracy and Kappa for each fold and hyperparameter set.
Selects the best-performing hyperparameter set based on mean Accuracy across folds.
Trains the final model on the full training set (training_set_all) using the selected hyperparameters.
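
The steps above can be sketched in base R. This is not the function's implementation: it swaps in a hand-written k-nearest-neighbours classifier for a caret model, and the fold layout (a list with train and test data frames), the column name target, and the helpers knn_predict and cohen_kappa are all hypothetical names chosen for illustration.

```r
set.seed(1)

# Toy two-class data; `target` is an assumed name for the outcome column.
n <- 90
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$target <- factor(
  ifelse(dat$x1 + dat$x2 + rnorm(n, sd = 0.5) > 0, "pos", "neg"),
  levels = c("neg", "pos")
)

# Three pre-built folds, each holding `train` and `test` data frames
# (an assumed layout; the real processed_folds structure may differ).
fold_id <- rep(1:3, length.out = n)
folds <- lapply(1:3, function(i) {
  list(train = dat[fold_id != i, ], test = dat[fold_id == i, ])
})

# Stand-in learner: k-nearest neighbours with majority vote.
knn_predict <- function(train, test, k) {
  feats <- c("x1", "x2")
  train_x <- as.matrix(train[, feats])
  pred <- vapply(seq_len(nrow(test)), function(i) {
    d <- sqrt(rowSums(sweep(train_x, 2, unlist(test[i, feats]))^2))
    votes <- table(train$target[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  }, character(1))
  factor(pred, levels = levels(train$target))
}

# Cohen's kappa from the confusion table of observed vs. predicted labels.
cohen_kappa <- function(obs, pred) {
  tab <- table(obs, pred)
  po <- sum(diag(tab)) / sum(tab)
  pe <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
  (po - pe) / (1 - pe)
}

grid <- data.frame(k = c(1, 3, 5))  # hyperparameter grid

# Steps 1-3: fit and evaluate every (hyperparameter, fold) pair.
per_fold <- do.call(rbind, lapply(seq_len(nrow(grid)), function(g) {
  do.call(rbind, lapply(seq_along(folds), function(f) {
    obs  <- folds[[f]]$test$target
    pred <- knn_predict(folds[[f]]$train, folds[[f]]$test, grid$k[g])
    data.frame(k = grid$k[g], fold = f,
               Accuracy = mean(pred == obs),
               Kappa = cohen_kappa(obs, pred))
  }))
}))

# Step 4: average across folds and pick the highest mean Accuracy.
results <- aggregate(cbind(Accuracy, Kappa) ~ k, data = per_fold, FUN = mean)
results$AccuracySD <- aggregate(Accuracy ~ k, data = per_fold, FUN = sd)$Accuracy
best_k <- results$k[which.max(results$Accuracy)]

# Step 5: "final model" on the full training set with the selected value.
final_pred <- knn_predict(dat, dat, best_k)
```

Here per_fold plays the role of the fold-level metrics and results the role of the cross-validated summary. Ties in mean Accuracy are broken by which.max(), which keeps the first maximum; how the actual function breaks ties is not stated in this page.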