
Train and evaluate machine learning models on previously constructed k folds
Source: R/machine_learning.R
compute_custom_k_fold_CV.Rd
This function performs k-fold cross-validation on custom folds, i.e. folds built beforehand by custom functions, so that it can be used with cohort-dependent algorithms (see the vignette for more information). It supports hyperparameter tuning over a grid and returns a model object that mimics the output of caret's train(), including performance metrics and predictions.
Arguments
- processed_folds
A list of folds. Each fold contains processed training and test data with features.
- ml_method
A character string indicating the machine learning model to use, as supported by the caret package (e.g., "rf", "svmRadial", "glmnet").
- tuneGrid
Optional. A data frame specifying the grid of hyperparameters to evaluate. If NULL, a default grid of length 3 is generated using caret's getModelInfo().
- training_set_all
A data frame containing the full training set (i.e., all folds combined) with features and a target column.
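A hyperparameter grid such as tuneGrid has one row per combination of candidate values. The sketch below is a language-neutral illustration in Python, not the package's R code. It builds such a grid as a list of dicts, much as R's expand.grid() builds a data frame. The names mtry (for "rf") and alpha/lambda (for "glmnet") are caret's tuning parameters for those methods. The candidate values are arbitrary examples.

```python
from itertools import product

def expand_grid(**params):
    """Return every combination of the candidate values as a list of dicts,
    one dict per grid row (a Python analogue of a tuneGrid data frame)."""
    names = list(params)
    # product() varies the LAST parameter fastest; R's expand.grid() varies
    # the first one fastest, so row order differs but the set of rows is the same.
    return [dict(zip(names, combo)) for combo in product(*params.values())]

# Illustrative values only; mtry is caret's tuning parameter for "rf".
rf_grid = expand_grid(mtry=[2, 4, 6])
# alpha and lambda are caret's tuning parameters for "glmnet"
# (spelled lambda_ here because lambda is a reserved word in Python).
glmnet_grid = expand_grid(alpha=[0.0, 0.5, 1.0], lambda_=[0.01, 0.1])
print(len(rf_grid), len(glmnet_grid))  # 3 6
```

Each row of the grid is one candidate configuration. Cross-validation evaluates every row on every fold, so the cost grows with (number of rows) × (number of folds).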
Value
A list (caret-style object) with the following components:
- fit.train
The final model trained on the full training set using the best hyperparameters.
- results
A data frame summarizing the average cross-validated Accuracy and Kappa, and their standard deviations, for each hyperparameter combination.
- pred
A data frame of predictions from each fold, including class probabilities, observed and predicted labels, and hyperparameter values.
- resample
A data frame summarizing Accuracy and Kappa per fold for the best-tuned model.
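The per-fold Accuracy and Kappa (as in resample) and their fold-level mean and standard deviation (as in results) can be computed from observed and predicted labels. The sketch below is an illustrative Python version, not the package's code. It assumes Kappa is the unweighted Cohen's kappa that caret reports, (p_o - p_e) / (1 - p_e). It also assumes the standard deviation uses the n - 1 denominator of R's sd(). The output keys AccuracySD and KappaSD borrow caret's naming and may not match this function's column names exactly.

```python
from collections import Counter
from statistics import mean, stdev

def accuracy_kappa(obs, pred):
    """Accuracy and Cohen's kappa for one fold's observed vs. predicted labels."""
    n = len(obs)
    # Observed agreement p_o is simply the accuracy.
    p_o = sum(o == p for o, p in zip(obs, pred)) / n
    # Chance agreement p_e from the marginal class frequencies.
    obs_counts, pred_counts = Counter(obs), Counter(pred)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    # Kappa is undefined when p_e == 1 (e.g. one class only); NaN here, akin to NA in R.
    kappa = float("nan") if p_e == 1 else (p_o - p_e) / (1 - p_e)
    return p_o, kappa

def summarize(per_fold):
    """Mean and sample standard deviation of per-fold (accuracy, kappa) pairs;
    stdev() needs at least two folds."""
    accs = [a for a, _ in per_fold]
    kaps = [k for _, k in per_fold]
    return {"Accuracy": mean(accs), "Kappa": mean(kaps),
            "AccuracySD": stdev(accs), "KappaSD": stdev(kaps)}

fold1 = accuracy_kappa(["a", "a", "b", "b"], ["a", "b", "b", "b"])  # (0.75, 0.5)
fold2 = accuracy_kappa(["a", "b", "a", "b"], ["a", "b", "a", "b"])  # (1.0, 1.0)
print(summarize([fold1, fold2]))
```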
Details
This function performs the following steps:
- Trains a model for each combination of fold and hyperparameter set.
- Predicts on the held-out test data of each fold.
- Aggregates the predictions and computes Accuracy and Kappa for each fold and hyperparameter set.
- Selects the best-performing hyperparameter set based on mean Accuracy across folds.
- Trains the final model on the full training set (training_set_all) using the selected hyperparameters.
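The steps above can be sketched in Python as a minimal illustration under stated assumptions. It is not the package's R implementation. Each fold is a hypothetical (train, test) pair of (x, label) rows. A toy 1-D k-nearest-neighbours learner (knn_fit/knn_predict, both hypothetical) stands in for the caret model. Only mean Accuracy is tracked, and the pred and resample components are omitted for brevity. The folds below leave one hypothetical cohort out at a time, only to show that any pre-built partition can be passed in.

```python
from collections import Counter
from statistics import mean

def knn_fit(train, params):
    """Hypothetical stand-in learner: 'fitting' 1-D k-NN just stores the rows and k."""
    return {"data": list(train), "k": params["k"]}

def knn_predict(model, xs):
    """Majority vote among the k nearest training rows (ties go to the nearest label)."""
    preds = []
    for x in xs:
        nearest = sorted(model["data"], key=lambda row: abs(row[0] - x))[: model["k"]]
        preds.append(Counter(label for _, label in nearest).most_common(1)[0][0])
    return preds

def custom_k_fold_cv(folds, grid, training_set_all, fit=knn_fit, predict=knn_predict):
    """Evaluate every grid row on every pre-built fold, pick the row with the
    highest mean accuracy, then refit on the full training set."""
    results = []
    for params in grid:                       # each hyperparameter combination
        fold_acc = []
        for train, test in folds:             # each custom fold
            model = fit(train, params)
            preds = predict(model, [x for x, _ in test])
            obs = [y for _, y in test]
            fold_acc.append(sum(o == p for o, p in zip(obs, preds)) / len(obs))
        results.append({"params": params, "Accuracy": mean(fold_acc)})
    best = max(results, key=lambda r: r["Accuracy"])      # first row wins ties
    final_model = fit(training_set_all, best["params"])   # refit on all training data
    return {"fit.train": final_model, "results": results}

cohort_a = [(1, "neg"), (2, "neg"), (3, "neg"), (9, "pos")]
cohort_b = [(1.5, "neg"), (2.5, "neg"), (3.5, "neg"), (8.5, "pos")]
folds = [(cohort_a, cohort_b), (cohort_b, cohort_a)]      # leave-one-cohort-out
out = custom_k_fold_cv(folds, [{"k": 3}, {"k": 1}], cohort_a + cohort_b)
print([r["Accuracy"] for r in out["results"]])  # [0.75, 1.0]
```

Because the folds are supplied rather than generated, the loop never decides how rows are split. That choice stays with whatever code built the folds, which is what makes cohort-aware splitting possible.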