
This function performs repeated stratified k-fold cross-validation on a dataset to train 13 machine learning methods and tune their hyperparameters. Optionally, it can also perform model stacking and Boruta-based feature selection. Performance is evaluated using a user-specified metric: Accuracy, AUROC, or AUPRC.
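To make the resampling scheme concrete, the following base R sketch shows one way repeated stratified k-fold assignment can be built. It is an illustration only and not the function's internal code: `make_stratified_folds` is a hypothetical helper, and the toy class sizes are chosen so the folds divide evenly.

```r
# Minimal base-R sketch of repeated stratified k-fold splitting.
# `make_stratified_folds` is a hypothetical helper, not part of the package.
make_stratified_folds <- function(target, k) {
  fold_id <- integer(length(target))
  # Assign fold labels within each class so class proportions are preserved
  for (cls in unique(target)) {
    idx <- which(target == cls)
    idx <- idx[sample.int(length(idx))]  # shuffle positions within the class
    fold_id[idx] <- rep_len(seq_len(k), length(idx))
  }
  fold_id
}

set.seed(1)
target <- factor(rep(c("yes", "no"), times = c(20, 40)))
k_folds <- 5
n_rep <- 3

# One vector of fold labels per repetition; each repetition reshuffles the data
repeats <- lapply(seq_len(n_rep), function(r) make_stratified_folds(target, k_folds))

# Class counts per fold for the first repetition (4 "yes" and 8 "no" per fold)
fold_table <- table(fold = repeats[[1]], class = target)
print(fold_table)
```

Because fold labels are assigned separately within each class, every fold keeps the 1:2 class ratio of the full dataset, which is what "stratified" refers to here.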

Usage

compute_k_fold_CV(
  model,
  k_folds,
  n_rep,
  stacking = FALSE,
  metric = "Accuracy",
  boruta,
  boruta_iterations = NULL,
  fix_boruta = NULL,
  tentative = FALSE,
  boruta_threshold = NULL,
  file_name = NULL,
  LODO = FALSE,
  ncores = NULL,
  return = FALSE,
  fold_construction_fun = NULL,
  fold_construction_args = list()
)
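As a rough illustration of the signature above, the sketch below builds a toy input and assembles a call. The data, the argument values, and the choice to store 'target' as a factor are all assumptions made for the example; the call itself runs only if `compute_k_fold_CV` is already available in the session.

```r
# Toy input: feature columns plus a column named 'target'.
# ASSUMPTION: storing 'target' as a factor is a guess, not a documented requirement.
set.seed(42)
n <- 60
train_data <- data.frame(
  feature_a = rnorm(n),
  feature_b = rnorm(n),
  feature_c = runif(n),
  target = factor(rep(c("case", "control"), each = n / 2))
)

# Argument values are illustrative only.
cv_args <- list(
  model = train_data,   # the data frame is passed through the 'model' argument
  k_folds = 5,
  n_rep = 10,
  metric = "AUROC",
  boruta = FALSE,
  ncores = 2,
  return = TRUE
)

# Call the function only if it has been loaded into the session.
if (exists("compute_k_fold_CV", mode = "function")) {
  res <- do.call(compute_k_fold_CV, cv_args)
  str(res, max.level = 1)
}
```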

Arguments

model

A data frame containing the features and a column named 'target' that holds the response variable to predict.

k_folds

Integer. Number of folds for k-fold cross-validation. Default is 5.

n_rep

Integer. Number of repetitions of the k-fold cross-validation. Default is 100.

stacking

Logical. Whether to perform model stacking. Default is FALSE.

metric

Character. Metric used for hyperparameter tuning and model evaluation. Supported values are "Accuracy", "AUROC", and "AUPRC".

boruta

Logical. Whether to apply Boruta for feature selection before model training. Note that many ML models handle feature importance internally, so prior selection is optional unless multicollinearity is a concern. Default is FALSE.

boruta_iterations

Integer. Number of iterations to run Boruta. Since Boruta involves randomness, repeated runs improve consistency. Default is 100.

fix_boruta

Logical. Whether to fix Boruta’s internal parameters. See compute_boruta() for details.

tentative

Logical. Whether to treat features that Boruta marks as tentative as confirmed and keep them in the training dataset.

boruta_threshold

Numeric. Threshold for confirming features after multiple Boruta iterations. For example, 0.8 means features must be confirmed in at least 80% of iterations. Default is 0.8.

file_name

Character. File name used for saving output plots in the Results/ directory.

LODO

Logical. If TRUE, performs Leave-One-Dataset-Out (LODO) cross-validation by stratifying folds based on cohort membership.

ncores

Integer. Number of cores to use for parallelization. If NULL, detectCores() - 1 is used.

return

Logical. Whether to return the results and generated plots.

fold_construction_fun

Function. A custom function used to construct the cross-validation folds. It should return a list of training indices for each fold.

fold_construction_args

List. Named list of additional arguments to pass to fold_construction_fun.
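The sketch below shows what a custom fold constructor might look like. The exact arguments the package passes to `fold_construction_fun` are not documented here, so the example assumes it receives the data frame first and the entries of `fold_construction_args` by name. `leave_one_group_out`, `toy`, and `cohort` are hypothetical names used for illustration only, and this is not the package's own LODO implementation.

```r
# Hypothetical custom fold constructor: hold out one group per fold.
# ASSUMPTION: the function is called with the data frame as its first argument
# and the entries of `fold_construction_args` as named arguments.
leave_one_group_out <- function(data, group) {
  stopifnot(length(group) == nrow(data))
  groups <- unique(group)
  # For each group, the training indices are all rows outside that group
  folds <- lapply(groups, function(g) which(group != g))
  names(folds) <- paste0("Fold_", groups)
  folds
}

set.seed(7)
toy <- data.frame(x = rnorm(9), target = factor(rep(c("a", "b", "a"), 3)))
cohort <- rep(c("c1", "c2", "c3"), each = 3)

# Returns a named list with one vector of training indices per fold
train_idx <- leave_one_group_out(toy, group = cohort)
str(train_idx)
```

Under these assumptions, the constructor could be supplied as `fold_construction_fun = leave_one_group_out` with `fold_construction_args = list(group = cohort)`.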

Value

A list containing:

  • Features used during training

  • The selected machine learning model

  • All trained machine learning models

If stacking = TRUE, the list will also include: