
This function trains one or more machine learning models using repeated k-fold cross-validation, with optional model stacking and optional feature selection using Boruta. It supports stratified cross-validation, including folds stratified by cohort when cohort labels are available (see LODO and batch_id).

Usage

compute_features.training.ML(
  features_train,
  target_var,
  trait.positive,
  metric = "Accuracy",
  stack,
  k_folds = 10,
  n_rep = 5,
  LODO = FALSE,
  batch_id = NULL,
  file_name = NULL,
  ncores = NULL,
  return = FALSE,
  fold_construction_fun = NULL,
  fold_construction_args_fixed = NULL,
  fold_construction_args_tunable = NULL
)

Arguments

features_train

A data frame containing the features used for training, with samples as rows.

target_var

A vector containing the target variable to predict.

trait.positive

Value in target_var to be considered as the positive class.

metric

Character. Metric used for hyperparameter tuning and model selection. Supported values are "Accuracy", "AUROC", and "AUPRC".

stack

Logical. Whether to perform model stacking. Default is FALSE.

k_folds

Integer. Number of folds to use in cross-validation.

n_rep

Integer. Number of repetitions of the cross-validation.
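
To make the interplay of k_folds and n_rep concrete, the following is a minimal base R sketch of repeated stratified k-fold assignment. It is not the package's internal code: the helper make_stratified_folds() and the toy target_var are assumptions made for illustration only.

```r
# Hypothetical helper: assign each sample to one of k folds, balancing classes.
make_stratified_folds <- function(target_var, k_folds = 10) {
  fold_id <- integer(length(target_var))
  for (idx in split(seq_along(target_var), target_var)) {
    # Shuffle the samples of this class, then deal them out to folds in turn.
    shuffled <- idx[sample.int(length(idx))]
    fold_id[shuffled] <- rep_len(seq_len(k_folds), length(shuffled))
  }
  fold_id
}

set.seed(1)
target_var <- rep(c("case", "control"), times = c(20, 30))  # toy labels
k_folds <- 5
n_rep <- 3

# One independent fold assignment per repetition.
repeated_folds <- lapply(seq_len(n_rep), function(r) {
  make_stratified_folds(target_var, k_folds)
})

# Class counts per fold in the first repetition.
fold_table <- table(repeated_folds[[1]], target_var)
fold_table
```

With 20 cases and 30 controls split into 5 folds, every fold in each repetition holds 4 cases and 6 controls, so the class ratio of the full data is preserved in each fold.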

LODO

Logical. If TRUE, constructs folds stratified by cohorts (Leave-One-Dataset-Out CV).

batch_id

A vector indicating the cohort or batch for each sample (required only if LODO = TRUE).
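
As an illustration of what Leave-One-Dataset-Out folds look like, here is a short base R sketch that builds one fold per cohort from a toy batch_id vector. How the package constructs its folds internally is not shown in this page, so treat the lodo_folds structure as an assumption.

```r
batch_id <- c("A", "A", "B", "B", "B", "C", "C")  # toy cohort labels

# One fold per cohort: that cohort's samples are held out for testing
# and the samples of every other cohort are used for training.
lodo_folds <- lapply(split(seq_along(batch_id), batch_id), function(test_idx) {
  list(train = setdiff(seq_along(batch_id), test_idx), test = test_idx)
})

names(lodo_folds)     # one fold per cohort
lodo_folds$B$test     # indices of cohort B
lodo_folds$B$train    # indices of all other cohorts
```

Because every sample of a cohort lands in the same test fold, performance estimates reflect generalization to an unseen cohort rather than to unseen samples of a known one.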

file_name

Character. File name used to save plots in the Results/ directory.

ncores

Integer. Number of cores to use for parallelization. If NULL (the default), parallel::detectCores() - 1 is used.
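
The default can be pictured with the small sketch below. resolve_ncores() is a hypothetical helper, not a function of the package, and the guard against NA or a single-core machine is an added assumption rather than documented behavior.

```r
# Hypothetical helper mirroring the documented default for ncores.
resolve_ncores <- function(ncores = NULL) {
  if (is.null(ncores)) {
    detected <- parallel::detectCores()
    # Assumption: detectCores() may return NA, and a single-core machine
    # would give 0, so fall back to at least one core.
    ncores <- if (is.na(detected)) 1L else max(1L, detected - 1L)
  }
  as.integer(ncores)
}

resolve_ncores(4)  # an explicit value is used as given
resolve_ncores()   # otherwise one core is left free for the system
```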

return

Logical. Whether to return and save the plots generated by the function.

fold_construction_fun

Function. A custom function used to construct the cross-validation folds. It must accept a bestune argument, which is used internally to inject the optimized parameters after hyperparameter tuning. When bestune = NULL, the function explores a parameter grid across folds (parallelized with foreach). When bestune is provided, the optimized parameters are applied to rebuild the features on the full training data.

fold_construction_args_fixed

List. A list of arguments passed to fold_construction_fun that remain fixed during both cross-validation and final training.

fold_construction_args_tunable

List. A list of arguments passed to fold_construction_fun that define the hyperparameters to be tuned during cross-validation. Each element should contain candidate values for tuning.
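
The sketch below illustrates one way the fixed and tunable argument lists could relate to a custom function with a bestune argument. Everything in it is hypothetical: the argument names center, n_top and threshold, the function describe_params(), and the use of expand.grid() and do.call() are assumptions, since this page does not show how the package invokes fold_construction_fun.

```r
# Hypothetical fixed and tunable argument lists.
args_fixed <- list(center = TRUE)
args_tunable <- list(n_top = c(10, 50), threshold = c(0.1, 0.5, 0.9))

# Every combination of candidate values: 2 * 3 = 6 rows.
param_grid <- expand.grid(args_tunable, stringsAsFactors = FALSE)

# Hypothetical function honoring the bestune contract: it only reports
# which parameter values it would work with in each phase.
describe_params <- function(center, n_top, threshold, bestune = NULL) {
  if (!is.null(bestune)) {
    # Final training: the selected values replace the candidate values.
    n_top <- bestune$n_top
    threshold <- bestune$threshold
  }
  list(center = center, n_top = n_top, threshold = threshold)
}

# Tuning phase: all candidate values are visible to the function.
tuning_call <- do.call(describe_params, c(args_fixed, args_tunable))

# Final phase: one row of the grid stands in for the tuned parameters.
final_call <- do.call(
  describe_params,
  c(args_fixed, args_tunable, list(bestune = param_grid[1, ]))
)
```

In this toy setup the fixed argument is passed unchanged in both phases, while the tunable arguments carry several candidates during tuning and collapse to a single value each once bestune is supplied.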

Value

A list containing:

  • Trained model (or meta-learner if stack = TRUE)

  • Features used in model training (all features if feature.selection = FALSE)