Statistical Analysis • CellTFusion

library(CellTFusion)

This article demonstrates how to test associations between CellTFusion latent factors and clinical variables. The primary input is res$Latent_spaces, the NMF latent factor object returned by compute.latent_factors() or the CellTFusion() wrapper.

Load example data:

raw.counts <- CellTFusion::raw.counts.tuto
traitdata  <- CellTFusion::traitdata.tuto

Association between latent factors and clinical traits

scores.stat.analysis() tests the association between latent factor scores and a clinical variable. Its scores argument takes the full res$Latent_spaces object, not just the $Z matrix.

Supported method values: "fisher", "wilcox", "anova", "kruskal", "ttest".

# Run CellTFusion to obtain latent spaces
res <- CellTFusion(
  raw.counts  = raw.counts,
  normalized  = TRUE,
  cancer_type = "skcm",
  return      = TRUE
)

# Wilcoxon rank-sum test (binary response variable)
sig_factors <- scores.stat.analysis(
  scores  = res$Latent_spaces,
  coldata = traitdata,
  trait   = "Best.Confirmed.Overall.Response",
  method  = "wilcox",
  pval    = 0.05
)

# Fisher's exact test
sig_factors_fisher <- scores.stat.analysis(
  scores  = res$Latent_spaces,
  coldata = traitdata,
  trait   = "Best.Confirmed.Overall.Response",
  method  = "fisher",
  pval    = 0.05
)

# ANOVA (multi-level categorical variable)
sig_factors_anova <- scores.stat.analysis(
  scores  = res$Latent_spaces,
  coldata = traitdata,
  trait   = "Best.Confirmed.Overall.Response",
  method  = "anova",
  pval    = 0.05
)

The function returns a list of results for the significant factors and saves visualisations (box plots, violin plots) to Results/.
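To make the idea of a per-factor test concrete, here is a minimal base R sketch that runs one Wilcoxon rank-sum test per factor and keeps the factors below a p-value threshold. It uses simulated data and is not the internal implementation of scores.stat.analysis(); the objects Z, response, pvals and sig are hypothetical names chosen for illustration.

```r
# Toy data: 20 samples x 3 factors (simulated, NOT CellTFusion output)
set.seed(1)
Z <- matrix(rnorm(60), nrow = 20,
            dimnames = list(paste0("S", 1:20), paste0("Factor", 1:3)))
response <- factor(rep(c("R", "NR"), each = 10))

# Shift Factor1 in responders so there is a real difference to detect
Z[response == "R", "Factor1"] <- Z[response == "R", "Factor1"] + 3

# One Wilcoxon rank-sum test per factor (responders vs non-responders)
pvals <- apply(Z, 2, function(x) {
  wilcox.test(x[response == "R"], x[response == "NR"])$p.value
})

# Keep the factors below the significance threshold
sig <- names(pvals)[pvals < 0.05]
print(pvals)
print(sig)
```

When many factors are tested at once, it may be worth adjusting the p-values, for example with p.adjust(pvals, method = "BH"). Whether scores.stat.analysis() applies such a correction is not stated here.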

Direct access to factor scores

If you want to work with the factor scores directly (e.g., for custom plots or downstream modelling), access the $Z matrix:

# samples x factors matrix
factor_scores <- data.frame(res$Latent_spaces$Z)
head(factor_scores)
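For custom analyses, the factor scores usually need to be aligned with the clinical table first. The sketch below assumes that both tables use sample IDs as row names, which is an assumption and not something guaranteed by the package; the data are simulated and the column names are hypothetical.

```r
# Simulated stand-ins for the factor scores and the clinical table
set.seed(2)
factor_scores <- data.frame(
  Factor1 = rnorm(6),
  Factor2 = rnorm(6),
  row.names = paste0("S", 1:6)
)
traits <- data.frame(
  Response = c("R", "NR", "R", "NR", "R", "NR"),
  row.names = paste0("S", c(3, 1, 2, 6, 5, 4))  # deliberately in a different order
)

# Align the two tables on the sample IDs they share
shared <- intersect(rownames(factor_scores), rownames(traits))
merged <- cbind(factor_scores[shared, , drop = FALSE],
                traits[shared, , drop = FALSE])

# Example downstream summary: median Factor1 score per response group
print(aggregate(Factor1 ~ Response, data = merged, FUN = median))
```

Indexing both tables by the shared IDs avoids silently pairing scores with the wrong patient when the row order differs.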

TF module — clinical trait association

To visualise associations between TF module eigengenes (rather than latent factors) and all available clinical traits simultaneously, use compute.metadata.association(). It runs Pearson correlation for continuous traits and ANOVA for categorical traits, and saves a labelled heatmap and violin plots to Results/:

compute.metadata.association(
  tfs.modules = res$TF_network[[1]],
  coldata     = traitdata,
  pval        = 0.05,
  file.name   = "Tutorial",
  width       = 10
)
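The rule described above (Pearson correlation for continuous traits, ANOVA for categorical ones) can be sketched in base R as follows. This is an illustration on simulated data and not the code of compute.metadata.association(); trait_pvalue() is a hypothetical helper and the trait columns are invented.

```r
# Simulated module eigengene and clinical traits (illustrative only)
set.seed(3)
n <- 30
eigengene <- rnorm(n)
traits <- data.frame(
  age   = 50 + 5 * eigengene + rnorm(n),  # continuous, correlated by construction
  stage = factor(sample(c("I", "II", "III"), n, replace = TRUE))
)

# Hypothetical helper: choose the test from the trait type
trait_pvalue <- function(score, trait) {
  if (is.numeric(trait)) {
    # Continuous trait: Pearson correlation test
    cor.test(score, trait, method = "pearson")$p.value
  } else {
    # Categorical trait: one-way ANOVA of the score across levels
    summary(aov(score ~ factor(trait)))[[1]][["Pr(>F)"]][1]
  }
}

# One p-value per trait column
pvals <- vapply(traits, trait_pvalue, numeric(1), score = eigengene)
print(pvals)
```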

TF modules — pathway correlation

To explore the relationship between TF module scores and pathway activities:

compute.modules.relationship(
  matA      = res$TF_network[[1]],
  matB      = res$Pathways_scores,
  file_name = "Pathways_vs_TF_modules",
  width     = 15
)
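As a rough picture of what a module-versus-pathway comparison involves, the sketch below computes a Pearson correlation matrix between the columns of two score matrices that share the same samples in their rows. The correlation method used by compute.modules.relationship() is not specified in this article, so Pearson is an assumption here, and the data are simulated.

```r
# Simulated scores: 25 samples, 2 TF modules, 2 pathways (illustrative only)
set.seed(4)
n <- 25
modules <- matrix(rnorm(n * 2), nrow = n,
                  dimnames = list(NULL, c("ModuleA", "ModuleB")))
pathways <- cbind(
  PathX = modules[, "ModuleA"] + rnorm(n, sd = 0.1),  # tracks ModuleA by construction
  PathY = rnorm(n)
)

# Rows of r are modules, columns are pathways
r <- cor(modules, pathways, method = "pearson")
print(round(r, 2))
```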

Survival analysis

compute.survival.analysis() tests whether Kaplan-Meier survival curves differ significantly between groups, using the survival and survminer packages. It supports two modes:

Predefined groups (group_column) — e.g. a clinical risk group, treatment arm, or a cluster/cell-group assignment already present in coldata.
Automatic feature screening (features) — e.g. res$Latent_spaces$Z, TF module scores, or cell group scores. Each feature/column is split into High/Low groups by a quantile cutoff (thres), and only features whose log-rank test is significant (p.value) are returned.

Both modes require the data frame passed as survival.data (here traitdata) to include a time-to-event column, named via PFS, and an event indicator column, named via PFS_event (1 = event, 0 = censored).

# Mode 1: compare survival between predefined clinical groups
surv_group <- compute.survival.analysis(
  survival.data = traitdata,
  PFS           = "PFS",
  PFS_event     = "PFS_event",
  group_column  = "Best.Confirmed.Overall.Response",
  file_name     = "Tutorial"
)

surv_group$p_value      # log-rank test p-value
surv_group$median_PFS   # median survival per group
surv_group$km_plot      # survminer::ggsurvplot object

# Mode 2: screen latent factors for significant survival splits
surv_factors <- compute.survival.analysis(
  survival.data = traitdata,
  PFS           = "PFS",
  PFS_event     = "PFS_event",
  features      = data.frame(res$Latent_spaces$Z),
  p.value       = 0.05,
  thres         = 0.5,        # median split; use e.g. 0.75 for a top-quartile-vs-rest split
  file_name     = "Tutorial"
)

For every significant result, a Kaplan-Meier plot with a risk table is saved to Results/SurvPlot_<group-or-feature>_<file_name>.svg.
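The feature-screening mode can be pictured as a quantile split followed by a log-rank test. The sketch below shows that idea for a single feature with the survival package on simulated data. It is not the source of compute.survival.analysis(); the variable names are hypothetical, and the exact handling of ties at the cutoff in the package is an assumption.

```r
library(survival)

# Simulated feature and survival data (illustrative only)
set.seed(5)
n <- 60
score <- rnorm(n)
time  <- rexp(n, rate = exp(-1.5 * score) / 12)  # higher score -> longer survival
event <- rbinom(n, 1, 0.8)                       # 1 = event, 0 = censored

# Split samples into High/Low at the chosen quantile (0.5 = median)
thres <- 0.5
group <- factor(ifelse(score > quantile(score, thres), "High", "Low"),
                levels = c("Low", "High"))

# Log-rank test comparing the two survival curves
fit  <- survdiff(Surv(time, event) ~ group)
pval <- 1 - pchisq(fit$chisq, df = length(fit$n) - 1)
print(pval)

# Median survival per group from the Kaplan-Meier fit
print(survfit(Surv(time, event) ~ group))
```

Screening many features this way repeats the split and test once per column, so the number of features tested is worth keeping in mind when interpreting the retained p-values.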

This is the same function used for survival analysis in the CellTFusion companion study (CellTFusion_paper), e.g. to test whether NMF latent factor scores stratify patients by overall/progression-free survival.