
Statistical Analysis
library(CellTFusion)

This article demonstrates how to test associations between
CellTFusion latent factors and clinical variables. The primary input is
res$Latent_spaces, the NMF latent factor object returned
by compute.latent_factors() or the
CellTFusion() wrapper.
Load example data:
raw.counts <- CellTFusion::raw.counts.tuto
traitdata <- CellTFusion::traitdata.tuto

Association between latent factors and clinical traits
scores.stat.analysis() tests the association between
latent factor scores and a clinical variable. It accepts the full
Latent_spaces object (res$Latent_spaces, not just $Z) as its
scores argument.
Supported method values: "fisher",
"wilcox", "anova", "kruskal",
"ttest".
# Run CellTFusion to obtain latent spaces
res <- CellTFusion(
  raw.counts = raw.counts,
  normalized = TRUE,
  cancer_type = "skcm",
  return = TRUE
)

# Wilcoxon rank-sum test (binary response variable)
sig_factors <- scores.stat.analysis(
  scores = res$Latent_spaces,
  coldata = traitdata,
  trait = "Best.Confirmed.Overall.Response",
  method = "wilcox",
  pval = 0.05
)

# Fisher's exact test
sig_factors_fisher <- scores.stat.analysis(
  scores = res$Latent_spaces,
  coldata = traitdata,
  trait = "Best.Confirmed.Overall.Response",
  method = "fisher",
  pval = 0.05
)

# ANOVA (multi-level categorical variable)
sig_factors_anova <- scores.stat.analysis(
  scores = res$Latent_spaces,
  coldata = traitdata,
  trait = "Best.Confirmed.Overall.Response",
  method = "anova",
  pval = 0.05
)

The function returns a list of significant factor results along with
visualisations (box plots, violin plots) saved to
Results/.
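
For intuition about what a per-factor screen of this kind involves, the idea can be sketched in base R. This is a minimal illustration on simulated data, not the implementation of scores.stat.analysis(): the object names (Z, response, pvals, sig) and the simulated scores are hypothetical, and the sketch assumes one rank-sum test per factor column followed by a p-value threshold.

```r
# Simulated stand-ins (hypothetical): 40 samples, 3 factor scores, binary trait
set.seed(1)
n <- 40
response <- factor(rep(c("R", "NR"), each = n / 2))
Z <- data.frame(
  Factor1 = rnorm(n, mean = ifelse(response == "R", 2, 0)),  # shifted in "R"
  Factor2 = rnorm(n),
  Factor3 = rnorm(n)
)

# One Wilcoxon rank-sum test per factor column
pvals <- vapply(
  Z,
  function(score) wilcox.test(score ~ response)$p.value,
  numeric(1)
)

# Keep the factors whose p-value falls below the threshold
sig <- names(pvals)[pvals < 0.05]
print(pvals)
print(sig)
```

Swapping wilcox.test() for kruskal.test() or aov() gives the multi-level analogue; whether the package applies any multiple-testing adjustment is not stated here, so none is assumed.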
Direct access to factor scores
If you want to work with the factor scores directly (e.g., for custom
plots or downstream modelling), access the $Z matrix:
# samples x factors matrix
factor_scores <- data.frame(res$Latent_spaces$Z)
head(factor_scores)

TF module: clinical trait association
To visualise associations between TF module eigengenes (rather than
latent factors) and all available clinical traits simultaneously, use
compute.metadata.association(). It runs Pearson correlation
for continuous traits and ANOVA for categorical traits, and saves a
labelled heatmap and violin plots to Results/:
compute.metadata.association(
  tfs.modules = res$TF_network[[1]],
  coldata = traitdata,
  pval = 0.05,
  file.name = "Tutorial",
  width = 10
)

TF modules: pathway correlation
To explore the relationship between TF module scores and pathway activities:
compute.modules.relationship(
  matA = res$TF_network[[1]],
  matB = res$Pathways_scores,
  file_name = "Pathways_vs_TF_modules",
  width = 15
)

Survival analysis
compute.survival.analysis() tests whether Kaplan-Meier
survival curves differ significantly between groups, using the
survival and survminer packages. It supports
two modes:
- Predefined groups (group_column): e.g. a clinical risk group, treatment arm, or a cluster/cell-group assignment already present in coldata.
- Automatic feature screening (features): e.g. res$Latent_spaces$Z, TF module scores, or cell group scores. Each feature/column is split into High/Low groups by a quantile cutoff (thres), and only features whose log-rank test is significant (p.value) are returned.
Both modes require traitdata to include a time-to-event
column (PFS) and an event indicator column
(PFS_event, 1 = event, 0 = censored).
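
The quantile split and log-rank test behind the feature-screening mode can be sketched with the survival package alone. This is an illustration on simulated data, not the code of compute.survival.analysis(): the data frame surv_df, the score column, and the convention that values strictly above the cutoff count as "High" are assumptions made for the example.

```r
library(survival)

# Simulated stand-in data (hypothetical): one feature score per sample
set.seed(42)
n <- 100
score <- rnorm(n)
surv_df <- data.frame(
  PFS       = rexp(n, rate = 0.1 * exp(score)),  # higher score, shorter PFS
  PFS_event = rbinom(n, 1, 0.8),                 # 1 = event, 0 = censored
  score     = score
)

# Split samples into High/Low at a quantile cutoff (0.5 = median)
thres  <- 0.5
cutoff <- quantile(surv_df$score, probs = thres)
surv_df$group <- factor(
  ifelse(surv_df$score > cutoff, "High", "Low"),
  levels = c("Low", "High")
)

# Log-rank test comparing the two Kaplan-Meier curves
fit <- survdiff(Surv(PFS, PFS_event) ~ group, data = surv_df)
p_logrank <- 1 - pchisq(fit$chisq, df = length(fit$n) - 1)
print(p_logrank)
```

Repeating this for each column of a feature matrix and keeping the columns with p_logrank below the chosen p.value reproduces the screening idea described above.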
# Mode 1: compare survival between predefined clinical groups
surv_group <- compute.survival.analysis(
  survival.data = traitdata,
  PFS = "PFS",
  PFS_event = "PFS_event",
  group_column = "Best.Confirmed.Overall.Response",
  file_name = "Tutorial"
)
surv_group$p_value     # log-rank test p-value
surv_group$median_PFS  # median survival per group
surv_group$km_plot     # survminer::ggsurvplot object

# Mode 2: screen latent factors for significant survival splits
surv_factors <- compute.survival.analysis(
  survival.data = traitdata,
  PFS = "PFS",
  PFS_event = "PFS_event",
  features = data.frame(res$Latent_spaces$Z),
  p.value = 0.05,
  thres = 0.5,  # median split; use e.g. 0.75 for a top-quartile-vs-rest split
  file_name = "Tutorial"
)

For every significant result, a Kaplan-Meier plot with a risk table
is saved to
Results/SurvPlot_<group-or-feature>_<file_name>.svg.
This is the same function used for survival analysis in the
CellTFusion companion study (CellTFusion_paper), e.g. to
test whether NMF latent factor scores stratify patients by
overall/progression-free survival.