Single-cell metacell construction
If single-cell data are available, we recommend generating metacells to reduce computation time and prevent R session crashes. Deconvolution methods that rely on single-cell data can be computationally intensive, especially with large expression matrices. We suggest using at most 20,000 cells; if your object exceeds this size, creating metacells is strongly advised. If your computational resources can handle the full single-cell dataset, you may skip this step.
We adapted functions from the R package hdWGCNA (Morabito et al. (2023); Langfelder and Horvath (2008)) to construct metacells with a k-nearest-neighbors (KNN) approach: cells are grouped by cell type and sample, and each metacell pools the expression of a cell and its nearest neighbors.
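To make the KNN idea concrete, here is a minimal base-R sketch of metacell aggregation within a single group (one cell type in one sample). It is a simplified illustration, not the hdWGCNA or multideconv implementation. The function `knn_metacells()`, the PCA step with `n_pcs` components, the random greedy selection order, and averaging (rather than summing) expression are all assumptions chosen for clarity. Each candidate metacell is a cell plus its k - 1 nearest neighbors in PCA space. A candidate is kept only if it shares at most `max_shared` cells with every metacell already kept.

```r
# Hypothetical sketch of KNN-based metacell aggregation within one group.
# NOT the hdWGCNA/multideconv code; names and defaults are illustrative.
knn_metacells <- function(expr, k = 15, max_shared = 10, n_pcs = 10) {
  # expr: normalized expression matrix, genes as rows, cells as columns
  n_cells <- ncol(expr)
  stopifnot(n_cells >= k)

  # Embed cells in a low-dimensional PCA space (rows of t(expr) are cells)
  n_pcs <- min(n_pcs, n_cells - 1, nrow(expr))
  pcs <- prcomp(t(expr))$x[, seq_len(n_pcs), drop = FALSE]
  dists <- as.matrix(dist(pcs))

  # Candidate metacell for cell i: cell i plus its k - 1 nearest neighbours
  candidates <- lapply(seq_len(n_cells), function(i) order(dists[i, ])[seq_len(k)])

  # Greedily keep candidates (in random order) whose overlap with every
  # already-kept metacell is at most max_shared cells
  kept <- list()
  for (i in sample(n_cells)) {
    cand <- candidates[[i]]
    overlap <- vapply(kept, function(m) length(intersect(m, cand)), integer(1))
    if (all(overlap <= max_shared)) kept[[length(kept) + 1]] <- cand
  }

  # Average the expression of the member cells of each metacell
  agg <- vapply(kept, function(m) rowMeans(expr[, m, drop = FALSE]),
                numeric(nrow(expr)))
  agg <- matrix(agg, nrow = nrow(expr),
                dimnames = list(rownames(expr), paste0("metacell_", seq_along(kept))))
  list(expr = agg, members = kept)
}

# Toy usage: 200 genes x 60 cells drawn at random
set.seed(1)
toy <- matrix(rexp(200 * 60), nrow = 200,
              dimnames = list(paste0("g", 1:200), paste0("cell", 1:60)))
mc <- knn_metacells(toy, k = 15, max_shared = 10)
dim(mc$expr)
```

Raising `max_shared` toward `k` keeps more metacells that overlap more strongly. Lowering it yields fewer, more independent metacells.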
sc_object: Normalized gene expression matrix with genes as rows and cells as columns
labels_column: Vector of cell annotations
samples_column: Vector of sample IDs for each cell
exclude_cells: Vector specifying which cell types to ignore during metacell construction (default is NULL)
min_cells: Minimum number of cells required to construct metacells in a group
k: Number of nearest neighbors used for the KNN algorithm
max_shared: Maximum number of cells shared between two metacells
n_workers: Number of cores to use for parallelizing metacell construction
min_meta: Minimum number of metacells required for a cell type to be retained
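The gating parameters `exclude_cells`, `min_cells` and `min_meta` can be pictured with the following base-R sketch. The helpers `eligible_groups()` and `retain_cell_types()` are hypothetical. Defining a group as a cell type and sample pair, joined with `#` to mirror the metacell names shown later in this tutorial, is an assumption about how groups are formed.

```r
# Hypothetical helpers illustrating exclude_cells, min_cells and min_meta;
# they are not part of multideconv.

# Groups (cell type x sample) with enough cells to build metacells
eligible_groups <- function(labels, samples, exclude_cells = NULL, min_cells = 50) {
  keep <- !(labels %in% exclude_cells)   # drop excluded cell types
  sizes <- table(paste(labels[keep], samples[keep], sep = "#"))
  names(sizes)[sizes >= min_cells]
}

# Cell types that end up with at least min_meta metacells
retain_cell_types <- function(metacell_labels, min_meta = 10) {
  counts <- table(metacell_labels)
  names(counts)[counts >= min_meta]
}

labels  <- c(rep("B cells", 120), rep("CD4+ T cells", 30), rep("Platelets", 80))
samples <- c(rep("S1", 60), rep("S2", 60), rep("S1", 30), rep("S1", 80))
eligible_groups(labels, samples, exclude_cells = "Platelets", min_cells = 50)
# "B cells#S1" "B cells#S2"  (the CD4+ T cell group has only 30 cells)
retain_cell_types(c(rep("B cells", 12), rep("CD8+ T cells", 4)), min_meta = 10)
# "B cells"
```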
Because of space limitations, this tutorial does not include a complete single-cell object. Users should provide their own single-cell data through the sc_object parameter of create_metacells().
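Before calling `create_metacells()`, it can help to confirm that your own data match the layout described above and to check the 20,000-cell guideline. That layout is genes as rows, cells as columns, and one label and one sample ID per cell. The `check_sc_input()` function below is a hypothetical convenience helper, not part of multideconv.

```r
# Hypothetical input check; not part of multideconv.
check_sc_input <- function(expr, labels, samples, max_cells = 20000) {
  if (length(dim(expr)) != 2)
    stop("expr must be a genes x cells matrix")
  n_cells <- ncol(expr)
  if (length(labels) != n_cells || length(samples) != n_cells)
    stop("labels and samples need exactly one entry per cell (column of expr)")
  if (anyNA(labels) || anyNA(samples))
    stop("labels and samples must not contain NA values")
  # Metacells are advised when the object exceeds the suggested cell count
  list(n_cells = n_cells, metacells_advised = n_cells > max_cells)
}

expr <- matrix(0, nrow = 5, ncol = 8)
res <- check_sc_input(expr, labels = rep("B cells", 8), samples = rep("S1", 8))
res$metacells_advised   # FALSE: 8 cells is far below 20,000
```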
metacells = create_metacells(sc_object,
                             labels_column = cell_labels,
                             samples_column = sample_labels,
                             exclude_cells = NULL,
                             min_cells = 50,
                             k = 15,
                             max_shared = 15,
                             n_workers = 4,
                             min_meta = 10)
Second-generation deconvolution methods
Once the single-cell data are prepared, users can supplement the default deconvolution methods with second-generation approaches such as AutogeneS, BayesPrism, Bisque, CPM, MuSiC, and SCDC. Rather than relying on predefined static signatures (Dietrich et al. (2024)), these methods learn cell-type signatures directly from annotated single-cell RNA-seq data and use them to deconvolve bulk RNA-seq profiles.
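The following sketch illustrates, from first principles, what it means to learn a signature from annotated single-cell data. It does not reproduce the algorithm of any method listed above. It averages expression per cell type to form a genes x cell-types reference, then estimates mixing proportions for one bulk profile by non-negative least squares, solved here with simple projected gradient descent. The functions `build_signature()` and `nnls_pg()` are hypothetical. The real second-generation methods use considerably richer statistical models.

```r
# Hypothetical sketch: a signature learned from labelled single-cell data,
# then NNLS deconvolution of one bulk profile. Not the algorithm of
# AutogeneS, BayesPrism, Bisque, CPM, MuSiC or SCDC.

# Genes x cell types reference: mean expression of the cells of each type
build_signature <- function(expr, labels) {
  types <- sort(unique(labels))
  sig <- vapply(types, function(ct) rowMeans(expr[, labels == ct, drop = FALSE]),
                numeric(nrow(expr)))
  rownames(sig) <- rownames(expr)
  sig
}

# Minimise ||sig %*% p - bulk||^2 subject to p >= 0 by projected gradient
# descent, then rescale p so the proportions sum to 1
nnls_pg <- function(sig, bulk, n_iter = 5000) {
  StS <- crossprod(sig)            # t(sig) %*% sig
  Stb <- crossprod(sig, bulk)      # t(sig) %*% bulk
  step <- 1 / max(eigen(StS, symmetric = TRUE, only.values = TRUE)$values)
  p <- rep(0, ncol(sig))
  for (i in seq_len(n_iter)) {
    grad <- as.vector(StS %*% p - Stb)
    p <- pmax(0, p - step * grad)  # gradient step, then project onto p >= 0
  }
  names(p) <- colnames(sig)
  p / sum(p)
}

# Toy data: 3 cell types, 20 cells each, 100 genes
set.seed(42)
types <- c("A", "B", "C")
profiles <- matrix(rexp(100 * 3), nrow = 100,
                   dimnames = list(paste0("g", 1:100), types))
labels <- rep(types, each = 20)
expr <- sapply(labels, function(ct) profiles[, ct] * runif(100, 0.9, 1.1))
rownames(expr) <- rownames(profiles)

sig <- build_signature(expr, labels)
p_true <- c(A = 0.5, B = 0.3, C = 0.2)
bulk <- as.vector(sig %*% p_true)  # noise-free mixture of the signature
round(nnls_pg(sig, bulk), 3)
```

Because the toy bulk profile is an exact mixture of the learned signature, the estimate should recover proportions very close to 0.5, 0.3 and 0.2. Real bulk data are noisier and never match a reference exactly.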
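The compute.deconvolution() arguments that control the second-generation methods are:
sc_deconv: Boolean indicating whether to run second-generation methods
sc_matrix: Normalized single-cell (or metacell) gene expression matrix with genes as rows and cells as columns
sc_metadata: Data frame of single-cell metadata with one row per column of sc_matrix
methods_sc: Vector of second-generation methods to run
cell_label: Name of the sc_metadata column containing the cell-type labels
sample_label: Name of the sc_metadata column containing the sample IDs
name_sc_signature: Name to assign to the resulting signature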
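The sc_matrix and sc_metadata arguments should describe the same cells. In the example data loaded below, each column name of the expression matrix (for example, `B cells#Sample_15002_393`) reappears as a row name of the metadata. The metadata also holds the `annotated_ct` and `sample` columns passed as `cell_label` and `sample_label`. The hypothetical helper `check_sc_reference()` verifies that alignment on a toy object. Requiring the same order is our assumption, not a documented multideconv rule.

```r
# Hypothetical helper, not part of multideconv: checks that a single-cell
# reference matrix and its metadata describe the same cells.
check_sc_reference <- function(sc_matrix, sc_metadata, cell_label, sample_label) {
  missing_cols <- setdiff(c(cell_label, sample_label), colnames(sc_metadata))
  if (length(missing_cols) > 0)
    stop("missing metadata column(s): ", paste(missing_cols, collapse = ", "))
  # Assumption: metadata rows follow the matrix columns in the same order
  if (!identical(colnames(sc_matrix), rownames(sc_metadata)))
    stop("colnames(sc_matrix) and rownames(sc_metadata) differ")
  table(sc_metadata[[cell_label]])   # number of (meta)cells per cell type
}

ids <- c("B cells#Sample_1_1", "B cells#Sample_2_1", "CD4+ T cells#Sample_1_2")
toy_matrix <- matrix(runif(6), nrow = 2, dimnames = list(c("g1", "g2"), ids))
toy_meta <- data.frame(annotated_ct = c("B cells", "B cells", "CD4+ T cells"),
                       sample = c("Sample_1", "Sample_2", "Sample_1"),
                       row.names = ids)
check_sc_reference(toy_matrix, toy_meta, "annotated_ct", "sample")
```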
metacell_obj = multideconv::metacells_data
metacell_metadata = multideconv::metacells_metadata
head(metacell_obj[1:5,1:5])
#> B cells#Sample_15002_393 CD4+ T cells#Sample_14958_193
#> AL627309.1 0 0
#> AL732372.1 0 0
#> AC114498.1 0 0
#> FAM87B 0 0
#> LINC00115 0 0
#> Plasma B cells#Sample_15467_117 CD4+ T cells#Sample_12889_54
#> AL627309.1 0.00000000 0.00000000
#> AL732372.1 0.00000000 0.00000000
#> AC114498.1 0.00000000 0.00000000
#> FAM87B 0.00000000 0.00000000
#> LINC00115 0.01454298 0.05538857
#> B cells#Sample_11817_97
#> AL627309.1 0
#> AL732372.1 0
#> AC114498.1 0
#> FAM87B 0
#> LINC00115 0
head(metacell_metadata)
#> orig.ident nCount_RNA nFeature_RNA
#> B cells#Sample_15002_393 B cells#Sample 2085.014 7927
#> CD4+ T cells#Sample_14958_193 CD4+ T cells#Sample 1699.248 6390
#> Plasma B cells#Sample_15467_117 Plasma B cells#Sample 1397.391 9314
#> CD4+ T cells#Sample_12889_54 CD4+ T cells#Sample 1798.458 7164
#> B cells#Sample_11817_97 B cells#Sample 1944.183 7765
#> CD8+ T cells#Sample_11522_665 CD8+ T cells#Sample 2317.216 7862
#> cells_merged
#> B cells#Sample_15002_393 CTGTTTATCGCCATAA-1,TGCTGCTTCGGAGCAA-1,ACTTTCATCGACAGCC-1,GCTGCAGTCATTGCGA-1,CGTAGGCGTGGCTCCA-1,TCAGCTCAGACTAAGT-1,CACCAGGCACCAGCAC-1,CATCAAGTCAGCGATT-1,CGATGGCTCGTTACGA-1,CAGTCCTGTGTGAATA-1,CTTGGCTCAAGTCATC-1,CCAATCCAGCCACGTC-1,TTTGGTTCAGATGGCA-1,TGGTTAGAGAGTAATC-1,TAGTTGGGTTATGTGC-1,GACACGCGTCGAGTTT-1,AGGGATGGTACACCGC-1,ACTTACTTCGAGAACG-1,GGCAATTCATTAGGCT-1,GAAACTCAGACTGGGT-1,ATTGGTGGTAGCTTGT-1,TTGAACGAGACGCAAC-1,TGTTCCGGTACTTAGC-1,CAGCTGGCAGGTGGAT-1,GGACATTAGCCCAATT-1,TAAGCGTCAGCCTGTG-1,CTCTAATAGATCTGAA-1,CACACCTAGCCTTGAT-1,CTTACCGAGCGCTCCA-1,CACTCCAGTCGACTGC-1
#> CD4+ T cells#Sample_14958_193 AAGGAGCTCCAAATGC-1,ACACCCTTCTGCGTAA-1,CGGACGTGTAGGCTGA-1,CGACCTTCAAGTCATC-1,TTAGGCACAGGAACGT-1,GTATTCTAGTGTGAAT-1,TCAATCTCAAGAAAGG-1,GGCAATTTCTTGTTTG-1,GCGCAACGTCTAGCCG-1,CCTTACGCATCGACGC-1,TCCCGATAGACAATAC-1,ACAGCTAGTTCCACTC-1,TGAGAGGAGATCTGAA-1,GCGCCAAAGCCACCTG-1,TAAGTGCGTCTCACCT-1,ATGCGATTCGTTACGA-1,CAAGTTGAGCTACCGC-1,TACCTTACATTCTTAC-1,GCATGTATCAGTCAGT-1,GATGCTACATATGGTC-1,CCATGTCTCTGGGCCA-1,TACACGAGTCATTAGC-1,GATTCAGAGCACACAG-1,TTGCCGTAGCAATATG-1,TCGTACCAGCCAACAG-1,CAGTCCTTCGCCAGCA-1,CTACATTCAGTCGATT-1,AGGGTGATCTTGACGA-1,GTCTCGTGTCTCTTAT-1,TCGGGACAGCACACAG-1-1
#> Plasma B cells#Sample_15467_117 GCGCAACAGTACATGA-1,GTCATTTCACGCGAAA-1,ACGATACTCATAAAGG-1,CCCATACCACCAACCG-1,AGCCTAAAGTGTTAGA-1,TGTATTCCAGGGTACA-1,CATATTCTCAACGGCC-1,CACACCTGTCCAACTA-1,CCGTACTTCCGAGCCA-1,GGGACCTTCCAGAAGG-1,ACACCCTGTGTAACGG-1,AGCCTAACAGAAGCAC-1,GCTTCCATCCATGCTC-1,AGTGTCAGTCCTGCTT-1,CGAGCACAGAAACGCC-1,GAACGGATCGGGAGTA-1,TGCCAAACACTCAGGC-1,TACAGTGCAGTGGGAT-1,TATGCCCCAGTATCTG-1,GTCGGGTGTAAGAGGA-1,AAGGCAGCATCTATGG-1,TCGAGGCGTAGCTCCG-1,CTGCTGTTCGCGCCAA-1,CGGACACAGACTACAA-1,CGTGTAAAGCCCAATT-1,CTTTGCGAGGAGTCTG-1,TGCCCTAAGCCAACAG-1,CTCAGAACACCTCGGA-1,CTGCCTAGTGTCTGAT-1,ACACCAATCGACGGAA-1
#> CD4+ T cells#Sample_12889_54 GCGAGAATCGCACTCT-1,GTTACAGTCTTCGAGA-1,CAGTAACGTAACGACG-1,TCGCGTTCAATCCGAT-1,CACCTTGGTGTCGCTG-1,ATTACTCCAAACAACA-1,TACAGTGTCAGTGCAT-1,ACGCAGCGTCTGCCAG-1,TTGTAGGAGCCACGTC-1,CTTGGCTGTTCGTGAT-1,CCACTACAGACACTAA-1,CCCAGTTTCAGGCCCA-1,GAACGGAAGGCACATG-1,CGTAGCGAGAGGGCTT-1,ATTCTACTCGGTTAAC-1,CATGACAGTGGTCCGT-1,TCAGCAAGTGCTTCTC-1,AGCTTGAGTCCTCCAT-1,CATTCGCCATCGATTG-1,TCAGGATAGCACAGGT-1,CCCTCCTTCTGCGTAA-1,AAATGCCTCCTTGACC-1,TACTCGCTCCTATTCA-1,GAAATGATCGCATGGC-1,ACTGCTCTCGGTCTAA-1,AACGTTGTCATAGCAC-1,GTGTGCGGTCTAGGTT-1,GCACTCTCAAGCCGTC-1,CCGTTCACATCCTTGC-1,CCACTACGTCGAGTTT-1
#> B cells#Sample_11817_97 AAACCTGGTACCGGCT-1,ACAGCTAGTAAGTAGT-1,ACGATACAGTACGACG-1,TAAACCGAGGAGTACC-1,GGACGTCGTCTGCAAT-1,TAGCCGGAGGGCATGT-1,CGGACACCACATCCAA-1,GGTATTGTCACTTCAT-1,GGAAAGCAGTACGATA-1,AAACGGGTCGTGGGAA-1,TTTCCTCCAGGCGATA-1,GTCATTTAGGAACTGC-1,CAACCAAGTCCGTTAA-1-1,GTCGTAAGTACAGTGG-1,GTTTCTAAGCCCAATT-1,GACCAATCATTCCTCG-1,CAACCAAGTTCTGAAC-1,AGATCTGAGCAGGTCA-1,GTTCGGGCAGTTCCCT-1,GCAAACTCATGAACCT-1,CTCTGGTGTGATGATA-1,TTATGCTCAAACTGTC-1,GACGTTAGTACCGTTA-1,GGGACCTGTTTGACAC-1,ACAGCCGTCTGCGGCA-1,GGCGTGTTCTAAGCCA-1,CGGACACGTGCTTCTC-1-1,GGTGAAGAGCTCAACT-1,GCTTGAACATCCGTGG-1,GGGAGATAGGGCACTA-1
#> CD8+ T cells#Sample_11522_665 ACACCGGAGCCAGTAG-1-1,GAAATGATCCCACTTG-1,GAGTCCGAGTACGCGA-1,CTGAAACAGTCGCCGT-1,GTCGGGTGTGGTGTAG-1,ACGATGTCAGACGCTC-1,ATCATCTGTCTGCAAT-1,TGCTGCTAGACACTAA-1,TACTTGTTCGCAAGCC-1,CGAGCACAGTTAGGTA-1,GCTTGAATCTACTCAT-1,ATCATCTCAAGACGTG-1,GTGTTAGGTCAACATC-1,CGGGTCAGTAAGCACG-1,GAACCTAAGAGTGAGA-1,TCATTACGTGCAACGA-1,GACGTGCGTATGCTTG-1,TCGTACCTCATGCTCC-1,TGCGCAGTCGCCAGCA-1,CACACTCGTAGCAAAT-1,GATGAAAGTATCAGTC-1,TCATTTGAGGCTATCT-1,CTAACTTTCAACTCTT-1,CTTGGCTCAAAGAATC-1,GACTACATCTGACCTC-1,TTTCCTCTCAACGAAA-1,AAAGCAAGTCAGAAGC-1,CATCAAGTCCGAATGT-1,CGATGGCTCGCAAACT-1,CGAGAAGTCAGTGTTG-1
#> annotated_ct sample
#> B cells#Sample_15002_393 B cells Sample_15002
#> CD4+ T cells#Sample_14958_193 CD4+ T cells Sample_14958
#> Plasma B cells#Sample_15467_117 Plasma B cells Sample_15467
#> CD4+ T cells#Sample_12889_54 CD4+ T cells Sample_12889
#> B cells#Sample_11817_97 B cells Sample_11817
#> CD8+ T cells#Sample_11522_665 CD8+ T cells Sample_11522
The compute.deconvolution() function runs the default first-generation methods (quanTIseq, DeconRNASeq, EpiDISH, DWLS, and MOMF, plus CIBERSORTx if credentials are provided) together with the selected second-generation approaches. The output includes all combinations of methods and signatures. The example below restricts the first-generation methods to quanTIseq, EpiDISH, and DeconRNASeq.
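Because every method is paired with every signature, the number of result sets grows multiplicatively. The toy enumeration below uses base R's `expand.grid()` to make this concrete. The signature names and the `method_signature` naming pattern are placeholders, not the identifiers multideconv actually assigns.

```r
# Placeholder names; not the identifiers produced by multideconv.
methods    <- c("Quantiseq", "Epidish", "DeconRNASeq")
signatures <- c("signature_1", "signature_2")
combos <- expand.grid(method = methods, signature = signatures,
                      stringsAsFactors = FALSE)
combos$run <- paste(combos$method, combos$signature, sep = "_")
nrow(combos)   # 3 methods x 2 signatures = 6 runs
combos$run
```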
deconv = compute.deconvolution(raw_counts = bulk,
                               normalized = TRUE,
                               return = TRUE,
                               methods = c("Quantiseq", "Epidish", "DeconRNASeq"),
                               file_name = "Tutorial",
                               sc_deconv = TRUE,
                               sc_matrix = metacell_obj,
                               sc_metadata = metacell_metadata,
                               methods_sc = c("Autogenes", "BayesPrism",
                                              "Bisque", "CPM", "MuSic", "SCDC"),
                               cell_label = "annotated_ct",
                               sample_label = "sample",
                               name_sc_signature = "Test")
To run only the second-generation methods based on single-cell data, without any static cell-type signatures, call compute.deconvolution() with an empty vector of first-generation methods:
deconv_sc = compute.deconvolution(raw_counts = bulk,
                                  normalized = TRUE,
                                  methods = character(0),
                                  sc_deconv = TRUE,
                                  sc_matrix = metacell_obj,
                                  sc_metadata = metacell_metadata,
                                  methods_sc = c("Autogenes", "BayesPrism",
                                                 "Bisque", "CPM", "MuSic", "SCDC"),
                                  cell_label = "annotated_ct",
                                  sample_label = "sample",
                                  name_sc_signature = "Test",
                                  workers = 2,
                                  return = TRUE,
                                  file_name = "Tutorial")
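Passing `character(0)` works as an opt-out because a zero-length vector gives R nothing to iterate over. The hypothetical dispatcher below (not multideconv code) illustrates the idea. With an empty methods vector the loop body never runs, so no first-generation result is produced.

```r
# Hypothetical dispatcher, not multideconv code.
run_first_generation <- function(methods) {
  results <- list()
  for (m in methods) {
    results[[m]] <- paste("result of", m)   # placeholder for a real method call
  }
  results
}

length(run_first_generation(character(0)))   # 0: nothing is run
names(run_first_generation(c("Quantiseq", "Epidish")))
```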