Listing 2.3: Export to SummarizedExperiment format.
Code
SE_object<-DGEList2SE(DRB_count)# check that both expression matrices store the same contentstopifnot(identical(DRB_count$counts, SummarizedExperiment::assay(SE_object)))saveRDS(SE_object, paste0("./data/processed/SummarizedExperiment_before_filtering_", today_date, ".rds"))rm(barcode_counts_aggregated, counts, pheno_data, barcode_files)
After filtering for background noise, and removing samples with a total of 0 barcodes,4643 unique barcode IDs are kept for 574 unique biological replicates.
Using edgeR::cpm normalisation, with \(\log\) factor transformation:
Library sizes are normalized using effective library sizes, which are the raw library totals multiplied by normalization factors usually estimated by TMM. More information available on edgeR::calcNormFactors(). The scaling normally ensures that fair comparison of expression values across samples with different sequencing depths and composition. Seems requiring first call to edgeR::calcNormFactors() to make it work!!!
prior.count adds a small offset before \(log\)-transformation to avoid -Inf values.
Besides, we \(log_2\) normalise the values such that they better fit Gaussian distributions.
Binarise Fold-changes (only those above a pre-determined threshold) per replicate:
Listing 2.6: Compute the averaged value for control replicates, then binarise for each compound. A 1 denoting a significant positive enrichment for a given barcode_id.
Code
DRB_binarised_replicates<-differential_binarisation(DRB_CPM_normalised_replicates, group ="Batch_ID", ref_level ="Control", threshold_FC =3)saveRDS(DRB_binarised_replicates, file =paste0("./results/differential_analyses/DRB_binarised_replicate_",today_date, ".rds"))
We removed 60 control samples (all remaining samples being associated with an effective drug).
# Use an all function as aggregator, but could have been a mean or sum as well.DRB_binarised_samples<-collapseSamples(DRB_binarised_replicates, group ="Unique_compound", showReps =FALSE, sample_colname ="Replicates_ID", method =all)null_barcodes<-which(rowSums(DRB_binarised_samples$counts)==0)if(length(null_barcodes)>0L){message(paste("We removed a total of", length(null_barcodes),"over a total of", nrow(DRB_binarised_samples), "barcode IDs. None of them being fully differential in at least one compound."))DRB_binarised_samples<-DRB_binarised_samples[-null_barcodes,]}saveRDS(DRB_binarised_samples, file =paste0("./results/differential_analyses/DRB_binarised_samples_",today_date, ".rds"))
2.5.2logCPM differential analysis
logCPM DRB approach considers the following steps:
Compute averaged expression per compound and batch.
Compute the log fold-change (ratio on the original space, difference after \(\log_2\) transformation) fo each compound with respect to Controls.
Remove barcodes not expressing a significant barcode change (\(|\log_2 (FC)| < 1\)) in at least one compoundNone of the barcodes out not removing any.
Code
DRB_logCPM_normalised_samples<-differential_barcoded_logCPM(DRB_CPM_normalised_replicates, group ="Batch_ID", ref_level ="Control", threshold_LFC =log(3))saveRDS(DRB_logCPM_normalised_samples, file =paste0("./results/differential_analyses/DRB_logCPM_samples_",today_date, ".rds"))
2.5.3DeSeq2 differential analysis + shrinkage
We removed batches “exp281022_time_course”, “exp200921_dose_response”, “exp130921_in_vivo”, as we will use different experimental designs for these batches:
Code
selected_replicates_deseq2<-DRB_filtered_replicates$samples|># Remove unwanted batches, and compoundsdplyr::filter(!(Batch_ID%in%c("exp281022_time_course", "exp200921_dose_response", "exp130921_in_vivo")|Compound%in%c("Control - time zero", "P42", "P43")))|>row.names()DRB_filtered_replicates_Deseq2<-DRB_filtered_replicates[, selected_replicates_deseq2]
In Table 2.1, we set apart three categories of compounds:
For each compound available at different concentrations, we select the experiment exhibiting highest concentrations, and discard others, see Table 2.1 (b).
We identify compounds differing only by the batch effect (same concentration, same treatment duration, but different batch), reported in Table 2.1 (c) -> two-fold objective: evaluate the magnitude of the batch effect with respect to the treatment effect + merge treatment effects across several batches, borrowing ideas from meta-analysis.
Code
rm(DRB_count, DRB_filtered_replicates, selected_replicates_deseq2)# compounds in several batches, possibly with #conditions ----compounds_with_distinct_concentrations<-DRB_filtered_replicates_Deseq2$samples|>dplyr::select(-Replicates_ID)|>dplyr::select(Batch_ID, Pathway, MoA, Compound, Unique_compound, Concentrations_Formatted, Concentrations)|>dplyr::distinct()|>dplyr::group_by(Compound)|>dplyr::mutate(n_batches =dplyr::n())|>dplyr::filter(n_batches>1)|>tidyr::chop(cols =c(Batch_ID, Unique_compound))|>dplyr::arrange(Compound)# first selection, for each compound with # concentrations, select one with highest concentrationcompounds_selected<-compounds_with_distinct_concentrations|>dplyr::slice_max(order_by =Concentrations, n =1, na_rm =TRUE)compounds_with_distinct_concentrations_removed<-compounds_with_distinct_concentrations|>dplyr::anti_join(y =compounds_selected, by =join_by(Compound, Concentrations_Formatted, Concentrations, n_batches, Batch_ID, Unique_compound))samples_with_distinct_concentrations_removed<-compounds_with_distinct_concentrations_removed|>tidyr::unchop(cols =c(Unique_compound))|>dplyr::pull(Unique_compound)|>unique()replicates_with_distinct_concentrations_removed<-DRB_filtered_replicates_Deseq2$samples|>dplyr::filter(Unique_compound%in%c(samples_with_distinct_concentrations_removed))|>dplyr::pull(Replicates_ID)DRB_filtered_replicates_Deseq2<-DRB_filtered_replicates_Deseq2[, setdiff(colnames(DRB_filtered_replicates_Deseq2), replicates_with_distinct_concentrations_removed)]# set apart replicates present in # batches from those available in only one batch ----pheno_data_DESeq2_several_batches<-compounds_selected|>dplyr::filter(lengths(Batch_ID)>1L)|>dplyr::filter(Compound!="Control")|>tidyr::unchop(cols =c(Batch_ID, Unique_compound))row.names(pheno_data_DESeq2_several_batches)<-paste(pheno_data_DESeq2_several_batches$Batch_ID,pheno_data_DESeq2_several_batches$Compound, sep ="_")# Table visualisations ----flextable::flextable(compounds_with_distinct_concentrations)|>flextable::bold(part ="header")flextable::flextable(compounds_with_distinct_concentrations_removed)|>flextable::bold(part ="header")flextable::flextable(pheno_data_DESeq2_several_batches)|>flextable::bold(part ="header")
Pathway
MoA
Compound
Concentrations_Formatted
Concentrations
n_batches
Batch_ID
Unique_compound
Proteasome
Proteasome inhibitor
Bortezomib
20 nM
0.00000002
2
[[character]]
[[character]]
RAS-MAPK
RAF inhibitor
CCT196969
1 uM
0.00000100
2
[[character]]
[[character]]
Chemotherapy
Alkylating agent
Cisplatin
500 nM
0.00000050
2
[[character]]
[[character]]
Chemotherapy
Alkylating agent
Cisplatin
250 nM
0.00000025
2
[[character]]
[[character]]
Control
Control
Control
000 uM
0.00000000
11
[[character]]
[[character]]
RTK
EGFR inhibitor (1st gen.)
Gefitinib
2 uM
0.00000200
3
[[character]]
[[character]]
RTK
PROTAC conjugated EGFR inhibitor
Gefitinib-PROTAC-1
10 uM
0.00001000
2
[[character]]
[[character]]
RAS-MAPK
RAF inhibitor
LY3009120
3 uM
0.00000300
2
[[character]]
[[character]]
RTK
EGFR inhibitor (3rd gen.)
Osimertinb
100 nM
0.00000010
3
[[character]]
[[character]]
Chemotherapy
Antifolate
Pemetrexed
100 nM
0.00000010
3
[[character]]
[[character]]
Chemotherapy
Antifolate
Pemetrexed
80 nM
0.00000008
3
[[character]]
[[character]]
RTK
Inhibitor of RAF, VEGFR2, VEGFR3, PDGFRβ, FLT3 and c-Kit
Sorafenib
5 uM
0.00000500
2
[[character]]
[[character]]
RAS-MAPK
MEK inhibitor
Trametinib
30 nM
0.00000003
3
[[character]]
[[character]]
RAS-MAPK
MEK inhibitor
Trametinib
40 nM
0.00000004
3
[[character]]
[[character]]
Novel compound
unknown
X13271
5 uM
0.00000500
5
[[character]]
[[character]]
Novel compound
unknown
X13271
8 uM
0.00000800
5
[[character]]
[[character]]
Novel compound
unknown
X13271
9 uM
0.00000900
5
[[character]]
[[character]]
Novel compound
unknown
X17129
8 uM
0.00000800
2
[[character]]
[[character]]
(a) Compounds present in different batches, and or different concentrations
Pathway
MoA
Compound
Concentrations_Formatted
Concentrations
n_batches
Batch_ID
Unique_compound
Chemotherapy
Alkylating agent
Cisplatin
250 nM
0.00000025
2
[[character]]
[[character]]
Chemotherapy
Antifolate
Pemetrexed
80 nM
0.00000008
3
[[character]]
[[character]]
RAS-MAPK
MEK inhibitor
Trametinib
30 nM
0.00000003
3
[[character]]
[[character]]
Novel compound
unknown
X13271
5 uM
0.00000500
5
[[character]]
[[character]]
Novel compound
unknown
X13271
8 uM
0.00000800
5
[[character]]
[[character]]
(b) Compounds discarded: when distinct concentrations were avalaible, take one with highest concentration.
We aggregate values for samples present in different batches, using the metafor::rma function, and assuming a fixed-effects meta-model (rather than a random-effects model, congruent to the small number of batches to evaluate the heterogeneity):
Build a SummarizedExperiment object, merging log-fold changes resulting from meta-analysis with log fold-changes of compounds available in only one batch:
Code
# Prepare the phenotype dataset with one value per sampleDRB_Deseq2_compound_pheno_data<-SummarizedExperiment::colData(DRB_Deseq2_batch_by_compound_SE)|>as.data.frame()|>dplyr::select(Pathway, MoA, Compound)|>dplyr::distinct()unique_compounds_IDs<-DRB_Deseq2_compound_pheno_data|>dplyr::anti_join(DRB_aggregated_DESeq2_several_batches, by ="Compound")|>row.names()row.names(DRB_Deseq2_compound_pheno_data)<-DRB_Deseq2_compound_pheno_data$Compound# Merge barcode fold changes (meta+sample-unique)DRB_Deseq2_compound_counts_meta<-DRB_aggregated_DESeq2_several_batches|>tidyr::pivot_wider( id_cols =barcode_ID, names_from =c(Compound), values_from =ES_combined)|>tibble::column_to_rownames("barcode_ID")|>as.matrix()DRB_Deseq2_batch_by_compound_SE_unique<-DRB_Deseq2_batch_by_compound_SE[, unique_compounds_IDs]colnames(DRB_Deseq2_batch_by_compound_SE_unique)<-DRB_Deseq2_batch_by_compound_SE_unique$CompoundDRB_Deseq2_compound_counts<-cbind(DRB_Deseq2_compound_counts_meta,DRB_Deseq2_batch_by_compound_SE_unique|>SummarizedExperiment::assay())# Generate the SummarizedExperiment objectDRB_Deseq2_compound_SE<-SummarizedExperiment::SummarizedExperiment(assays =DRB_Deseq2_compound_counts[, row.names(DRB_Deseq2_compound_pheno_data)], colData =DRB_Deseq2_compound_pheno_data)saveRDS(DRB_Deseq2_compound_SE, file =paste0("./results/differential_analyses/DRB_DESeq2_samples_", today_date, ".rds"))
in other words, a barcode must be differential in all replicates of a given compound, to be consider DRB↩︎
In other words, we assume the mean-variance relationship is specific to each barcode, applying “local” regression↩︎