Listing 2.3: Export to SummarizedExperiment format.
Code
SE_object <- DGEList2SE(dge_count)
# check that both expression matrices store the same content
stopifnot(identical(dge_count$counts, SummarizedExperiment::assay(SE_object)))
saveRDS(SE_object, paste0("./data/processed/SummarizedExperiment_before_filtering_", today_date, ".rds"))
rm(artificial_controls_metadata, barcode_counts_aggregated,
   barcode_metadata, counts, samples, pheno_data, barcode_files)
2.3 Luca’s pipeline
Code
old_dge_count <- readRDS(paste0("./data/processed/DGEList_before_filtering_", today_date, ".rds"))
# remove time-zero controls, P42 and P43 experiments, and T75 controls
dge_count <- old_dge_count[, !old_dge_count$samples$Compound %in%
  c("Control - time zero", "P42", "P43", "Control - T75")]
2.3.1 Preprocessing: noise removal and normalisation
We remove barcodes whose counts, summed over the 4 controls, fall below a given threshold (here, 5).
After removing the time-zero controls, P42 and P43, as well as the T75 flasks, the number of unique biological replicates drops from 598 to 574.
Listing 2.4: Remove background noise, and aggregate all individual barcode counts.
After filtering for background noise and removing samples with a total of 0 barcodes, 4643 unique barcode IDs are kept for 574 unique biological replicates.
We then collapsed the dataset from 574 to 142 samples by averaging barcode counts over biological replicates.
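The two filtering steps described above can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions: the matrix, the `is_control` vector and the column names are hypothetical, and the actual pipeline operates on a `DGEList` object, with code that is not shown here.

```r
# Toy count matrix (hypothetical): rows are barcodes, columns are samples.
counts <- matrix(
  c(0, 1, 2, 1, 10, 0,
    3, 4, 5, 6, 30, 0,
    2, 2, 1, 1,  5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("bc", 1:3), paste0("s", 1:6))
)
# Assumption: the first four columns are the 4 control samples.
is_control <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
threshold <- 5

# Step 1: drop barcodes whose summed control counts are below the threshold.
control_sums <- rowSums(counts[, is_control, drop = FALSE])
filtered_counts <- counts[control_sums >= threshold, , drop = FALSE]

# Step 2: drop samples left with a total count of 0.
filtered_counts <- filtered_counts[, colSums(filtered_counts) > 0, drop = FALSE]

dim(filtered_counts)
```

In this toy example, `bc1` is dropped because its control counts sum to 4, and sample `s6` is dropped because it contains no counts at all.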
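As a rough illustration of collapsing replicates by averaging, the base R sketch below averages the columns that share a sample label. The matrix and the `sample_id` vector are hypothetical; the pipeline's own collapsing code is not shown in this section and may differ.

```r
# Toy matrix (hypothetical): two samples, A and B, with two replicates each.
counts <- matrix(
  c(10, 20, 4, 6,
     0,  2, 8, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("bc1", "bc2"), c("A_r1", "A_r2", "B_r1", "B_r2"))
)
# Assumption: one sample label per column, shared by its replicates.
sample_id <- c("A", "A", "B", "B")

# Group the column indices by sample, then average each group row-wise.
averaged <- sapply(
  split(seq_len(ncol(counts)), sample_id),
  function(idx) rowMeans(counts[, idx, drop = FALSE])
)
averaged
```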
2.4 Differential barcode analyses
We follow Luca's differential protocol: compute the fold change with respect to the mean of the control replicates, then binarise, assigning a 1 if \(FC>3\).
Listing 2.6: Compute the averaged value for control replicates, then binarise for each compound. A 1 denotes a significant positive enrichment for a given barcode_id.
Code
dge_binarised_replicates <- differential_barcoded_luca(dge_normalised_replicates,
  group = "Batch_ID", reference = "Control", threshold_FC = 3)
saveRDS(dge_binarised_replicates,
  file = paste0("./results/differential_analyses/dge_discretised_replicate_", today_date, ".rds"))
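The internals of `differential_barcoded_luca` are not shown here. The sketch below is one possible reading of the protocol for a single batch, assuming the fold change is the normalised count divided by the mean of the control replicates; the toy matrix and column names are hypothetical.

```r
# Toy normalised counts (hypothetical) for one batch:
# three control replicates and two treated replicates.
norm_counts <- matrix(
  c(10, 12, 8, 45, 20,
     5,  5, 5, 10, 16),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("bc1", "bc2"),
                  c("ctrl_1", "ctrl_2", "ctrl_3", "drugA_1", "drugB_1"))
)
is_control <- grepl("^ctrl", colnames(norm_counts))
threshold_FC <- 3

# Mean of the control replicates, per barcode.
control_mean <- rowMeans(norm_counts[, is_control, drop = FALSE])

# Fold change of each treated replicate relative to the control mean.
# The division recycles control_mean down each column, i.e. per barcode.
fold_change <- norm_counts[, !is_control, drop = FALSE] / control_mean

# Binarise: 1 if FC > threshold, 0 otherwise.
binarised <- (fold_change > threshold_FC) * 1L
binarised
```

Note that a control mean of 0 yields an infinite or undefined fold change; this sketch does not handle that case, which the background filtering step is presumably meant to limit.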
We removed 60 control samples (all remaining samples being associated with an effective drug).
Aggregate over replicates, using a logical AND (in other words, a barcode must be differential in all replicates of a given compound to be considered DR):
Code
# Use an all function as aggregator, but could have been a mean or sum as well.
dge_binarised_samples <- collapseSamples(dge_binarised_replicates,
  group = "Unique_compound", showReps = FALSE,
  sample_colname = "Replicates_ID", method = all)
null_barcodes <- which(rowSums(dge_binarised_samples$counts) == 0)
if (length(null_barcodes) > 0L) {
  message(paste("We removed a total of", length(null_barcodes),
    "over a total of", nrow(dge_binarised_samples), "barcode IDs. None of them being fully differential in at least one compound."))
  dge_binarised_samples <- dge_binarised_samples[-null_barcodes, ]
}
saveRDS(dge_binarised_samples,
  file = paste0("./results/differential_analyses/dge_discretised_samples_", today_date, ".rds"))
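To make the logical AND aggregation concrete, here is a base R sketch on a toy binary matrix. It does not reproduce `collapseSamples`, whose implementation is not shown; the matrix and the `compound` vector are hypothetical.

```r
# Toy binarised calls (hypothetical): compounds A and B, two replicates each.
calls <- matrix(
  c(1, 1, 0, 1,
    1, 0, 0, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("bc", 1:3), c("A_r1", "A_r2", "B_r1", "B_r2"))
)
compound <- c("A", "A", "B", "B")

# Logical AND over replicates: a barcode is kept as 1 for a compound
# only if it is 1 in every replicate of that compound.
collapsed <- sapply(
  split(seq_len(ncol(calls)), compound),
  function(idx) as.integer(apply(calls[, idx, drop = FALSE] == 1, 1, all))
)
rownames(collapsed) <- rownames(calls)

# Drop barcodes that are not fully differential in any compound.
is_null <- rowSums(collapsed) == 0
percent_removed <- 100 * mean(is_null)
collapsed <- collapsed[!is_null, , drop = FALSE]
collapsed
```

Logical indexing with `!is_null` is used instead of negative indices so that the subsetting stays correct when no barcode needs to be removed.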
We collapsed the 514 biological replicates into 128 unique compounds using a logical AND over replicates, and removed 3.75 \(\%\) of barcodes, which were not consistently differential in any compound.