1  Data management: metadata correction and cleansing

1.1 Reproducibility

I list below the R packages required to reproduce the analyses.

Code
## data wrangling
library(dplyr)
library(tidyr)
library(stringr)
library(readr)
library(readxl)
library(purrr)

## reporting
library(flextable)

## auxiliary functions
source("R/utils.R")
today_date <- "2025-07-01"
# today_date <- format(Sys.Date(), "%Y-%m-%d")
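
To pin the exact package versions behind these analyses, one option (a suggestion, not part of the original workflow) is to record the session information, or to snapshot the project library with renv:

Code
## record the exact R and package versions used for this report
utils::sessionInfo()

## alternatively, pin package versions in a lockfile (assumes renv is initialised for the project)
# renv::snapshot()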

1.2 Barcode counts cleaning

  • Original Google Drive repository.

  • Below, we convert each .TABULAR file to the recommended .tsv file extension and rename the first column to the unambiguous barcode_id:

Code
barcode_files <- list.files("./data/barcode-counts/", 
                            pattern = "exp.*\\.tabular", 
                            full.names = TRUE)

## convert each .tabular file to .tsv and rename the first column to barcode_id
barcode_matrices <- lapply(barcode_files, function(old_filename) {
  expression_matrix <- readr::read_tsv(old_filename, show_col_types = FALSE) |> 
    rename(barcode_id = 1)
  
  new_filename <- old_filename |> basename() |> tools::file_path_sans_ext()
  
  readr::write_tsv(expression_matrix, 
                   file = file.path("data", "barcode-counts", paste0(new_filename, ".tsv")))
  ## remove the original .tabular file once the .tsv copy is written
  unlink(old_filename, force = TRUE)
  
  expression_matrix
})
  • Renaming the first column of each expression file to barcode_id stores an unequivocal tag across batches.

  • List of changes applied for easier mapping between replicates’ barcode profiles and samples’ metadata (a programmatic sketch of such renames follows this list):

    • exp130921: switch from Osimer1_001g_exp130921_run101121_28 to Osimer1_001m_exp130921_run101121_28, i.e. from 001g to 001m (in-vivo context)¹, and from Azacyt1_1,5u_exp130921_run290921_40 to Azacyt1_1.50u_exp130921_run290921_40, i.e. 1,5u to 1.50u.
    • exp200921: switch from Bafilo1_1,2n_exp200921_run111021_45 to Bafilo1_1.20n_exp200921_run111021_45, i.e. 1,2n to 1.20n. In addition, the 4 control replicates of the dose-response batch exp200921_dose_response are shared with the standard experiment exp200921.csv.
    • exp151121: switch from Mitomy1_002x_exp151121_run071221_37 to Mitomy1_0.02u_exp151121_run071221_37, i.e. 002x to 0.02u.
    • exp070222: switch from Tunica1_0,5x_exp070222_run010322_05 to Tunica1_0.50u_exp070222_run010322_05, i.e. 0,5x to 0.50u.
    • exp070222t25: the older dataset is more comprehensive (no cDNA replicate missing), so we keep it; switch from cDNA1_000u_exp281022_run141222_01 to Contro1_000u_exp281022_run141222_01. What does cDNA refer to?
    • exp281022t25: one control and 3 drug compounds were sampled in T25 flasks. P42 and P43 are used to evaluate barcode drift, a technical artefact related to sequencing depth. Most of the replicates show no significant effect, owing to insufficient dose.
  • exp281022gamme refers to the time-course experiment (“gamme” is French for range/series).
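
A minimal sketch of how such manual renames could be applied programmatically; the named vector below only illustrates the pattern with two of the pairs listed above, it is not the exhaustive mapping:

Code
## hypothetical sketch: apply manual column renames to one counts file
manual_renames <- c(
  "Osimer1_001m_exp130921_run101121_28"  = "Osimer1_001g_exp130921_run101121_28",
  "Azacyt1_1.50u_exp130921_run290921_40" = "Azacyt1_1,5u_exp130921_run290921_40"
)

counts <- readr::read_tsv("data/barcode-counts/exp130921.tsv", show_col_types = FALSE) |>
  dplyr::rename(dplyr::any_of(manual_renames))

readr::write_tsv(counts, "data/barcode-counts/exp130921.tsv")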

We report in Table 1.1 a summary of the barcode counts (File location, batch_id, and dimensions).

Code
barcode_files <- list.files("./data/barcode-counts/",
                            pattern = "exp.*\\.tsv",
                            full.names = TRUE)
barcode_summaries <- purrr::map(barcode_files, function(filename) {
  barcode_counts <- readr::read_tsv(filename, show_col_types = FALSE)
  experience_summary <- tibble::tibble(Filename = basename(filename),
                                       `Num Barcode IDs` = nrow(barcode_counts),
                                       `Num Replicates` = ncol(barcode_counts) - 1)
  return(experience_summary)
}) |>
  purrr::list_rbind()

barcode_summaries <- barcode_summaries |>
  dplyr::inner_join(readxl::read_excel("data/Table of compounds_whole_2025-07-01.xlsx",
    sheet = "Batch Mapping"),
    by = "Filename") |>
  dplyr::relocate(Batch_ID, .after = Filename) |>
  dplyr::relocate(`Num Barcode IDs`, `Num Replicates`, .after = `Run date`)

flextable::flextable(barcode_summaries) |>
  bold(part = "header") |> 
  flextable::merge_v(j = c("Filename", "Batch_ID", "General Comments Per Batch"),
                      part = "body")

openxlsx::write.xlsx(x = barcode_summaries, asTable = TRUE,
                     file = paste0("./data/barcode_counts_summaries_", today_date, ".xlsx"))

| Filename | Batch_ID | Date | Run date | Num Barcode IDs | Num Replicates | General Comments Per Batch |
|---|---|---|---|---|---|---|
| exp010821.tsv | exp010821 | 010821 | 180821 | 1,168,726 | 22 | 2 Control Time Zeros + 4 replicates of X13271, compared to old |
| exp040821.tsv | exp040821 | 040821 | 180821 | 1,168,726 | 22 | 2 Control Time Zeros + 4 replicates of X13271, compared to old (same remark as exp040821) |
| exp070222.tsv | exp070222 | 070222 | 010322 | 822,697 | 70 | 2 Additional Time Control Zeros |
| | | 070222 | 060522 | 822,697 | 70 | |
| exp130921.tsv | exp130921 | 130921 | 290921 | 800,949 | 42 | In the new dataset, 2 additional Control Time Zeros and in-vivo mixed with standard experiments. |
| | | 130921 | 111021 | 800,949 | 42 | |
| | exp130921_in_vivo | 130921 | 101121 | 800,949 | 42 | |
| exp151121.tsv | exp151121 | 151121 | 071221 | 628,033 | 62 | 2 Additional Time Control Zeros |
| | | 151121 | 180122 | 628,033 | 62 | |
| exp181021.tsv | exp181021 | 181021 | 101121 | 631,355 | 61 | |
| | | 181021 | 251121 | 631,355 | 61 | |
| exp200921.tsv | exp200921 | 200921 | 111021 | 518,133 | 58 | Merged with dose response. |
| | exp200921_dose_response | 200921 | 101121 | 518,133 | 58 | Merged with 200921 standard + 2 Additional Time Control Zeros. Does not contain controls, so merged with exp200921. |
| exp220322.tsv | exp220322 | 220322 | 060522 | 1,038,088 | 70 | 2 Additional Time Control Zeros + 2 additional replicates of XAV939 |
| | | 220322 | 200522 | 1,038,088 | 70 | |
| exp271221.tsv | exp271221 | 271221 | 140122 | 439,396 | 82 | 2 Additional Time Control Zeros |
| | | 271221 | 180122 | 439,396 | 82 | |
| exp281022gammeWithCTR.tsv | exp281022_time_course | 281022 | 110723 | 509,473 | 56 | Previously missing 3m_osim4_100n_exp180123_run110723_44 replicate is available again in the new dataset -> now 4 replicates. However, all 4 controls, such as 9d_Contro2_000u_exp281022_run110723_02 (from 01 to 04), are missing in the new dataset while present in the older one. Finally, 4 new 'T-75' controls appear in the new dataset, but CTRL3-T75_000u_exp281022_run110723_24 and CTRL1-T75_000u_exp281022_run110723_22 only contain null values -> we choose to discard them. In contrast with other batches, the newest file provides significant non-null barcode values (around 1%). Accordingly, plug in the old controls at 9 days in the newest format. |
| | | 071122 | 110723 | 509,473 | 56 | |
| | | 181122 | 110723 | 509,473 | 56 | |
| | | 021222 | 110723 | 509,473 | 56 | |
| | | 161222 | 110723 | 509,473 | 56 | |
| | | 180123 | 110723 | 509,473 | 56 | |
| exp281022t25.tsv | exp281022_t25 | 281022 | 141222 | 315,642 | 22 | What's T25? 8 controls, for which reason? One mentioned as cDNA (first 4). Older dataset was more comprehensive, so we keep the oldest one. |
| exp300821.tsv | exp300821 | 300821 | 290921 | 413,335 | 31 | Additional Sorafenib experiment, with 005u concentration, 3 replicates only. |

Table 1.1: Main batch features.

From Table 1.1 (the corresponding tallies are reproduced in the chunk below):

  • Number of distinct experiment dates (thaws): 16.
  • Number of sequencing runs: 13, the total number of distinct barcode sequencing runs.
  • Number of batches: 14, based on the batch IDs.
  • Technical covariates: 25, the number of observed batch × run combinations, since both the experiment date and the sequencing date may impact the output.
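
A minimal chunk reproducing these counts from the barcode_summaries table built earlier (counting runs through the Run date column is an assumption on my side):

Code
length(unique(barcode_summaries$Date))        # 16 distinct experiment dates (thaws)
length(unique(barcode_summaries$`Run date`))  # 13 distinct sequencing runs
length(unique(barcode_summaries$Batch_ID))    # 14 distinct batches
nrow(barcode_summaries)                       # 25 batch x run combinations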

1.3 DrugSimDB: mapping each compound to a Pathway and MoA

Code
# check after Samples_ID, to highlight duplicates

compound_metadata <- readr::read_csv(paste0("./data/replicates_barcode_metadata_comprehensive_",
                                     today_date, ".csv"), 
                              show_col_types = FALSE) |> 
  dplyr::filter(!Pathway %in% c("Control", "Drug combinations", "Novel compound", "Control - time zero")) |> 
  dplyr::distinct(Pathway, Compound)

utils::download.file(url = "https://raw.githubusercontent.com/iit-Demokritos/drug_id_mapping/refs/heads/main/drug-mappings.tsv", 
                     destfile = "./data/drugbank/drug-mappings.tsv")
drug_mappings <- readr::read_tsv("./data/drugbank/drug-mappings.tsv")

compound_metadata_mapped <- compound_metadata |> 
  fuzzyjoin::stringdist_left_join(drug_mappings |> 
                                    dplyr::select(drugbankId, name),
                                  by = c(Compound = "name"),
 # https://www.rdocumentation.org/packages/stringdist/versions/0.9.15/topics/stringdist-metrics, Damerau-Levenshtein distance                             
                                  method = "osa", 
                                  max_dist = 2,
                                  distance_col = "distance") |> 
  dplyr::rename(Generic_Name = name, Compound_Luca = Compound) |> 
  dplyr::relocate(Generic_Name, .after = Compound_Luca)

openxlsx::write.xlsx(compound_metadata_mapped,
                     file = "./data/drugbank/mapped_drugs_barcode.xlsx", 
                     asTable = TRUE)
  • Example: DrugBank ID DB12218 maps to the generic name Capivasertib.

  • Retrieve the raw DrugBank XML to map each generic drug name to its corresponding DrugBank ID:

    1. Create a free account on Synapse, an open data-sharing platform.
    2. Use the drugbankR package to 1) convert the DrugBank XML file into a tidy dataframe, and 2) explore it with SQLite queries (a sketch of this step follows the chunk below).
    3. (Alternative) Query the DrugBank API directly; this requires logging in with an academic account and might be slower than querying a local database.

1.3.1 DrugSimDB and similarity scores

  1. Update the drug-drug similarity measures based on the latest version of DrugBank, following a tutorial that mixes shell commands and R scripts.

  2. The main similarity scores result from running the DD (drug-drug) steps of the pipeline. Similarity scores cover 6 items: chemical structure, targets, Gene Ontology, cellular component, molecular function, and biological process of induced pathways.
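
For illustration only (the file path and column names below are assumptions, not the actual DrugSimDB output schema), the six similarity components could be combined into a single aggregate score per drug pair:

Code
## hypothetical sketch: average the six similarity components per drug pair
drug_similarity <- readr::read_csv("data/drugbank/drug_drug_similarity.csv",
                                   show_col_types = FALSE)

drug_similarity |>
  dplyr::mutate(mean_similarity = rowMeans(
    dplyr::pick(chemical_structure, target, go_annotation,
                cellular_component, molecular_function, biological_process),
    na.rm = TRUE))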

Code
library(httr2)
library(jsonlite)

# Get the InChIKey from PubChem for Levofloxacin (or use the webchem package)
inchikey <- "GSDSWSVVBLHKDQ-UHFFFAOYSA-N"

# Use UniChem to retrieve the DrugBank ID associated with this InChIKey
url <- paste0("https://www.ebi.ac.uk/unichem/rest/inchikey/", inchikey)
res <- fromJSON(url)

# Query the public DrugBank page of a given drug
# (note: this endpoint serves HTML, so parsing the response as JSON may fail
# without access to the official DrugBank API)
url_base <- "https://go.drugbank.com/drugs"
drug_identifier <- "DB00358"
drug_detailed_information <- httr2::request(url_base) |>
  httr2::req_url_path_append(drug_identifier) |>
  httr2::req_perform() |>
  httr2::resp_body_json()

url_drug <- paste0("https://go.drugbank.com/drugs/", drug_identifier)
res <- jsonlite::fromJSON(url_drug)

# Convert the raw DrugBank XML release into a tidy dataframe
renv::install("girke-lab/drugbankR")
drugbank_dataframe <- drugbankR::dbxml2df(xmlfile = "data/drugbank/drugbank.xml",
                                          version = "5.1.3")
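
If I recall the drugbankR interface correctly (an assumption to verify against the girke-lab/drugbankR documentation), the tidy dataframe can then be stored in a local SQLite database and queried:

Code
## sketch assuming drugbankR exposes df2SQLite() and queryDB(); check the package docs
drugbankR::df2SQLite(dbdf = drugbank_dataframe, version = "5.1.3")
all_drugbank_ids <- drugbankR::queryDB(type = "getIDs", db_path = "drugbank_5.1.3.db")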

1.4 Phenotype metadata cleaning

Most of the R instructions reported in this section aim at making the Table of compounds file compliant with the tidy format, see Tip 1.1. These data-wrangling operations are split into 4 steps:

  1. In Listing 1.1, we homogenise Concentrations, Date, and Duration to ISO standards, while dealing with erroneous missing values generated by merged Excel cells (a toy check of the parsing locale follows the listing).
Listing 1.1: Coerce the original Excel file reporting the experimental design to tidy format.
Code
pheno_colnames <- c(Old_Pathway = "Pathways", Old_MoA = "...2", 
                    Compound = "Samples", Replicates = "Replicates",
                    Concentrations_Formatted = "Concentrations", Date = "Experiment date",
                    `Run date` = "Run date", Duration_ID = "Duration", 
                    Kept = "Kept", Comments = "Comments")

## prune irrelevant colnames ----

samples_metadata <- readxl::read_xlsx("data/Table of compounds_whole_2025-07-01.xlsx",
                                sheet = "Experimental Design",
                                col_types = c(rep("text", 3), "numeric", 
                                              rep("text", 4), "logical", "text")) |>
  dplyr::rename(dplyr::all_of(pheno_colnames)) |>
  tidyr::fill(Old_Pathway, Old_MoA, .direction = "down") |> 
  tidyr::fill(Compound, .direction = "down") |>
  dplyr::mutate(Old_Pathway = if_else(is.na(Old_Pathway), Old_MoA, Old_Pathway)) 

# integrate new pathway information from Luca ----
updated_MoA_pathway <- readxl::read_xlsx("data/Table of compounds_whole_2025-07-01.xlsx", 
                                                       sheet = "Pathway Mapping")
samples_metadata <- samples_metadata |> 
  dplyr::inner_join(updated_MoA_pathway,
                   by = join_by(Compound)) 

## format concentrations, creating both 'Concentrations_ID' and 'Concentrations' in Moles ----
samples_metadata <- samples_metadata |> 
  dplyr::mutate(Concentrations = if_else(Compound == "Osimertinb+sorafenib", "015 uM", Concentrations_Formatted)) |>
  tidyr::separate_wider_delim(Concentrations, delim = " ", names = c("Concentrations Value", "Unit"), cols_remove = FALSE) |>
  dplyr::mutate(Unit = case_when(
    Unit %in% c("uM", "µM", "µg/ml") ~ "u",
    Unit == "nM" ~ "n", 
    Unit %in% c("mg/kg", "mM") ~ "m", 
  )) |> 
  dplyr::mutate(Concentrations=parse_number(`Concentrations Value`, trim_ws = TRUE, 
                                            locale = locale(decimal_mark = ",", grouping_mark = "."))) |> 
   dplyr::mutate(Concentrations = case_when(
    Unit == "u" ~ Concentrations *10^-6,
    Unit == "n" ~ Concentrations *10^-9, 
    Unit == "m" ~ Concentrations *10^-3)) |> 
  dplyr::mutate(`Concentrations Value`= parse_number(`Concentrations Value`, trim_ws = TRUE, 
                                            locale = locale(decimal_mark = ",", grouping_mark = ".")) |> 
                  format_concentrations()) |> 
  tidyr::unite(col ="Concentrations_ID", `Concentrations Value`, Unit, sep = "")

## format Durations, creating both 'Duration_ID' and 'Duration' as an integer value in Days ----
samples_metadata <- samples_metadata |> 
  ## extract last 6 numbers, as the only ones used for identification of samples downstream
  dplyr::mutate(Duration = parse_number(Duration_ID, trim_ws = TRUE, 
                                        locale = locale(decimal_mark = ".")), 
                Duration = dplyr::if_else(grepl("m$",  Duration_ID),
                                          Duration*30, Duration)) 

rm(updated_MoA_pathway)
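
As a quick illustration of the locale used above (a toy check, not part of the pipeline), a comma decimal mark is parsed as a regular decimal point before the value is converted to moles:

Code
## toy check of the parsing locale: "1,5" (comma decimal mark) is read as 1.5
readr::parse_number("1,5", locale = readr::locale(decimal_mark = ",", grouping_mark = "."))
## a 1.5 uM concentration expressed in moles
1.5 * 10^-6
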
  2. In Listing 1.2, we extract from each replicate label the patterns enabling its mapping to Table of compounds, and use sheet 'Drugs Mapping' to map each short compound prefix to its full compound name (a worked example of the label layout follows the listing).
Listing 1.2: Retrieve batches of interest.
Code
batch_mapping <- readxl::read_xlsx("data/Table of compounds_whole_2025-07-01.xlsx",
                                       sheet = "Batch Mapping")
barcode_counts_paths <- batch_mapping |>
  dplyr::pull(Filename) |> unique()

## step 2) extract individual replicate IDs directly from sample experiences ----
ID_mapping <- purrr::map(barcode_counts_paths, function(Filename) {
  ## build path
  counts_path <- file.path("data", "barcode-counts", Filename)
  
  ## extract replicate names (reading the header)
  replicates_ID <- strsplit(readLines(counts_path, n = 1), split = "\t")[[1]] |>
    str_subset("\\S") |> ## remove empty strings
    str_subset("barcode_id", negate = TRUE)

  ## ID: combination of letters, numbers and -, followed by '[1-4]_', starting the sample's name
  dataset_ID <- tibble::tibble(Filename = Filename, 
                               Date = stringr::str_extract(replicates_ID,
                                                           "(?<=_exp)[[:alnum:]]{6}"),
                               Replicates_ID = replicates_ID,
                               `Run date` = stringr::str_extract(replicates_ID,
                                                                 "(?<=_run)[[:alnum:]]{6}(?=_)"),
                               Concentrations_ID = stringr::str_extract(replicates_ID, 
                                                                        "(?<=[1-8]_)[[:digit:]]{3}[gmnux]{1}(?=_exp)"))  |> 
    ## account for second notation syntax for concentrations, namely for decimal values
    dplyr::mutate(Concentrations_ID = if_else(is.na(Concentrations_ID),
                                              stringr::str_extract(replicates_ID,
                                                                   "(?<=[1-8]_)[0-9]{1}\\.[0-9]{2}[gmnux]{1}(?=_exp)"),
                                              Concentrations_ID))
  
  ## Deal with time-course specific nomenclature
  if (grepl("281022(.*)gamme", Filename)) {
    dataset_ID <- dataset_ID |> 
      dplyr::mutate(Samples_ID = stringr::str_extract(replicates_ID,
                                                      "(?<=_)[[[:alnum:]]\\-]+(?=[1-8]_)"), 
                    Duration_ID=stringr::str_extract(replicates_ID,
                                                     "^[[[:alnum:]]\\.]{1,3}[dm](?=_)"))
    # deal with T-75 Controls
    dataset_ID <- dataset_ID |> 
      dplyr::mutate(Samples_ID = if_else(grepl("^CTRL[[:digit:]]{1}-T75", Replicates_ID), "CTRL", Samples_ID), 
                    Duration_ID = if_else(grepl("^CTRL[[:digit:]]{1}-T75", Replicates_ID), "9d", Duration_ID))
  }
  else {
    dataset_ID <- dataset_ID |> 
      dplyr::mutate(Samples_ID = stringr::str_extract(replicates_ID,
                                                      "^[[[:alnum:]]\\-]+(?=[1-8]{1}_)"), 
                    Duration_ID = if_else(grepl("^Temps0", Replicates_ID),"0d", "9d"))
  }
                                 
  return(dataset_ID)
}) |>
  purrr::list_rbind()

## deal with specific p42 and p43
ID_mapping <- ID_mapping |>
  dplyr::mutate(Samples_ID = if_else(grepl("^p42", Replicates_ID), "p42", Samples_ID),
                Samples_ID = if_else(grepl("^p43", Replicates_ID), "p43", Samples_ID))

## detect bad extraction of patterns samples
# ID_mapping_missing <- ID_mapping |>
#   filter(if_any(everything(), is.na))

## step 3) map short compound IDs with full compound names ----
mapping_compounds <- readxl::read_xlsx("data/Table of compounds_whole_2025-07-01.xlsx",
                                sheet = "Drugs Mapping")
ID_mapping <- ID_mapping |> 
  inner_join(mapping_compounds, by = "Samples_ID")

ID_mapping <- ID_mapping |> 
  dplyr::inner_join(batch_mapping, 
                    by = dplyr::join_by(Filename, Date, `Run date`)) |> 
  dplyr::select(-`General Comments Per Batch`)

rm(batch_mapping, mapping_compounds)
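
For reference, a single replicate label can be decomposed against the naming convention targeted by the regular expressions above; the grouping below assumes the layout <compound><replicate>_<concentration><unit>_exp<date>_run<date>_<well>:

Code
## decompose one replicate label into its components
label <- "Osimer1_001m_exp130921_run101121_28"
stringr::str_match(label,
                   "^([[:alnum:]-]+)([1-8])_([0-9.]+[gmnux])_exp([0-9]{6})_run([0-9]{6})_([0-9]+)$")
## columns: full match, compound prefix, replicate number, concentration ID,
## experiment date, run date, well number
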
  3. In Table 1.2, we report the final, curated phenotype metadata table, adhering to the tidy principles of Tip 1.1. This includes:
    1. Standardisation of dates to the ISO 8601 format.
Listing 1.3: Save the final metadata annotations for barcodes.
Code
## step 1) Use international date formats ----
replicates_metadata <- samples_metadata |> 
  dplyr::inner_join(ID_mapping,
                    by = dplyr::join_by(Compound, Date, `Run date`,
                                        Duration_ID, Concentrations_ID)) |> 
  ## convert to ISO 8601 date format
  mutate(Date = lubridate::dmy(Date) |> format(),
         `Run date` = lubridate::dmy(`Run date`)) |> 
  # relocate colnames
  dplyr::select(Batch_ID, Filename, 
                Pathway, Old_Pathway, MoA, Old_MoA, 
                Compound, Samples_ID, Kept,
                Date, `Run date`, Duration, Duration_ID,
                Concentrations_Formatted, Concentrations_ID, Concentrations, 
                Replicates, Replicates_ID, Comments)

replicates_metadata_shortened <- replicates_metadata |> 
  dplyr::select(-Old_Pathway, -Old_MoA, -Samples_ID, 
                -Duration_ID, -Concentrations_ID, -Replicates)

# replicates_metadata_deduplicated <- replicates_metadata |>
#   dplyr::distinct(Replicates_ID, .keep_all = TRUE)
# all.equal(replicates_metadata, replicates_metadata_deduplicated)


## Step 2) save the tidy phenotype metadata table ----
readr::write_csv(replicates_metadata,
                 file = paste0("data/replicates_barcode_metadata_comprehensive_", 
                               today_date,".csv"))
readr::write_csv(replicates_metadata_shortened,
                 file = paste0("data/replicates_barcode_metadata_", today_date,".csv"))

flextable::flextable(head(replicates_metadata_shortened)) |> 
  autofit() |> 
  bold(part = "header")

| Batch_ID | Filename | Pathway | MoA | Compound | Kept | Date | Run date | Duration | Concentrations_Formatted | Concentrations | Replicates_ID | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| exp010821 | exp010821.tsv | Control - time zero | Control - time zero | Control - time zero | TRUE | 2021-08-01 | 2021-08-18 | 0 | 000 uM | 0 | Temps01_000u_exp010821_run180821_01 | |
| exp010821 | exp010821.tsv | Control - time zero | Control - time zero | Control - time zero | TRUE | 2021-08-01 | 2021-08-18 | 0 | 000 uM | 0 | Temps02_000u_exp010821_run180821_02 | |
| exp040821 | exp040821.tsv | Control - time zero | Control - time zero | Control - time zero | TRUE | 2021-08-04 | 2021-08-18 | 0 | 000 uM | 0 | Temps01_000u_exp040821_run180821_23 | |
| exp040821 | exp040821.tsv | Control - time zero | Control - time zero | Control - time zero | TRUE | 2021-08-04 | 2021-08-18 | 0 | 000 uM | 0 | Temps02_000u_exp040821_run180821_24 | |
| exp130921 | exp130921.tsv | Control - time zero | Control - time zero | Control - time zero | TRUE | 2021-09-13 | 2021-09-29 | 0 | 000 uM | 0 | Temps01_000u_exp130921_run290921_48 | |
| exp130921_in_vivo | exp130921.tsv | Control - time zero | Control - time zero | Control - time zero | TRUE | 2021-09-13 | 2021-11-10 | 0 | 000 uM | 0 | Temps02_000u_exp130921_run101121_19 | In vivo, tumoral samples |

Table 1.2: First rows of the curated replicate metadata table (replicates_metadata_shortened).
Tip 1.1 (tidy data principles):

  1. Each variable has its own column.
  2. Each observation has its own row – every row corresponds to one observation.
  3. Each value has its own cell – each cell contains a single, unique value.

This format streamlines data wrangling and, more generally, data analysis and visualisation. In other words, prefer simple CSV formats for the experimental design and avoid merging cells in Excel documents. Finally, for the formatted tabular representations in this document, I use flextable.
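
As a toy illustration (made-up data, not from the study), merged Excel cells arrive as missing values after import, which tidyr::fill() propagates downwards, as done in Listing 1.1:

Code
## toy example: merged Excel cells are imported as NAs
untidy <- tibble::tibble(
  Pathway  = c("EGFR", NA, NA, "Autophagy"),
  Compound = c("Osimertinib", "Gefitinib", "Erlotinib", "Bafilomycin A1")
)
## propagate the last non-missing value downwards, as in Listing 1.1
tidyr::fill(untidy, Pathway, .direction = "down")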

1.4.1 Optional data quality checks

(Optional) Table 1.3 and Table 1.4 report data-integrity checks verifying that each replicate feature retrieved from its label is consistent with Table of compounds, sheet 'Experimental Design'.

  1. Table 1.3 identifies all replicates that could not be mapped back to their metadata description stored in Table of compounds, sheet 'Drugs Mapping'.
(a) Extract discarded replicates
Code
replicates_discarded <- dplyr::anti_join(ID_mapping, samples_metadata, 
                                         by = join_by(Date, `Run date`, Concentrations_ID,
                                                      Duration_ID, Compound))

pheno_data_unmapped <- dplyr::anti_join(samples_metadata,
                                        ID_mapping,
                                        by = join_by(Date, `Run date`, Concentrations_ID,
                                                     Duration_ID, Compound))

pheno_data_unmapped |> 
  arrange(Compound, Date, Concentrations_ID) |> 
  flextable() |> 
  bold(part = "header")
Table 1.3: anti_join between the metadata table and the barcode counts identifies replicates that could not be mapped back, in either direction.
  2. Table 1.4 checks for discrepancies between the number of barcode replicates reported in Table of compounds and the number of replicates actually observed in the barcode counts.
Code
replicate_inconsistencies <- samples_metadata |> 
  dplyr::inner_join(ID_mapping, 
                    by = dplyr::join_by(Compound, 
                                        Date, `Run date`, 
                                        Duration_ID, Concentrations_ID)) |>
  group_by(Batch_ID, Compound, Date, Replicates, Concentrations_ID,Duration_ID) |>
  summarise(n = n()) |>
  filter(n != Replicates)

flextable(replicate_inconsistencies) |>
  autofit() |>
  bold(part = "header")
Table 1.4: Second round of data-wrangling quality controls on counts versus phenotype data, focusing on divergent numbers of replicates.

  1. In-vivo = cells injected into living mice, control versus osimertinib.↩︎