Mohammad Darbalaei
Daniel Hoffmann
Bioinformatics and Computational Biophysics, Faculty of Biology, University of Duisburg-Essen
Tumor heterogeneity is a major challenge in cancer research and precision oncology. In many cancers, including gastrointestinal stromal tumors (GIST), tumors are not uniform but consist of multiple genetically distinct subclones. These subclones can respond differently to treatment, and resistant populations often emerge during therapy due to secondary mutations in the KIT receptor or downstream signaling pathways.
Understanding how different genotypes respond to treatment is therefore essential. However, studying each clone individually is time-consuming and does not capture how clones behave when they grow together and compete within the same environment.
To overcome these limitations, DNA barcoding enables multiplexed experiments in which many clones are pooled and analyzed simultaneously. In this approach, each clone is labeled with a unique DNA barcode. After treatment, high-throughput sequencing is used to count these barcodes, allowing researchers to track how the composition of the population changes over time.
This makes it possible to study many genotypes in parallel within a single experiment, greatly increasing scalability and efficiency.
A key limitation of barcode sequencing data is that it provides only relative abundances. Because these data are compositional, all clone fractions within a sample are constrained to sum to one.
Consequently, relative changes cannot be directly interpreted as absolute changes in clone size. An increase in one clone’s fraction necessarily implies a decrease in others, even if all clones are expanding. Conversely, a clone may appear to decline despite continued growth, simply because competing clones expand more rapidly.
Relative changes alone are therefore insufficient to determine whether a treatment truly suppresses or enhances clonal growth.
To resolve this limitation, measurements of total population size are required.
The appropriate measurement depends on the experimental setting:
These measurements capture how the entire population expands or contracts under treatment and enable interpretation of clonal dynamics on an absolute scale.
The barmixR framework combines both data sources:
By jointly modeling these data in a Bayesian hierarchical framework, barmixR estimates clone-specific treatment effects on an absolute scale while accounting for uncertainty in both measurements.
The central output of the model is quantitative treatment resistance (QTR), which describes how each clone responds to treatment compared to a control condition:
Although barmixR was developed for multiplexed tumor experiments, the underlying approach is broadly applicable. Any experiment that combines:
can benefit from this framework.
Examples include:
To demonstrate the barmixR workflow, we generate a synthetic dataset that captures key features of multiplexed barcoded experiments.
The simulation includes 4 clones, 2 conditions (one control and one treatments), and 3 replicates per condition. For each sample, we simulate:
Counts are generated from a Dirichlet–multinomial distribution, reflecting overdispersion in sequencing data.
Clones are grouped into subsets with different treatment-specific patterns (enriched, neutral, or depleted), introducing structured heterogeneity across conditions.
Total population sizes are simulated using a log-normal distribution, capturing the variability and skewness typical of tumor growth. Treatment conditions differ in their overall growth behavior relative to control.
The simulation reflects the core principle of barmixR: combining relative composition and total population size to define the fractional size for each cell line \(l\) and condition \(k\):
\[ \nu_l^{(k)} = p_l^{(k)} \cdot V^{(k)}, \quad l = 1,\dots,L,\; k = 1,\dots,K \]
where \(l\) indexes cell lines and \(k\) indexes treatment conditions.
library(MGLM)
library(tidyverse)
library(barmixR)
set.seed(1001)
# --- Reduced size for vignette ---
k <- 4 # number of clones (cell lines)
n_reps <- 3 # fewer replicates
n_reads <- 2000 # fewer sequencing reads
# Only control + one treatment
conditions <- c("DMSO", "TreatA")
# Helper function
generate_profile <- function(pattern, k){
breaks <- floor(seq(0, k, length.out = length(pattern) + 1))
prof <- numeric(k)
for(i in seq_along(pattern)){
idx <- (breaks[i] + 1):breaks[i + 1]
prof[idx] <- pattern[i]
}
prof + runif(k, 0, 0.05)
}
# Define profiles (simplified)
alpha_profiles <- list(
DMSO = rep(1/k, k),
TreatA = generate_profile(c(2, 0.5), k)
)
# Simulate counts
all_data <- list()
for(cond in conditions){
alpha <- alpha_profiles[[cond]]
alpha_scaled <- alpha / sum(alpha) * 20 # smaller concentration
mat <- rdirmn(n_reps, n_reads, alpha_scaled)
rownames(mat) <- paste0(cond, "_rep", 1:n_reps)
colnames(mat) <- paste0("clone", 1:k)
all_data[[cond]] <- mat
}
data_count <- do.call(rbind, all_data)The object data_count contains the simulated barcode sequencing counts for all samples. Each row corresponds to a replicate sample under a specific treatment condition, and each column corresponds to one of the barcoded clones.
The entries in this matrix represent the number of sequencing reads assigned to each barcode in a given sample. Because sequencing measures relative composition, these counts primarily reflect the relative abundance of clones within the pooled population.
## clone1 clone2
## DMSO_rep1 475 490
## DMSO_rep2 216 300
In this preview:
DMSO_rep1, TreatA_rep2)clone1, clone2, …)These counts form the input to the barcode component of the barmixR model, which estimates clone composition under each treatment condition.
To complement the compositional barcode data, we simulate total population sizes. In an in vivo setting, these correspond to tumor volumes; in an in vitro setting, analogous measurements could be cellular confluency.
get_lognorm_params <- function(mean, sd){
sigma2 <- log(1 + (sd / mean)^2)
mu <- log(mean) - sigma2 / 2
sigma <- sqrt(sigma2)
list(meanlog = mu, sdlog = sigma)
}
params_t0 <- get_lognorm_params(100, 40)
params_t14 <- get_lognorm_params(300, 150)
sim_vols <- function(params, seed){
set.seed(seed)
rlnorm(n_reps, params$meanlog, params$sdlog)
}
DMSO_t0 <- sim_vols(params_t0, 1)
DMSO_t14 <- sim_vols(params_t14, 101)
TreatA_t14 <- sim_vols(get_lognorm_params(200, 120), 102)
# Combine
V_t0 <- round(c(DMSO_t0, DMSO_t0), 1)
V_t14 <- round(c(DMSO_t14, TreatA_t14), 1)To run the barmixR model, all required information is organized into a structured list. This list includes barcode counts, treatment labels, population-size measurements, and clone identifiers.
condition_count <- factor(
sub("_rep[0-9]+$","",rownames(data_count)),
levels=conditions
)
cell_line <- factor(paste0("clone",1:k))
data <- list(
data_count=data_count,
condition_count=condition_count,
V=V_t14,
VT0=V_t0,
condition_v=condition_count,
cell_line=cell_line
)
time_d <- 14## $data_count
## clone1 clone2 clone3 clone4
## DMSO_rep1 475 490 438 597
## DMSO_rep2 216 300 364 1120
## DMSO_rep3 538 747 380 335
## TreatA_rep1 430 848 649 73
## TreatA_rep2 1087 481 154 278
## TreatA_rep3 639 1069 147 145
##
## $condition_count
## [1] DMSO DMSO DMSO TreatA TreatA TreatA
## Levels: DMSO TreatA
##
## $V
## [1] 230.0 348.3 195.1 189.6 265.0 81.0
##
## $VT0
## [1] 72.9 99.7 67.3 72.9 99.7 67.3
##
## $condition_v
## [1] DMSO DMSO DMSO TreatA TreatA TreatA
## Levels: DMSO TreatA
##
## $cell_line
## [1] clone1 clone2 clone3 clone4
## Levels: clone1 clone2 clone3 clone4
For demonstration purposes, we use a reduced number of posterior sampling iterations to shorten computation time.
In practical applications, longer sampling runs are recommended to ensure stable and well-mixed posterior estimates.
# Reduced number of posterior sampling iterations
# to keep the vignette computationally lightweight.
# For real analyses, increase iterations and chains
# to ensure proper convergence and reliable inference.
fit <- barmixRQTR(
data = data,
time_d = time_d,
control = list(
chains = 1,
iter_count = 100,
iter_V = 100,
cores = 1
),
in_vivo = TRUE
)##
## SAMPLING FOR MODEL 'barcode' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 4.9e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.49 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1: three stages of adaptation as currently configured.
## Chain 1: Reducing each adaptation stage to 15%/75%/10% of
## Chain 1: the given number of warmup iterations:
## Chain 1: init_buffer = 7
## Chain 1: adapt_window = 38
## Chain 1: term_buffer = 5
## Chain 1:
## Chain 1: Iteration: 1 / 100 [ 1%] (Warmup)
## Chain 1: Iteration: 10 / 100 [ 10%] (Warmup)
## Chain 1: Iteration: 20 / 100 [ 20%] (Warmup)
## Chain 1: Iteration: 30 / 100 [ 30%] (Warmup)
## Chain 1: Iteration: 40 / 100 [ 40%] (Warmup)
## Chain 1: Iteration: 50 / 100 [ 50%] (Warmup)
## Chain 1: Iteration: 51 / 100 [ 51%] (Sampling)
## Chain 1: Iteration: 60 / 100 [ 60%] (Sampling)
## Chain 1: Iteration: 70 / 100 [ 70%] (Sampling)
## Chain 1: Iteration: 80 / 100 [ 80%] (Sampling)
## Chain 1: Iteration: 90 / 100 [ 90%] (Sampling)
## Chain 1: Iteration: 100 / 100 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 2.577 seconds (Warm-up)
## Chain 1: 2.485 seconds (Sampling)
## Chain 1: 5.062 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'tumor_volume' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 1.5e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1: three stages of adaptation as currently configured.
## Chain 1: Reducing each adaptation stage to 15%/75%/10% of
## Chain 1: the given number of warmup iterations:
## Chain 1: init_buffer = 7
## Chain 1: adapt_window = 38
## Chain 1: term_buffer = 5
## Chain 1:
## Chain 1: Iteration: 1 / 100 [ 1%] (Warmup)
## Chain 1: Iteration: 10 / 100 [ 10%] (Warmup)
## Chain 1: Iteration: 20 / 100 [ 20%] (Warmup)
## Chain 1: Iteration: 30 / 100 [ 30%] (Warmup)
## Chain 1: Iteration: 40 / 100 [ 40%] (Warmup)
## Chain 1: Iteration: 50 / 100 [ 50%] (Warmup)
## Chain 1: Iteration: 51 / 100 [ 51%] (Sampling)
## Chain 1: Iteration: 60 / 100 [ 60%] (Sampling)
## Chain 1: Iteration: 70 / 100 [ 70%] (Sampling)
## Chain 1: Iteration: 80 / 100 [ 80%] (Sampling)
## Chain 1: Iteration: 90 / 100 [ 90%] (Sampling)
## Chain 1: Iteration: 100 / 100 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.007 seconds (Warm-up)
## Chain 1: 0.005 seconds (Sampling)
## Chain 1: 0.012 seconds (Total)
## Chain 1:
The model is estimated using Hamiltonian Monte Carlo (HMC) via Stan.
Posterior sampling produces probability distributions for all parameters, including clone compositions, population sizes, and treatment resistance.
We next estimate ratios of relative barcode abundance between treatment groups and the control condition.
sampled_fraction <- ppc_barcode$sampled_fraction
fraction_ratio_results <- fractionRatio(model, sampled_fraction)
fraction_ratio_results$plot_ratio_fractionpopulation_ratio_results <- populationRatio(model, sampled_population)
population_ratio_results$plot_ratio_populationresistance <- QTRresistance(
model,
fraction_ratio_results$li_sam_ratio_relative,
population_ratio_results$li_sam_ratio_V
)
print(resistance$treatment_resistance) The Quantitative Treatment Resistance (QTR) is inferred by integrating information from:
within a joint Bayesian framework. This approach propagates uncertainty from both data sources to obtain posterior distributions of treatment response for each clone–treatment pair.
Each QTR value represents the log-scale change in fractional clone size under treatment relative to control.
When visualized (e.g., as density plots or violin plots), each distribution reflects the posterior uncertainty of treatment resistance for a given genotype–treatment combination. A vertical reference line at zero separates the sensitive and resistant regimes.
The heatmap summarizes clone-specific quantitative treatment resistance (QTR) across all treatments and cell lines, enabling direct comparison of drug effects in a multiplexed setting.
In analogy to the bubble plot representation used in the main analysis, QTR values can be interpreted jointly in terms of direction and magnitude:
This visualization provides a global overview of treatment performance across clones and highlights genotype-specific differences in drug response, facilitating intuitive comparison across all genotype–drug combinations.
One practical goal of multiplexed barcoded experiments is to enable treatment prioritization across genetically distinct tumor subclones. Pooled barcoding approaches were developed to increase throughput, support internally controlled comparisons, and make large-scale genotype-specific drug testing more feasible in preclinical research. In this setting, ranking treatment effects across genotypes can help highlight mutation-specific vulnerabilities while reducing the number of separate follow-up experiments needed for compound selection.
The QTRDecision() function summarizes posterior QTR distributions for each cell line and treatment using median, interquartile range, and whiskers derived from QTRresistance(). This provides a compact ordering of candidate treatments within each genotype and supports comparison of drug performance across heterogeneous cell populations.
cell_order <- paste0("clone", 1:4)
decision <- QTRDecision(
model = model,
summary_table = resistance$summary_table,
cell_order = cell_order
# legend_text = "DMSO = control condition"
)
print(decision$rank_plot) In the ranking panel, treatments are ordered within each cell line by their posterior median QTR. Values further to the left indicate lower QTR and therefore greater treatment sensitivity, whereas values further to the right indicate higher QTR and therefore greater resistance. The box shows the interquartile range, the central line marks the median, and the whiskers summarize the lower and upper bounds of the displayed posterior interval.
This type of ranking can support preclinical treatment prioritization by helping to:
In translational and pharmaceutical settings, this kind of summary may be useful as an early prioritization layer for selecting compounds, dose ranges, or combinations for additional validation.
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] barmixR_0.99.1 lubridate_1.9.5 forcats_1.0.1 stringr_1.6.0
## [5] dplyr_1.2.1 purrr_1.2.2 readr_2.2.0 tidyr_1.3.2
## [9] tibble_3.3.1 ggplot2_4.0.3 tidyverse_2.0.0 MGLM_0.2.1
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.10 generics_0.1.4 stringi_1.8.7
## [4] hms_1.1.4 digest_0.6.39 magrittr_2.0.5
## [7] evaluate_1.0.5 grid_4.6.0 timechange_0.4.0
## [10] RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0
## [13] pkgbuild_1.4.8 gridExtra_2.3 QuickJSR_1.10.0
## [16] scales_1.4.0 codetools_0.2-20 jquerylib_0.1.4
## [19] cli_3.6.6 rlang_1.2.0 withr_3.0.2
## [22] cachem_1.1.0 yaml_2.3.12 otel_0.2.0
## [25] StanHeaders_2.32.10 inline_0.3.21 tools_4.6.0
## [28] rstan_2.32.7 parallel_4.6.0 tzdb_0.5.0
## [31] rstantools_2.6.0 curl_7.1.0 vctrs_0.7.3
## [34] R6_2.6.1 matrixStats_1.5.0 stats4_4.6.0
## [37] lifecycle_1.0.5 V8_8.2.0 pkgconfig_2.0.3
## [40] RcppParallel_5.1.11-2 pillar_1.11.1 bslib_0.11.0
## [43] gtable_0.3.6 loo_2.9.0 glue_1.8.1
## [46] Rcpp_1.1.1-1.1 xfun_0.57 tidyselect_1.2.1
## [49] knitr_1.51 dichromat_2.0-0.1 farver_2.1.2
## [52] htmltools_0.5.9 patchwork_1.3.2 labeling_0.4.3
## [55] rmarkdown_2.31 compiler_4.6.0 S7_0.2.2