Highly multiplex PCR platforms enable drastically greater throughput compared to traditional PCR methods, making them critical for pathogen detection and gene expression experiments in public health and clinical research environments. One such platform is ThermoFisher Scientific’s OpenArray Technology, which our lab uses to screen patient swabs for respiratory viral infections. With 3000+ qPCR reactions taking place on each OpenArray plate, the sheer volume of data produced requires an informatic solution for downstream analyses and reporting.
ThermoFisher provides the proprietary QuantStudio 12K Flex Software and the Relative Quantification Application for data analysis. However, no all-in-one open-source software exists to derive reportable results from raw fluorescence values measured on OpenArray.
To bridge the gap, OAtools streamlines the analysis process including data import, plotting and quality control, analysis, and communication through an R Markdown document. Our goal in sharing OAtools is to support open-source and shareable analyses of OpenArray gene expression data, especially in clinical research environments.
To install the development version of OAtools directly from the uwvirology-ngs public GitHub, (https://github.com/uwvirology-ngs/OAtools) run the commands below in your R console:
Our lab plans to share this project as a Bioconductor package. Once available, this package may be installed from Bioconductor by running the following commands in the R console:
If following along with the vignette by running code blocks in the R console, attach the OAtools and SummarizedExperiment libraries to the environment. The SummarizedExperiment package should already be installed as a dependency.
If it is not installed, install dplyr with the following:
OAtools supports two core methods for analyzing OpenArray qPCR data:
1.) The first method is to simply carry over the relative threshold analysis native to the QuantStudio 12K Flex Software.
2.) The second method is to fit model curves to the fluorescence vs. cycle data for each well individually, then determine PCR results which depend on features of the curve.
To start, we specify the file path to some example OpenArray qPCR run data, which is included with the package. This data is a sample from a gene expression experiment our lab, the University of Washington Virology Lab, ran on human nasal swabs to screen for respiratory pathogens.
For detailed documentation on the example data used in this vignette, run ?oa_gene_expression_1 in the R console with OAtools attached.
With the file path specified, we use the excelToSE() function to read the Excel file containing the run data and load it into a SummarizedExperiment object. SummarizedExperiment is an S4 Class from Bioconductor used to store matrix-like assay data and associated metadata. This function takes optional parameters for the number of header rows, which store run metadata, and the number of rows to skip before reading assay data, though the defaults will work in this case.
With the data loaded into a SummarizedExperiment container, we can review its structure. The assay matrices store observed fluorescence values measured by the QuantStudio 12K Flex thermocycler by cycle number, where each column stores PCR reactions from through-holes on the OpenArray plate separately.
The excelToSE() function generates two assay matrices by default, which are fluo_normalized and fluo_reporter. The former refers to baseline-corrected, normalized fluorescence imported from the ‘Amplification Data’ tab, while the latter refers to the spectral contribution of the reporter dye imported from the ‘Multicomponent Data’ tab. We may access the assay matrix corresponding to the multicomponent data, for example, by invoking assays(se)$fluo_reporter in the R console, then coercing into a data frame with as.data.frame() as shown below.
Here we display a slice of the multicomponent fluorescence data from cycles 11 to 20, where the exponential phase of amplification tends to occur. The rows of each assay matrix denote the cycle number and the columns denote the individual PCR reactions on the OpenArray plate.
# retrieve the assay matrix and display a subset as a data frame
as.data.frame(assays(se)$fluo_reporter) |>
dplyr::select(well_2321:well_2386) |>
dplyr::slice(11:20)| well_2321 | well_2322 | well_2329 | well_2330 | well_2343 | well_2344 | well_2345 | well_2346 | well_2385 | well_2386 | |
|---|---|---|---|---|---|---|---|---|---|---|
| cycle_11 | 6405.788 | 5483.835 | 7060.261 | 6067.686 | 3605.021 | 3631.051 | 2256.322 | 1690.796 | 6164.024 | 5435.064 |
| cycle_12 | 6405.382 | 5482.771 | 7058.596 | 6073.878 | 3594.832 | 3638.393 | 2284.099 | 1705.556 | 6165.793 | 5464.090 |
| cycle_13 | 6414.322 | 5482.930 | 7037.791 | 6073.881 | 3602.094 | 3640.851 | 2314.417 | 1735.094 | 6211.680 | 5526.001 |
| cycle_14 | 6419.167 | 5476.461 | 7020.177 | 6056.675 | 3590.731 | 3628.519 | 2382.905 | 1796.593 | 6293.289 | 5586.283 |
| cycle_15 | 6425.805 | 5465.984 | 7023.062 | 6048.876 | 3582.180 | 3636.152 | 2474.373 | 1891.953 | 6427.477 | 5707.974 |
| cycle_16 | 6430.859 | 5459.897 | 7023.853 | 6065.785 | 3565.366 | 3641.555 | 2612.977 | 2027.775 | 6648.198 | 5944.312 |
| cycle_17 | 6415.730 | 5457.025 | 7041.731 | 6057.306 | 3561.831 | 3638.955 | 2782.659 | 2207.963 | 6934.779 | 6269.943 |
| cycle_18 | 6407.045 | 5462.144 | 7030.021 | 6053.231 | 3573.811 | 3633.493 | 2994.564 | 2414.271 | 7318.756 | 6666.104 |
| cycle_19 | 6409.033 | 5473.986 | 7027.955 | 6045.461 | 3588.689 | 3634.740 | 3215.832 | 2629.131 | 7770.426 | 7119.726 |
| cycle_20 | 6390.317 | 5471.049 | 7036.966 | 6051.801 | 3595.764 | 3622.620 | 3439.598 | 2813.328 | 8252.472 | 7611.985 |
Well number, sample, gene, quality control metrics, and various metadata associated with each PCR reaction are stored as a DataFrame accessible with colData(se). The Crt (relative threshold value) and amplification status determined by QuantStudio 12K Flex Software are saved here as well. The relative threshold value is analogous to a Ct value reported in a typical qPCR assay.
Here we display a subset of the column annotation DataFrame associated with the first 10 wells.
# retrieve the coldata and render a subset as a data frame
as.data.frame(colData(se)) |>
dplyr::select(
well, sample_name:target_name, reporter, crt, amp_score:amp_status
) |>
dplyr::slice(1:10)| well | sample_name | target_name | reporter | crt | amp_score | cq_conf | amp_status | |
|---|---|---|---|---|---|---|---|---|
| well_2321 | 2321 | Sample-101 | RV_1of2_Vi99990016_po | FAM | NA | 0.000000 | 0.0000000 | No Amp |
| well_2322 | 2322 | Sample-101 | RV_1of2_Vi99990016_po | FAM | NA | 0.000000 | 0.0000000 | No Amp |
| well_2329 | 2329 | Sample-101 | RV_2of2_Vi99990017_po | FAM | NA | 0.000000 | 0.0000000 | No Amp |
| well_2330 | 2330 | Sample-101 | RV_2of2_Vi99990017_po | FAM | NA | 0.000000 | 0.0000000 | No Amp |
| well_2343 | 2343 | Sample-101 | S. pneumoniae_Ba06439619_s1 | FAM | NA | 0.000000 | 0.0000000 | No Amp |
| well_2344 | 2344 | Sample-101 | S. pneumoniae_Ba06439619_s1 | FAM | NA | 0.000000 | 0.0000000 | No Amp |
| well_2345 | 2345 | Sample-101 | RNAse_P_Pa04930436_g1 | FAM | 14.98677 | 1.458653 | 0.8907495 | Amp |
| well_2346 | 2346 | Sample-101 | RNAse_P_Pa04930436_g1 | FAM | 14.94592 | 1.447588 | 0.8192336 | Amp |
| well_2385 | 2385 | Sample-102 | RV_1of2_Vi99990016_po | FAM | 15.99536 | 1.629182 | 0.9614623 | Amp |
| well_2386 | 2386 | Sample-102 | RV_1of2_Vi99990016_po | FAM | 15.90011 | 1.633002 | 0.9618700 | Amp |
By contrast, the row annotation DataFrame stores only the cycle numbers, which is less interesting by itself.
# retrieve the rowdata and render a subset as a data frame
as.data.frame(rowData(se)) |>
dplyr::slice(1:10)| cycle | |
|---|---|
| cycle_1 | 1 |
| cycle_2 | 2 |
| cycle_3 | 3 |
| cycle_4 | 4 |
| cycle_5 | 5 |
| cycle_6 | 6 |
| cycle_7 | 7 |
| cycle_8 | 8 |
| cycle_9 | 9 |
| cycle_10 | 10 |
Finally, the file path of the imported data and run-level metadata are stored in a list accessible with metadata(se). Here we display the run information.
| Field | Data |
|---|---|
| Barcode | ABC123 |
| Block Type | OpenArray Block |
| Chemistry | TAQMAN |
| Date Created | 04-29-2025 09:39:53 AM PDT |
| Date Modified | 01-06-2026 16:43:39 PM PST |
| Experiment File Name | C:123.eds |
| Experiment Name | ABC123 |
| Experiment Run Start Time | 04-29-2025 09:42:30 AM PDT |
| Experiment Run Stop Time | 04-29-2025 11:50:35 AM PDT |
| Experiment Type | Gene Expression |
| Instrument Name | 987654321 |
| Instrument Serial Number | 987654321 |
| Instrument Type | QuantStudio 12K Flex |
| Passive Reference | NA |
| Quantification Cycle Method | Crt |
| Signal Smoothing On | true |
| Stage/ Cycle where Analysis is performed | Stage 2, Step 3 |
Model data generated in downstream curve-fitting steps is saved to the experiment metadata as well.
If we intend to use the relative threshold analysis native to the instrument software, we can create a run result report straight away. We can call the generateReport() function directly on the Summarized Experiment, which then dynamically generates a .html document summarizing the results.
Note that the package kableExtra is required to generate the PCR report. If it is not installed, install with:
# install kableExtra from CRAN
if (!require("kableExtra", quietly = TRUE)) {
install.packages("kableExtra")
}The following code block generates the PCR report assuming that data has been loaded in using excelToSE(). Note that this code chunk is not wired to execute when constructing the vignette as generateReport() calls code to save the report directly to the user’s machine, though it defaults to a temporary file.
At present, this report pulls from the analysis provided by the QuantStudio 12K Flex software, though an option to build the report from data generated by the curve-fitting method is planned for future development.
The second method of data analysis, as mentioned in the overview, is to fit model curves to each PCR reaction curve individually, then use computed features to differentiate positive and negative PCR results. In particular, the computeModels() package function calls an algorithm from SciPy and attempts to fit 5-parameter logistic regressions to the fluorescence curves. The 5PL model is implemented for its favorable ability to handle asymmetries common to amplification curves.
This approach has the advantages of transparency and reproducibility, circumventing the need to rely on propriety algorithms to interpret PCR results.
Here we start the data analysis again from the point where we have exported the gene expression experiment from QuantStudio 12K Flex software. As in the previous example, we load the experiment into a SummarizedExperiment object.
Next, we run the optimizer and attempt to fit a 5PL model curve to each PCR reaction on the OpenArray plate. As mentioned above, OAtools exports the computeModels() function, which iterates over the experiment and does this programmatically. computeModels() then saves the parameters of each model curve, along with features computed from them, as a list in the experiment metadata. If the fluo_reporter assay matrix is used, for example, the models would be accessible with metadata(se)$fluo_reporter_models.
Under the hood, computeModels() invokes a thin R wrapper, runFitCurve(), around the internal fit_curve() function implemented in python3. computeModels() handles the logic for iterating over the OpenArray plate, while fit_curve() interfaces with SciPy and actually runs the optimization, returning a model curve when provided PCR data (observed fluorescence vs. cycle number) for a single reaction.
A linear_threshold parameter is optionally provided when calling computeModels(). PCR curves with overall change in fluorescence below this threshold are marked as negative and fit to linear models instead. The rationale is that such PCR curves are trivially negative for the amplicon of interest, but add a great deal of compute time to the optimization process and are better filtered out.
# optimize model curves to each PCR reaction
se <- computeModels(
se = se,
assay_name = "fluo_reporter"
)
#> Using Python: /home/biocbuild/.pyenv/versions/3.12.12/bin/python3.12
#> Creating virtual environment '/var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env' ...
#> + /home/biocbuild/.pyenv/versions/3.12.12/bin/python3.12 -m venv /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env
#> Done!
#> Installing packages: pip, wheel, setuptools
#> + /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env/bin/python -m pip install --upgrade pip wheel setuptools
#> Installing packages: 'numpy==2.4.1', 'pandas==2.3.3', 'scipy==1.17.0'
#> + /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env/bin/python -m pip install --upgrade --no-user 'numpy==2.4.1' 'pandas==2.3.3' 'scipy==1.17.0'
#> Virtual environment '/var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env' successfully created.
# display an example model, fit to the PCR reaction in well 2345
metadata(se)$fluo_reporter_models$well_2345
#> $regression_type
#> [1] "5pl"
#>
#> $parameters
#> $parameters$A
#> [1] 2235.61
#>
#> $parameters$D
#> [1] 6056.299
#>
#> $parameters$B
#> [1] 0.1653346
#>
#> $parameters$C
#> [1] 7.534664
#>
#> $parameters$S
#> [1] 10
#>
#>
#> $r_squared
#> [1] 0.9995366
#>
#> $x_obs
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
#>
#> $y_obs
#> [1] 2264.415 2264.059 2257.542 2249.905 2250.833 2244.891 2244.586 2260.520
#> [9] 2264.602 2254.724 2256.322 2284.099 2314.417 2382.905 2474.373 2612.977
#> [17] 2782.659 2994.564 3215.832 3439.598 3664.946 3877.989 4076.521 4260.230
#> [25] 4445.077 4614.959 4791.316 4936.122 5061.273 5197.476 5323.698 5437.324
#> [33] 5541.725 5625.767 5684.772 5746.419 5803.184 5839.543 5865.371 5872.880
#>
#> $y_pred
#> [1] 2235.615 2235.624 2235.655 2235.742 2235.980 2236.579 2237.985 2241.052
#> [9] 2247.255 2258.911 2279.277 2312.449 2362.977 2435.220 2532.592 2656.869
#> [17] 2807.778 2982.940 3178.175 3388.056 3606.569 3827.741 4046.141 4257.204
#> [25] 4457.395 4644.221 4816.153 4972.485 5113.171 5238.659 5349.745 5447.448
#> [33] 5532.905 5607.300 5671.805 5727.544 5775.568 5816.843 5852.242 5882.549
#>
#> $delta_y
#> [1] 3597.259
#>
#> $x_midpoint
#> [1] 23.46745
#>
#> $y_midpoint
#> [1] 4145.955
#>
#> $slope_midpoint
#> [1] 211.5127Finally, we can leverage the model data we’ve generated to derive PCR results. OAtools exports the determinePCRResults() function, which we can invoke on our SummarizedExperiment to generate such results. determinePCRResults() requires as input an Excel key. This key associates each gene target with threshold values for model features used to separate positive and negative PCR results. In particular, values are specified for the cycle threshold, midpoint slope, and overall change-in-fluorescence.
Different thresholds may be specified for distinct target genes to account for differences in primer efficiency across assays on the same OpenArray plate.
For documentation and details on the key, run ?OAtools::target_threshold_key in the R console.
determinePCRResults() computes PASS/FAIL values for the cycle threshold, midpoint slope, and change-in-fluorescence, marking PCR reactions which pass on all accounts as positive and all others as negative. These results are saved to the column annotation DataFrame.
# save filepath to target-threshold-key
key_path = system.file(
"extdata",
"target_threshold_key.xlsx",
package = "OAtools"
)
# assign a PCR result according to the key
se <- determinePCRResults(
se = se,
key_path = key_path
)
# render a snapshot including the logic for determining the result
as.data.frame(SummarizedExperiment::colData(se)) |>
dplyr::select(
well, crt, midpoint_slope, delta_fluo,
crt_threshold, slope_threshold, delta_threshold, result
) |>
dplyr::slice_head(n = 10)| well | crt | midpoint_slope | delta_fluo | crt_threshold | slope_threshold | delta_threshold | result |
|---|---|---|---|---|---|---|---|
| 2321 | NA | 0.3555216 | 12.52458 | 35 | 75 | 750 | negative |
| 2322 | NA | 1.2217160 | 49.51595 | 35 | 75 | 750 | negative |
| 2329 | NA | -1.7810606 | -62.98893 | 35 | 75 | 750 | negative |
| 2330 | NA | -0.9223582 | -33.51660 | 35 | 75 | 750 | negative |
| 2343 | NA | -2.5574979 | -93.83805 | NA | 100 | 1000 | negative |
| 2344 | NA | -1.4442550 | -47.09253 | NA | 100 | 1000 | negative |
| 2345 | 14.98677 | 211.5126506 | 3597.25928 | NA | 75 | 1000 | positive |
| 2346 | 14.94592 | 207.3108679 | 3671.44303 | NA | 75 | 1000 | positive |
| 2385 | 15.99536 | 482.0642170 | 7382.08024 | 35 | 75 | 750 | positive |
| 2386 | 15.90011 | 502.0597006 | 7440.44922 | 35 | 75 | 750 | positive |
OAtools supports interoperability with the NormqPCR package from Bioconductor, which provides functions to normalize RT-qPCR data from gene expression platforms like OpenArray. NormqPCR reads from the qPCRBatch class object, which requires the ReadqPCR package. We can set up our environment as below:
# Install ReadqPCR from Bioconductor
if (!require("ReadqPCR", quietly = TRUE)) {
BiocManager::install("ReadqPCR")
}
# Install NormqPCR from Bioconductor
if (!require("NormqPCR", quietly = TRUE)) {
BiocManager::install("NormqPCR")
}Then, to convert the SummarizedExperiment container into a NormqPCR-friendly instance of the qPCRBatch class, simply invoke the seToQPCRBatch() function on the SummarizedExperiment container. With our qPCRBatch object, we can, for example, normalize the cycle thresholds against a housekeeping gene.
# convert SummarizedExperiment container to qPCRBatch
qpcr <- seToQPCRBatch(se)
# choose housekeeping gene (in this case, the RNAse P control)
housekeeping_gene = "RNAse_P_Pa04930436_g1"
# run delta-Cq calculation
norm <- NormqPCR::deltaCq(
qPCRBatch = qpcr,
hkgs = housekeeping_gene
)
# display expression matrix
as.data.frame(Biobase::exprs(norm)) |>
dplyr::select(`Pos_Control_A`:`Sample-106`) |>
knitr::kable(digits = 2)| Pos_Control_A | Sample-101 | Sample-102 | Sample-103 | Sample-104 | Sample-105 | Sample-106 | |
|---|---|---|---|---|---|---|---|
| RNAse_P_Pa04930436_g1 | 0.00 | 0 | 0.00 | 0 | 0.00 | 0.00 | 0.00 |
| RV_1of2_Vi99990016_po | 4.12 | NaN | 0.53 | NaN | 14.39 | NaN | -1.56 |
| RV_2of2_Vi99990017_po | 2.27 | NaN | 0.66 | NaN | 13.42 | NaN | -2.46 |
| S. pneumoniae_Ba06439619_s1 | 10.19 | NaN | NaN | NaN | NaN | 0.74 | NaN |
Please refer to the official documentation for NormqPCR for more information. (https://bioconductor.org/packages/release/bioc/html/NormqPCR.html)
sessionInfo()
#> R Under development (unstable) (2026-01-15 r89304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SummarizedExperiment_1.41.1 Biobase_2.71.0
#> [3] GenomicRanges_1.63.1 Seqinfo_1.1.0
#> [5] IRanges_2.45.0 S4Vectors_0.49.0
#> [7] BiocGenerics_0.57.0 generics_0.1.4
#> [9] MatrixGenerics_1.23.0 matrixStats_1.5.0
#> [11] OAtools_0.99.9
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 dir.expiry_1.19.0 xfun_0.56
#> [4] bslib_0.10.0 ggplot2_4.0.2 htmlwidgets_1.6.4
#> [7] lattice_0.22-9 vctrs_0.7.1 tools_4.6.0
#> [10] rgl_1.3.34 parallel_4.6.0 tibble_3.3.1
#> [13] DEoptimR_1.1-4 pkgconfig_2.0.3 Matrix_1.7-4
#> [16] RColorBrewer_1.1-3 S7_0.2.1 readxl_1.4.5
#> [19] lifecycle_1.0.5 compiler_4.6.0 farver_2.1.2
#> [22] stringr_1.6.0 minpack.lm_1.2-4 qpcR_1.4-2
#> [25] NormqPCR_1.57.0 janitor_2.2.1 snakecase_0.11.1
#> [28] htmltools_0.5.9 sass_0.4.10 yaml_2.3.12
#> [31] pillar_1.11.1 jquerylib_0.1.4 tidyr_1.3.2
#> [34] MASS_7.3-65 DelayedArray_0.37.0 cachem_1.1.0
#> [37] abind_1.4-8 robustbase_0.99-7 basilisk_1.23.0
#> [40] tidyselect_1.2.1 digest_0.6.39 stringi_1.8.7
#> [43] dplyr_1.2.0 purrr_1.2.1 fastmap_1.2.0
#> [46] grid_4.6.0 cli_3.6.5 SparseArray_1.11.10
#> [49] magrittr_2.0.4 base64enc_0.1-6 S4Arrays_1.11.1
#> [52] dichromat_2.0-0.1 withr_3.0.2 rappdirs_0.3.4
#> [55] filelock_1.0.3 scales_1.4.0 lubridate_1.9.5
#> [58] timechange_0.4.0 rmarkdown_2.30 XVector_0.51.0
#> [61] otel_0.2.0 reticulate_1.45.0 cellranger_1.1.0
#> [64] png_0.1-8 evaluate_1.0.5 knitr_1.51
#> [67] ReadqPCR_1.57.0 rlang_1.1.7 Rcpp_1.1.1
#> [70] glue_1.8.0 jsonlite_2.0.0 R6_2.6.1