Introduction

Highly multiplex PCR platforms enable drastically greater throughput compared to traditional PCR methods, making them critical for pathogen detection and gene expression experiments in public health and clinical research environments. One such platform is ThermoFisher Scientific’s OpenArray Technology, which our lab uses to screen patient swabs for respiratory viral infections. With 3000+ qPCR reactions taking place on each OpenArray plate, the sheer volume of data produced requires an informatic solution for downstream analyses and reporting.

ThermoFisher provides the proprietary QuantStudio 12K Flex Software and the Relative Quantification Application for data analysis. However, no all-in-one open-source software exists to derive reportable results from raw fluorescence values measured on OpenArray.

To bridge the gap, OAtools streamlines the analysis process including data import, plotting and quality control, analysis, and communication through an R Markdown document. Our goal in sharing OAtools is to support open-source and shareable analyses of OpenArray gene expression data, especially in clinical research environments.

Installation

GitHub Install

To install the development version of OAtools directly from the uwvirology-ngs public GitHub, (https://github.com/uwvirology-ngs/OAtools) run the commands below in your R console:

# Install devtools from CRAN
if (!require("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install the development version of OAtools from the UW Virology NGS GitHub
devtools::install_github(
    repo = "uwvirology-ngs/OAtools", 
    dependencies = TRUE, 
    build_vignettes = TRUE
)

Bioconductor Install

Our lab plans to share this project as a Bioconductor package. Once available, this package may be installed from Bioconductor by running the following commands in the R console:

# install Bioconductor from CRAN
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# install OAtools from Bioconductor
BiocManager::install("OAtools")

Environment Setup

If following along with the vignette by running code blocks in the R console, attach the OAtools and SummarizedExperiment libraries to the environment. The SummarizedExperiment package should already be installed as a dependency.

# attach OAtools and SummarizedExperiment libraries
library(OAtools)
library(SummarizedExperiment)

If it is not installed, install dplyr with the following:

# install dplyr from CRAN
if (!require("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
}

Workflow Overview

OAtools supports two core methods for analyzing OpenArray qPCR data:

1.) The first method is to simply carry over the relative threshold analysis native to the QuantStudio 12K Flex Software.

2.) The second method is to fit model curves to the fluorescence vs. cycle data for each well individually, then determine PCR results which depend on features of the curve.

Import to SummarizedExperiment

To start, we specify the file path to some example OpenArray qPCR run data, which is included with the package. This data is a sample from a gene expression experiment our lab, the University of Washington Virology Lab, ran on human nasal swabs to screen for respiratory pathogens.

For detailed documentation on the example data used in this vignette, run ?oa_gene_expression_1 in the R console with OAtools attached.

With the file path specified, we use the excelToSE() function to read the Excel file containing the run data and load it into a SummarizedExperiment object. SummarizedExperiment is an S4 Class from Bioconductor used to store matrix-like assay data and associated metadata. This function takes optional parameters for the number of header rows, which store run metadata, and the number of rows to skip before reading assay data, though the defaults will work in this case.

# save filepath to example OpenArray gene expression run data
path = system.file(
    "extdata", 
    "oa_gene_expression_1.xlsx", 
    package = "OAtools"
)

# transform the run data into a SummarizedExperiment
se <- excelToSE(excel_path = path)

Experiment Structure

Assay Matrices

With the data loaded into a SummarizedExperiment container, we can review its structure. The assay matrices store observed fluorescence values measured by the QuantStudio 12K Flex thermocycler by cycle number, where each column stores PCR reactions from through-holes on the OpenArray plate separately.

The excelToSE() function generates two assay matrices by default, which are fluo_normalized and fluo_reporter. The former refers to baseline-corrected, normalized fluorescence imported from the ‘Amplification Data’ tab, while the latter refers to the spectral contribution of the reporter dye imported from the ‘Multicomponent Data’ tab. We may access the assay matrix corresponding to the multicomponent data, for example, by invoking assays(se)$fluo_reporter in the R console, then coercing into a data frame with as.data.frame() as shown below.

Here we display a slice of the multicomponent fluorescence data from cycles 11 to 20, where the exponential phase of amplification tends to occur. The rows of each assay matrix denote the cycle number and the columns denote the individual PCR reactions on the OpenArray plate.

# retrieve the assay matrix and display a subset as a data frame
as.data.frame(assays(se)$fluo_reporter) |> 
    dplyr::select(well_2321:well_2386) |>
    dplyr::slice(11:20)
well_2321 well_2322 well_2329 well_2330 well_2343 well_2344 well_2345 well_2346 well_2385 well_2386
cycle_11 6405.788 5483.835 7060.261 6067.686 3605.021 3631.051 2256.322 1690.796 6164.024 5435.064
cycle_12 6405.382 5482.771 7058.596 6073.878 3594.832 3638.393 2284.099 1705.556 6165.793 5464.090
cycle_13 6414.322 5482.930 7037.791 6073.881 3602.094 3640.851 2314.417 1735.094 6211.680 5526.001
cycle_14 6419.167 5476.461 7020.177 6056.675 3590.731 3628.519 2382.905 1796.593 6293.289 5586.283
cycle_15 6425.805 5465.984 7023.062 6048.876 3582.180 3636.152 2474.373 1891.953 6427.477 5707.974
cycle_16 6430.859 5459.897 7023.853 6065.785 3565.366 3641.555 2612.977 2027.775 6648.198 5944.312
cycle_17 6415.730 5457.025 7041.731 6057.306 3561.831 3638.955 2782.659 2207.963 6934.779 6269.943
cycle_18 6407.045 5462.144 7030.021 6053.231 3573.811 3633.493 2994.564 2414.271 7318.756 6666.104
cycle_19 6409.033 5473.986 7027.955 6045.461 3588.689 3634.740 3215.832 2629.131 7770.426 7119.726
cycle_20 6390.317 5471.049 7036.966 6051.801 3595.764 3622.620 3439.598 2813.328 8252.472 7611.985

Coldata

Well number, sample, gene, quality control metrics, and various metadata associated with each PCR reaction are stored as a DataFrame accessible with colData(se). The Crt (relative threshold value) and amplification status determined by QuantStudio 12K Flex Software are saved here as well. The relative threshold value is analogous to a Ct value reported in a typical qPCR assay.

Here we display a subset of the column annotation DataFrame associated with the first 10 wells.

# retrieve the coldata and render a subset as a data frame
as.data.frame(colData(se)) |> 
    dplyr::select(
        well, sample_name:target_name, reporter, crt, amp_score:amp_status
    ) |> 
    dplyr::slice(1:10)
well sample_name target_name reporter crt amp_score cq_conf amp_status
well_2321 2321 Sample-101 RV_1of2_Vi99990016_po FAM NA 0.000000 0.0000000 No Amp
well_2322 2322 Sample-101 RV_1of2_Vi99990016_po FAM NA 0.000000 0.0000000 No Amp
well_2329 2329 Sample-101 RV_2of2_Vi99990017_po FAM NA 0.000000 0.0000000 No Amp
well_2330 2330 Sample-101 RV_2of2_Vi99990017_po FAM NA 0.000000 0.0000000 No Amp
well_2343 2343 Sample-101 S. pneumoniae_Ba06439619_s1 FAM NA 0.000000 0.0000000 No Amp
well_2344 2344 Sample-101 S. pneumoniae_Ba06439619_s1 FAM NA 0.000000 0.0000000 No Amp
well_2345 2345 Sample-101 RNAse_P_Pa04930436_g1 FAM 14.98677 1.458653 0.8907495 Amp
well_2346 2346 Sample-101 RNAse_P_Pa04930436_g1 FAM 14.94592 1.447588 0.8192336 Amp
well_2385 2385 Sample-102 RV_1of2_Vi99990016_po FAM 15.99536 1.629182 0.9614623 Amp
well_2386 2386 Sample-102 RV_1of2_Vi99990016_po FAM 15.90011 1.633002 0.9618700 Amp

Rowdata

By contrast, the row annotation DataFrame stores only the cycle numbers, which is less interesting by itself.

# retrieve the rowdata and render a subset as a data frame
as.data.frame(rowData(se)) |> 
    dplyr::slice(1:10)
cycle
cycle_1 1
cycle_2 2
cycle_3 3
cycle_4 4
cycle_5 5
cycle_6 6
cycle_7 7
cycle_8 8
cycle_9 9
cycle_10 10

Metadata

Finally, the file path of the imported data and run-level metadata are stored in a list accessible with metadata(se). Here we display the run information.

# render the experiment level metadata
metadata(se)$run_info
Field Data
Barcode ABC123
Block Type OpenArray Block
Chemistry TAQMAN
Date Created 04-29-2025 09:39:53 AM PDT
Date Modified 01-06-2026 16:43:39 PM PST
Experiment File Name C:123.eds
Experiment Name ABC123
Experiment Run Start Time 04-29-2025 09:42:30 AM PDT
Experiment Run Stop Time 04-29-2025 11:50:35 AM PDT
Experiment Type Gene Expression
Instrument Name 987654321
Instrument Serial Number 987654321
Instrument Type QuantStudio 12K Flex
Passive Reference NA
Quantification Cycle Method Crt
Signal Smoothing On true
Stage/ Cycle where Analysis is performed Stage 2, Step 3

Model data generated in downstream curve-fitting steps is saved to the experiment metadata as well.

Reporting Results

If we intend to use the relative threshold analysis native to the instrument software, we can create a run result report straight away. We can call the generateReport() function directly on the Summarized Experiment, which then dynamically generates a .html document summarizing the results.

Note that the package kableExtra is required to generate the PCR report. If it is not installed, install with:

# install kableExtra from CRAN
if (!require("kableExtra", quietly = TRUE)) {
    install.packages("kableExtra")
}

The following code block generates the PCR report assuming that data has been loaded in using excelToSE(). Note that this code chunk is not wired to execute when constructing the vignette as generateReport() calls code to save the report directly to the user’s machine, though it defaults to a temporary file.

# generate an HTML report from the run data
generateReport(se = se)

At present, this report pulls from the analysis provided by the QuantStudio 12K Flex software, though an option to build the report from data generated by the curve-fitting method is planned for future development.

Analysis by Logistic Regression

The second method of data analysis, as mentioned in the overview, is to fit model curves to each PCR reaction curve individually, then use computed features to differentiate positive and negative PCR results. In particular, the computeModels() package function calls an algorithm from SciPy and attempts to fit 5-parameter logistic regressions to the fluorescence curves. The 5PL model is implemented for its favorable ability to handle asymmetries common to amplification curves.

This approach has the advantages of transparency and reproducibility, circumventing the need to rely on propriety algorithms to interpret PCR results.

Data Import

Here we start the data analysis again from the point where we have exported the gene expression experiment from QuantStudio 12K Flex software. As in the previous example, we load the experiment into a SummarizedExperiment object.

# clear the environment
rm(list = ls())

# save filepath to example OpenArray gene expression run data
path = system.file(
    "extdata", 
    "oa_gene_expression_1.xlsx", 
    package = "OAtools"
)

# transform the run data into a SummarizedExperiment
se <- excelToSE(excel_path = path)

Optimizing Models to PCR Curves

Next, we run the optimizer and attempt to fit a 5PL model curve to each PCR reaction on the OpenArray plate. As mentioned above, OAtools exports the computeModels() function, which iterates over the experiment and does this programmatically. computeModels() then saves the parameters of each model curve, along with features computed from them, as a list in the experiment metadata. If the fluo_reporter assay matrix is used, for example, the models would be accessible with metadata(se)$fluo_reporter_models.

Under the hood, computeModels() invokes a thin R wrapper, runFitCurve(), around the internal fit_curve() function implemented in python3. computeModels() handles the logic for iterating over the OpenArray plate, while fit_curve() interfaces with SciPy and actually runs the optimization, returning a model curve when provided PCR data (observed fluorescence vs.  cycle number) for a single reaction.

A linear_threshold parameter is optionally provided when calling computeModels(). PCR curves with overall change in fluorescence below this threshold are marked as negative and fit to linear models instead. The rationale is that such PCR curves are trivially negative for the amplicon of interest, but add a great deal of compute time to the optimization process and are better filtered out.

# optimize model curves to each PCR reaction
se <- computeModels(
    se = se,
    assay_name = "fluo_reporter"
)
#> Using Python: /home/biocbuild/.pyenv/versions/3.12.12/bin/python3.12
#> Creating virtual environment '/var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env' ...
#> + /home/biocbuild/.pyenv/versions/3.12.12/bin/python3.12 -m venv /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env
#> Done!
#> Installing packages: pip, wheel, setuptools
#> + /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env/bin/python -m pip install --upgrade pip wheel setuptools
#> Installing packages: 'numpy==2.4.1', 'pandas==2.3.3', 'scipy==1.17.0'
#> + /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env/bin/python -m pip install --upgrade --no-user 'numpy==2.4.1' 'pandas==2.3.3' 'scipy==1.17.0'
#> Virtual environment '/var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env' successfully created.

# display an example model, fit to the PCR reaction in well 2345
metadata(se)$fluo_reporter_models$well_2345
#> $regression_type
#> [1] "5pl"
#> 
#> $parameters
#> $parameters$A
#> [1] 2235.61
#> 
#> $parameters$D
#> [1] 6056.299
#> 
#> $parameters$B
#> [1] 0.1653346
#> 
#> $parameters$C
#> [1] 7.534664
#> 
#> $parameters$S
#> [1] 10
#> 
#> 
#> $r_squared
#> [1] 0.9995366
#> 
#> $x_obs
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
#> 
#> $y_obs
#>  [1] 2264.415 2264.059 2257.542 2249.905 2250.833 2244.891 2244.586 2260.520
#>  [9] 2264.602 2254.724 2256.322 2284.099 2314.417 2382.905 2474.373 2612.977
#> [17] 2782.659 2994.564 3215.832 3439.598 3664.946 3877.989 4076.521 4260.230
#> [25] 4445.077 4614.959 4791.316 4936.122 5061.273 5197.476 5323.698 5437.324
#> [33] 5541.725 5625.767 5684.772 5746.419 5803.184 5839.543 5865.371 5872.880
#> 
#> $y_pred
#>  [1] 2235.615 2235.624 2235.655 2235.742 2235.980 2236.579 2237.985 2241.052
#>  [9] 2247.255 2258.911 2279.277 2312.449 2362.977 2435.220 2532.592 2656.869
#> [17] 2807.778 2982.940 3178.175 3388.056 3606.569 3827.741 4046.141 4257.204
#> [25] 4457.395 4644.221 4816.153 4972.485 5113.171 5238.659 5349.745 5447.448
#> [33] 5532.905 5607.300 5671.805 5727.544 5775.568 5816.843 5852.242 5882.549
#> 
#> $delta_y
#> [1] 3597.259
#> 
#> $x_midpoint
#> [1] 23.46745
#> 
#> $y_midpoint
#> [1] 4145.955
#> 
#> $slope_midpoint
#> [1] 211.5127

Deriving PCR Results from the Model

Finally, we can leverage the model data we’ve generated to derive PCR results. OAtools exports the determinePCRResults() function, which we can invoke on our SummarizedExperiment to generate such results. determinePCRResults() requires as input an Excel key. This key associates each gene target with threshold values for model features used to separate positive and negative PCR results. In particular, values are specified for the cycle threshold, midpoint slope, and overall change-in-fluorescence.

Different thresholds may be specified for distinct target genes to account for differences in primer efficiency across assays on the same OpenArray plate.

For documentation and details on the key, run ?OAtools::target_threshold_key in the R console.

determinePCRResults() computes PASS/FAIL values for the cycle threshold, midpoint slope, and change-in-fluorescence, marking PCR reactions which pass on all accounts as positive and all others as negative. These results are saved to the column annotation DataFrame.

# save filepath to target-threshold-key
key_path = system.file(
    "extdata", 
    "target_threshold_key.xlsx", 
    package = "OAtools"
)

# assign a PCR result according to the key
se <- determinePCRResults(
    se = se, 
    key_path = key_path
)

# render a snapshot including the logic for determining the result
as.data.frame(SummarizedExperiment::colData(se)) |> 
    dplyr::select(
        well, crt, midpoint_slope, delta_fluo, 
        crt_threshold, slope_threshold, delta_threshold, result
    ) |> 
    dplyr::slice_head(n = 10)
well crt midpoint_slope delta_fluo crt_threshold slope_threshold delta_threshold result
2321 NA 0.3555216 12.52458 35 75 750 negative
2322 NA 1.2217160 49.51595 35 75 750 negative
2329 NA -1.7810606 -62.98893 35 75 750 negative
2330 NA -0.9223582 -33.51660 35 75 750 negative
2343 NA -2.5574979 -93.83805 NA 100 1000 negative
2344 NA -1.4442550 -47.09253 NA 100 1000 negative
2345 14.98677 211.5126506 3597.25928 NA 75 1000 positive
2346 14.94592 207.3108679 3671.44303 NA 75 1000 positive
2385 15.99536 482.0642170 7382.08024 35 75 750 positive
2386 15.90011 502.0597006 7440.44922 35 75 750 positive

Interoperability

NormqPCR

OAtools supports interoperability with the NormqPCR package from Bioconductor, which provides functions to normalize RT-qPCR data from gene expression platforms like OpenArray. NormqPCR reads from the qPCRBatch class object, which requires the ReadqPCR package. We can set up our environment as below:

# Install ReadqPCR from Bioconductor
if (!require("ReadqPCR", quietly = TRUE)) {
    BiocManager::install("ReadqPCR")
}

# Install NormqPCR from Bioconductor
if (!require("NormqPCR", quietly = TRUE)) {
    BiocManager::install("NormqPCR")
}

Then, to convert the SummarizedExperiment container into a NormqPCR-friendly instance of the qPCRBatch class, simply invoke the seToQPCRBatch() function on the SummarizedExperiment container. With our qPCRBatch object, we can, for example, normalize the cycle thresholds against a housekeeping gene.

# convert SummarizedExperiment container to qPCRBatch
qpcr <- seToQPCRBatch(se)

# choose housekeeping gene (in this case, the RNAse P control)
housekeeping_gene = "RNAse_P_Pa04930436_g1"

# run delta-Cq calculation
norm <- NormqPCR::deltaCq(
    qPCRBatch = qpcr,
    hkgs = housekeeping_gene
)

# display expression matrix
as.data.frame(Biobase::exprs(norm)) |> 
    dplyr::select(`Pos_Control_A`:`Sample-106`) |> 
    knitr::kable(digits = 2)
Pos_Control_A Sample-101 Sample-102 Sample-103 Sample-104 Sample-105 Sample-106
RNAse_P_Pa04930436_g1 0.00 0 0.00 0 0.00 0.00 0.00
RV_1of2_Vi99990016_po 4.12 NaN 0.53 NaN 14.39 NaN -1.56
RV_2of2_Vi99990017_po 2.27 NaN 0.66 NaN 13.42 NaN -2.46
S. pneumoniae_Ba06439619_s1 10.19 NaN NaN NaN NaN 0.74 NaN

Please refer to the official documentation for NormqPCR for more information. (https://bioconductor.org/packages/release/bioc/html/NormqPCR.html)

Session Info

sessionInfo()
#> R Under development (unstable) (2026-01-15 r89304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SummarizedExperiment_1.41.1 Biobase_2.71.0             
#>  [3] GenomicRanges_1.63.1        Seqinfo_1.1.0              
#>  [5] IRanges_2.45.0              S4Vectors_0.49.0           
#>  [7] BiocGenerics_0.57.0         generics_0.1.4             
#>  [9] MatrixGenerics_1.23.0       matrixStats_1.5.0          
#> [11] OAtools_0.99.9             
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6        dir.expiry_1.19.0   xfun_0.56          
#>  [4] bslib_0.10.0        ggplot2_4.0.2       htmlwidgets_1.6.4  
#>  [7] lattice_0.22-9      vctrs_0.7.1         tools_4.6.0        
#> [10] rgl_1.3.34          parallel_4.6.0      tibble_3.3.1       
#> [13] DEoptimR_1.1-4      pkgconfig_2.0.3     Matrix_1.7-4       
#> [16] RColorBrewer_1.1-3  S7_0.2.1            readxl_1.4.5       
#> [19] lifecycle_1.0.5     compiler_4.6.0      farver_2.1.2       
#> [22] stringr_1.6.0       minpack.lm_1.2-4    qpcR_1.4-2         
#> [25] NormqPCR_1.57.0     janitor_2.2.1       snakecase_0.11.1   
#> [28] htmltools_0.5.9     sass_0.4.10         yaml_2.3.12        
#> [31] pillar_1.11.1       jquerylib_0.1.4     tidyr_1.3.2        
#> [34] MASS_7.3-65         DelayedArray_0.37.0 cachem_1.1.0       
#> [37] abind_1.4-8         robustbase_0.99-7   basilisk_1.23.0    
#> [40] tidyselect_1.2.1    digest_0.6.39       stringi_1.8.7      
#> [43] dplyr_1.2.0         purrr_1.2.1         fastmap_1.2.0      
#> [46] grid_4.6.0          cli_3.6.5           SparseArray_1.11.10
#> [49] magrittr_2.0.4      base64enc_0.1-6     S4Arrays_1.11.1    
#> [52] dichromat_2.0-0.1   withr_3.0.2         rappdirs_0.3.4     
#> [55] filelock_1.0.3      scales_1.4.0        lubridate_1.9.5    
#> [58] timechange_0.4.0    rmarkdown_2.30      XVector_0.51.0     
#> [61] otel_0.2.0          reticulate_1.45.0   cellranger_1.1.0   
#> [64] png_0.1-8           evaluate_1.0.5      knitr_1.51         
#> [67] ReadqPCR_1.57.0     rlang_1.1.7         Rcpp_1.1.1         
#> [70] glue_1.8.0          jsonlite_2.0.0      R6_2.6.1