Analyzing OpenArray Gene Expression Data with OAtools

Introduction

Highly multiplex PCR platforms enable drastically greater throughput compared to traditional PCR methods, making them critical for pathogen detection and gene expression experiments in public health and clinical research environments. One such platform is ThermoFisher Scientific’s OpenArray Technology, which our lab uses to screen patient swabs for respiratory viral infections. With 3000+ qPCR reactions taking place on each OpenArray plate, the sheer volume of data produced requires an informatic solution for downstream analyses and reporting.

ThermoFisher provides the proprietary QuantStudio 12K Flex Software and the Relative Quantification Application for data analysis. However, no all-in-one open-source software exists to derive reportable results from raw fluorescence values measured on OpenArray.

To bridge the gap, OAtools streamlines the analysis process including data import, plotting and quality control, analysis, and communication through an R Markdown document. Our goal in sharing OAtools is to support open-source and shareable analyses of OpenArray gene expression data, especially in clinical research environments.

Installation

GitHub Install

To install the development version of OAtools directly from the uwvirology-ngs public GitHub, (https://github.com/uwvirology-ngs/OAtools) run the commands below in your R console:

# Install devtools from CRAN
if (!require("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install the development version of OAtools from the UW Virology NGS GitHub
devtools::install_github(
    repo = "uwvirology-ngs/OAtools", 
    dependencies = TRUE, 
    build_vignettes = TRUE
)

Bioconductor Install

Our lab plans to share this project as a Bioconductor package. Once available, this package may be installed from Bioconductor by running the following commands in the R console:

# install Bioconductor from CRAN
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# install OAtools from Bioconductor
BiocManager::install("OAtools")

Environment Setup

If following along with the vignette by running code blocks in the R console, attach the OAtools and SummarizedExperiment libraries to the environment. The SummarizedExperiment package should already be installed as a dependency.

# attach OAtools and SummarizedExperiment libraries
library(OAtools)
library(SummarizedExperiment)

If it is not installed, install dplyr with the following:

# install dplyr from CRAN
if (!require("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
}

Workflow Overview

OAtools supports two core methods for analyzing OpenArray qPCR data:

1.) The first method is to simply carry over the relative threshold analysis native to the QuantStudio 12K Flex Software.

2.) The second method is to fit model curves to the fluorescence vs. cycle data for each well individually, then determine PCR results which depend on features of the curve.

Import to SummarizedExperiment

To start, we specify the file path to some example OpenArray qPCR run data, which is included with the package. This data is a sample from a gene expression experiment our lab, the University of Washington Virology Lab, ran on human nasal swabs to screen for respiratory pathogens.

For detailed documentation on the example data used in this vignette, run ?oa_gene_expression_1 in the R console with OAtools attached.

With the file path specified, we use the excelToSE() function to read the Excel file containing the run data and load it into a SummarizedExperiment object. SummarizedExperiment is an S4 Class from Bioconductor used to store matrix-like assay data and associated metadata. This function takes optional parameters for the number of header rows, which store run metadata, and the number of rows to skip before reading assay data, though the defaults will work in this case.

# save filepath to example OpenArray gene expression run data
path = system.file(
    "extdata", 
    "oa_gene_expression_1.xlsx", 
    package = "OAtools"
)

# transform the run data into a SummarizedExperiment
se <- excelToSE(excel_path = path)

Experiment Structure

Assay Matrices

With the data loaded into a SummarizedExperiment container, we can review its structure. The assay matrices store observed fluorescence values measured by the QuantStudio 12K Flex thermocycler by cycle number, where each column stores PCR reactions from through-holes on the OpenArray plate separately.

The excelToSE() function generates two assay matrices by default, which are fluo_normalized and fluo_reporter. The former refers to baseline-corrected, normalized fluorescence imported from the ‘Amplification Data’ tab, while the latter refers to the spectral contribution of the reporter dye imported from the ‘Multicomponent Data’ tab. We may access the assay matrix corresponding to the multicomponent data, for example, by invoking assays(se)$fluo_reporter in the R console, then coercing into a data frame with as.data.frame() as shown below.

Here we display a slice of the multicomponent fluorescence data from cycles 11 to 20, where the exponential phase of amplification tends to occur. The rows of each assay matrix denote the cycle number and the columns denote the individual PCR reactions on the OpenArray plate.

# retrieve the assay matrix and display a subset as a data frame
as.data.frame(assays(se)$fluo_reporter) |> 
    dplyr::select(well_2321:well_2386) |>
    dplyr::slice(11:20)

	well_2321	well_2322	well_2329	well_2330	well_2343	well_2344	well_2345	well_2346	well_2385	well_2386
cycle_11	6405.788	5483.835	7060.261	6067.686	3605.021	3631.051	2256.322	1690.796	6164.024	5435.064
cycle_12	6405.382	5482.771	7058.596	6073.878	3594.832	3638.393	2284.099	1705.556	6165.793	5464.090
cycle_13	6414.322	5482.930	7037.791	6073.881	3602.094	3640.851	2314.417	1735.094	6211.680	5526.001
cycle_14	6419.167	5476.461	7020.177	6056.675	3590.731	3628.519	2382.905	1796.593	6293.289	5586.283
cycle_15	6425.805	5465.984	7023.062	6048.876	3582.180	3636.152	2474.373	1891.953	6427.477	5707.974
cycle_16	6430.859	5459.897	7023.853	6065.785	3565.366	3641.555	2612.977	2027.775	6648.198	5944.312
cycle_17	6415.730	5457.025	7041.731	6057.306	3561.831	3638.955	2782.659	2207.963	6934.779	6269.943
cycle_18	6407.045	5462.144	7030.021	6053.231	3573.811	3633.493	2994.564	2414.271	7318.756	6666.104
cycle_19	6409.033	5473.986	7027.955	6045.461	3588.689	3634.740	3215.832	2629.131	7770.426	7119.726
cycle_20	6390.317	5471.049	7036.966	6051.801	3595.764	3622.620	3439.598	2813.328	8252.472	7611.985

Coldata

Well number, sample, gene, quality control metrics, and various metadata associated with each PCR reaction are stored as a DataFrame accessible with colData(se). The Crt (relative threshold value) and amplification status determined by QuantStudio 12K Flex Software are saved here as well. The relative threshold value is analogous to a Ct value reported in a typical qPCR assay.

Here we display a subset of the column annotation DataFrame associated with the first 10 wells.

# retrieve the coldata and render a subset as a data frame
as.data.frame(colData(se)) |> 
    dplyr::select(
        well, sample_name:target_name, reporter, crt, amp_score:amp_status
    ) |> 
    dplyr::slice(1:10)

	well	sample_name	target_name	reporter	crt	amp_score	cq_conf	amp_status
well_2321	2321	Sample-101	RV_1of2_Vi99990016_po	FAM	NA	0.000000	0.0000000	No Amp
well_2322	2322	Sample-101	RV_1of2_Vi99990016_po	FAM	NA	0.000000	0.0000000	No Amp
well_2329	2329	Sample-101	RV_2of2_Vi99990017_po	FAM	NA	0.000000	0.0000000	No Amp
well_2330	2330	Sample-101	RV_2of2_Vi99990017_po	FAM	NA	0.000000	0.0000000	No Amp
well_2343	2343	Sample-101	S. pneumoniae_Ba06439619_s1	FAM	NA	0.000000	0.0000000	No Amp
well_2344	2344	Sample-101	S. pneumoniae_Ba06439619_s1	FAM	NA	0.000000	0.0000000	No Amp
well_2345	2345	Sample-101	RNAse_P_Pa04930436_g1	FAM	14.98677	1.458653	0.8907495	Amp
well_2346	2346	Sample-101	RNAse_P_Pa04930436_g1	FAM	14.94592	1.447588	0.8192336	Amp
well_2385	2385	Sample-102	RV_1of2_Vi99990016_po	FAM	15.99536	1.629182	0.9614623	Amp
well_2386	2386	Sample-102	RV_1of2_Vi99990016_po	FAM	15.90011	1.633002	0.9618700	Amp

Rowdata

By contrast, the row annotation DataFrame stores only the cycle numbers, which is less interesting by itself.

# retrieve the rowdata and render a subset as a data frame
as.data.frame(rowData(se)) |> 
    dplyr::slice(1:10)

	cycle
cycle_1	1
cycle_2	2
cycle_3	3
cycle_4	4
cycle_5	5
cycle_6	6
cycle_7	7
cycle_8	8
cycle_9	9
cycle_10	10

Metadata

Finally, the file path of the imported data and run-level metadata are stored in a list accessible with metadata(se). Here we display the run information.

# render the experiment level metadata
metadata(se)$run_info

Field	Data
Barcode	ABC123
Block Type	OpenArray Block
Chemistry	TAQMAN
Date Created	04-29-2025 09:39:53 AM PDT
Date Modified	01-06-2026 16:43:39 PM PST
Experiment File Name	C:123.eds
Experiment Name	ABC123
Experiment Run Start Time	04-29-2025 09:42:30 AM PDT
Experiment Run Stop Time	04-29-2025 11:50:35 AM PDT
Experiment Type	Gene Expression
Instrument Name	987654321
Instrument Serial Number	987654321
Instrument Type	QuantStudio 12K Flex
Passive Reference	NA
Quantification Cycle Method	Crt
Signal Smoothing On	true
Stage/ Cycle where Analysis is performed	Stage 2, Step 3

Model data generated in downstream curve-fitting steps is saved to the experiment metadata as well.

Reporting Results

If we intend to use the relative threshold analysis native to the instrument software, we can create a run result report straight away. We can call the generateReport() function directly on the Summarized Experiment, which then dynamically generates a .html document summarizing the results.

Note that the package kableExtra is required to generate the PCR report. If it is not installed, install with:

# install kableExtra from CRAN
if (!require("kableExtra", quietly = TRUE)) {
    install.packages("kableExtra")
}

The following code block generates the PCR report assuming that data has been loaded in using excelToSE(). Note that this code chunk is not wired to execute when constructing the vignette as generateReport() calls code to save the report directly to the user’s machine, though it defaults to a temporary file.

# generate an HTML report from the run data
generateReport(se = se)

At present, this report pulls from the analysis provided by the QuantStudio 12K Flex software, though an option to build the report from data generated by the curve-fitting method is planned for future development.

Analysis by Logistic Regression

The second method of data analysis, as mentioned in the overview, is to fit model curves to each PCR reaction curve individually, then use computed features to differentiate positive and negative PCR results. In particular, the computeModels() package function calls an algorithm from SciPy and attempts to fit 5-parameter logistic regressions to the fluorescence curves. The 5PL model is implemented for its favorable ability to handle asymmetries common to amplification curves.

This approach has the advantages of transparency and reproducibility, circumventing the need to rely on propriety algorithms to interpret PCR results.

Data Import

Here we start the data analysis again from the point where we have exported the gene expression experiment from QuantStudio 12K Flex software. As in the previous example, we load the experiment into a SummarizedExperiment object.

# clear the environment
rm(list = ls())

# save filepath to example OpenArray gene expression run data
path = system.file(
    "extdata", 
    "oa_gene_expression_1.xlsx", 
    package = "OAtools"
)

# transform the run data into a SummarizedExperiment
se <- excelToSE(excel_path = path)

Optimizing Models to PCR Curves

Next, we run the optimizer and attempt to fit a 5PL model curve to each PCR reaction on the OpenArray plate. As mentioned above, OAtools exports the computeModels() function, which iterates over the experiment and does this programmatically. computeModels() then saves the parameters of each model curve, along with features computed from them, as a list in the experiment metadata. If the fluo_reporter assay matrix is used, for example, the models would be accessible with metadata(se)$fluo_reporter_models.

Under the hood, computeModels() invokes a thin R wrapper, runFitCurve(), around the internal fit_curve() function implemented in python3. computeModels() handles the logic for iterating over the OpenArray plate, while fit_curve() interfaces with SciPy and actually runs the optimization, returning a model curve when provided PCR data (observed fluorescence vs. cycle number) for a single reaction.

A linear_threshold parameter is optionally provided when calling computeModels(). PCR curves with overall change in fluorescence below this threshold are marked as negative and fit to linear models instead. The rationale is that such PCR curves are trivially negative for the amplicon of interest, but add a great deal of compute time to the optimization process and are better filtered out.

# optimize model curves to each PCR reaction
se <- computeModels(
    se = se,
    assay_name = "fluo_reporter"
)
#> Using Python: /home/biocbuild/.pyenv/versions/3.12.12/bin/python3.12
#> Creating virtual environment '/var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env' ...
#> + /home/biocbuild/.pyenv/versions/3.12.12/bin/python3.12 -m venv /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env
#> Done!
#> Installing packages: pip, wheel, setuptools
#> + /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env/bin/python -m pip install --upgrade pip wheel setuptools
#> Installing packages: 'numpy==2.4.1', 'pandas==2.3.3', 'scipy==1.17.0'
#> + /var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env/bin/python -m pip install --upgrade --no-user 'numpy==2.4.1' 'pandas==2.3.3' 'scipy==1.17.0'
#> Virtual environment '/var/cache/basilisk/1.23.0/OAtools/0.99.9/conda_env' successfully created.

# display an example model, fit to the PCR reaction in well 2345
metadata(se)$fluo_reporter_models$well_2345
#> $regression_type
#> [1] "5pl"
#> 
#> $parameters
#> $parameters$A
#> [1] 2235.61
#> 
#> $parameters$D
#> [1] 6056.299
#> 
#> $parameters$B
#> [1] 0.1653346
#> 
#> $parameters$C
#> [1] 7.534664
#> 
#> $parameters$S
#> [1] 10
#> 
#> 
#> $r_squared
#> [1] 0.9995366
#> 
#> $x_obs
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
#> 
#> $y_obs
#>  [1] 2264.415 2264.059 2257.542 2249.905 2250.833 2244.891 2244.586 2260.520
#>  [9] 2264.602 2254.724 2256.322 2284.099 2314.417 2382.905 2474.373 2612.977
#> [17] 2782.659 2994.564 3215.832 3439.598 3664.946 3877.989 4076.521 4260.230
#> [25] 4445.077 4614.959 4791.316 4936.122 5061.273 5197.476 5323.698 5437.324
#> [33] 5541.725 5625.767 5684.772 5746.419 5803.184 5839.543 5865.371 5872.880
#> 
#> $y_pred
#>  [1] 2235.615 2235.624 2235.655 2235.742 2235.980 2236.579 2237.985 2241.052
#>  [9] 2247.255 2258.911 2279.277 2312.449 2362.977 2435.220 2532.592 2656.869
#> [17] 2807.778 2982.940 3178.175 3388.056 3606.569 3827.741 4046.141 4257.204
#> [25] 4457.395 4644.221 4816.153 4972.485 5113.171 5238.659 5349.745 5447.448
#> [33] 5532.905 5607.300 5671.805 5727.544 5775.568 5816.843 5852.242 5882.549
#> 
#> $delta_y
#> [1] 3597.259
#> 
#> $x_midpoint
#> [1] 23.46745
#> 
#> $y_midpoint
#> [1] 4145.955
#> 
#> $slope_midpoint
#> [1] 211.5127

Deriving PCR Results from the Model

Finally, we can leverage the model data we’ve generated to derive PCR results. OAtools exports the determinePCRResults() function, which we can invoke on our SummarizedExperiment to generate such results. determinePCRResults() requires as input an Excel key. This key associates each gene target with threshold values for model features used to separate positive and negative PCR results. In particular, values are specified for the cycle threshold, midpoint slope, and overall change-in-fluorescence.

Different thresholds may be specified for distinct target genes to account for differences in primer efficiency across assays on the same OpenArray plate.

For documentation and details on the key, run ?OAtools::target_threshold_key in the R console.

determinePCRResults() computes PASS/FAIL values for the cycle threshold, midpoint slope, and change-in-fluorescence, marking PCR reactions which pass on all accounts as positive and all others as negative. These results are saved to the column annotation DataFrame.

# save filepath to target-threshold-key
key_path = system.file(
    "extdata", 
    "target_threshold_key.xlsx", 
    package = "OAtools"
)

# assign a PCR result according to the key
se <- determinePCRResults(
    se = se, 
    key_path = key_path
)

# render a snapshot including the logic for determining the result
as.data.frame(SummarizedExperiment::colData(se)) |> 
    dplyr::select(
        well, crt, midpoint_slope, delta_fluo, 
        crt_threshold, slope_threshold, delta_threshold, result
    ) |> 
    dplyr::slice_head(n = 10)

well	crt	midpoint_slope	delta_fluo	crt_threshold	slope_threshold	delta_threshold	result
2321	NA	0.3555216	12.52458	35	75	750	negative
2322	NA	1.2217160	49.51595	35	75	750	negative
2329	NA	-1.7810606	-62.98893	35	75	750	negative
2330	NA	-0.9223582	-33.51660	35	75	750	negative
2343	NA	-2.5574979	-93.83805	NA	100	1000	negative
2344	NA	-1.4442550	-47.09253	NA	100	1000	negative
2345	14.98677	211.5126506	3597.25928	NA	75	1000	positive
2346	14.94592	207.3108679	3671.44303	NA	75	1000	positive
2385	15.99536	482.0642170	7382.08024	35	75	750	positive
2386	15.90011	502.0597006	7440.44922	35	75	750	positive

Interoperability

NormqPCR

OAtools supports interoperability with the NormqPCR package from Bioconductor, which provides functions to normalize RT-qPCR data from gene expression platforms like OpenArray. NormqPCR reads from the qPCRBatch class object, which requires the ReadqPCR package. We can set up our environment as below:

# Install ReadqPCR from Bioconductor
if (!require("ReadqPCR", quietly = TRUE)) {
    BiocManager::install("ReadqPCR")
}

# Install NormqPCR from Bioconductor
if (!require("NormqPCR", quietly = TRUE)) {
    BiocManager::install("NormqPCR")
}

Then, to convert the SummarizedExperiment container into a NormqPCR-friendly instance of the qPCRBatch class, simply invoke the seToQPCRBatch() function on the SummarizedExperiment container. With our qPCRBatch object, we can, for example, normalize the cycle thresholds against a housekeeping gene.

# convert SummarizedExperiment container to qPCRBatch
qpcr <- seToQPCRBatch(se)

# choose housekeeping gene (in this case, the RNAse P control)
housekeeping_gene = "RNAse_P_Pa04930436_g1"

# run delta-Cq calculation
norm <- NormqPCR::deltaCq(
    qPCRBatch = qpcr,
    hkgs = housekeeping_gene
)

# display expression matrix
as.data.frame(Biobase::exprs(norm)) |> 
    dplyr::select(`Pos_Control_A`:`Sample-106`) |> 
    knitr::kable(digits = 2)

	Pos_Control_A	Sample-101	Sample-102	Sample-103	Sample-104	Sample-105	Sample-106
RNAse_P_Pa04930436_g1	0.00	0	0.00	0	0.00	0.00	0.00
RV_1of2_Vi99990016_po	4.12	NaN	0.53	NaN	14.39	NaN	-1.56
RV_2of2_Vi99990017_po	2.27	NaN	0.66	NaN	13.42	NaN	-2.46
S. pneumoniae_Ba06439619_s1	10.19	NaN	NaN	NaN	NaN	0.74	NaN

Please refer to the official documentation for NormqPCR for more information. (https://bioconductor.org/packages/release/bioc/html/NormqPCR.html)

Session Info

sessionInfo()
#> R Under development (unstable) (2026-01-15 r89304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SummarizedExperiment_1.41.1 Biobase_2.71.0             
#>  [3] GenomicRanges_1.63.1        Seqinfo_1.1.0              
#>  [5] IRanges_2.45.0              S4Vectors_0.49.0           
#>  [7] BiocGenerics_0.57.0         generics_0.1.4             
#>  [9] MatrixGenerics_1.23.0       matrixStats_1.5.0          
#> [11] OAtools_0.99.9             
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6        dir.expiry_1.19.0   xfun_0.56          
#>  [4] bslib_0.10.0        ggplot2_4.0.2       htmlwidgets_1.6.4  
#>  [7] lattice_0.22-9      vctrs_0.7.1         tools_4.6.0        
#> [10] rgl_1.3.34          parallel_4.6.0      tibble_3.3.1       
#> [13] DEoptimR_1.1-4      pkgconfig_2.0.3     Matrix_1.7-4       
#> [16] RColorBrewer_1.1-3  S7_0.2.1            readxl_1.4.5       
#> [19] lifecycle_1.0.5     compiler_4.6.0      farver_2.1.2       
#> [22] stringr_1.6.0       minpack.lm_1.2-4    qpcR_1.4-2         
#> [25] NormqPCR_1.57.0     janitor_2.2.1       snakecase_0.11.1   
#> [28] htmltools_0.5.9     sass_0.4.10         yaml_2.3.12        
#> [31] pillar_1.11.1       jquerylib_0.1.4     tidyr_1.3.2        
#> [34] MASS_7.3-65         DelayedArray_0.37.0 cachem_1.1.0       
#> [37] abind_1.4-8         robustbase_0.99-7   basilisk_1.23.0    
#> [40] tidyselect_1.2.1    digest_0.6.39       stringi_1.8.7      
#> [43] dplyr_1.2.0         purrr_1.2.1         fastmap_1.2.0      
#> [46] grid_4.6.0          cli_3.6.5           SparseArray_1.11.10
#> [49] magrittr_2.0.4      base64enc_0.1-6     S4Arrays_1.11.1    
#> [52] dichromat_2.0-0.1   withr_3.0.2         rappdirs_0.3.4     
#> [55] filelock_1.0.3      scales_1.4.0        lubridate_1.9.5    
#> [58] timechange_0.4.0    rmarkdown_2.30      XVector_0.51.0     
#> [61] otel_0.2.0          reticulate_1.45.0   cellranger_1.1.0   
#> [64] png_0.1-8           evaluate_1.0.5      knitr_1.51         
#> [67] ReadqPCR_1.57.0     rlang_1.1.7         Rcpp_1.1.1         
#> [70] glue_1.8.0          jsonlite_2.0.0      R6_2.6.1