Optimizing liquid chromatography coupled to mass spectrometry (LC–MS) methods presents a significant challenge. The ‘rawDiag’ package (Trachsel et al. 2018), accessible through rawDiag, streamlines method optimization by generating MS operator-specific diagnostic plots based on scan-level metadata. Tailored for use in the R shell or as a shiny application on the Orbitrap instrument PC, ‘rawDiag’ leverages rawrr (Kockmann and Panse 2021) for reading vendor proprietary instrument data. Developed, rigorously tested, and actively employed at the Functional Genomics Center Zurich ETHZ | UZH, ‘rawDiag’ stands as a robust solution for advancing LC–MS Orbitrap method optimization.
rawDiag 0.99.21
Figure 1: The octopussy rawDiag package logo by Lilly van de Venn
Over the past two decades, liquid chromatography coupled to mass spectrometry (LC–MS) has evolved into the method of choice in the field of proteomics (Cox and Mann 2011; Mallick and Kuster 2010). During a typical LC–MS measurement, a complex mixture of analytes is separated by a liquid chromatography system coupled to a mass spectrometer (MS) through an ion source interface. This interface converts the analytes that elute from the chromatography system over time into a beam of ions. From this ion beam, the MS records a series of mass spectra containing detailed information on the analyzed sample (Savaryn, Toby, and Kelleher 2016). The resulting raw data consist of the mass spectra and their metadata, typically recorded in a vendor-specific binary format. During a measurement, the mass spectrometer applies internal heuristics, which enable the instrument to adapt in near real time to sample properties such as sample complexity and amount of ions. Still, the method parameters controlling these heuristics need to be set prior to the measurement. Optimal measurement results require a careful balancing of instrument parameters, but their complex interactions with each other make LC–MS method optimization a challenging task.
Here we present rawDiag, a platform-independent software tool implemented in the R language that supports LC–MS operators during the process of empirical method optimization. Our work builds on the ideas of the discontinued software rawMeat (VAST Scientific). Our application is currently tailored toward spectral data acquired on Thermo Fisher Scientific instruments (raw format), with a particular focus on Orbitrap (Zubarev and Makarov 2013) mass analyzers (Exactive or Fusion instruments). These instruments are heavily used in the field of bottom-up proteomics (Aebersold and Mann 2003) to analyze complex peptide mixtures derived from enzymatic digests of proteomes.
rawDiag is meant to run after MS acquisition, optimally as an interactive R shiny application, and produces a series of diagnostic plots visualizing the impact of method parameter choices on the acquired data across injections. If static reports are required, PDF files can be generated using rmarkdown. In this vignette, we present the usage of our tool.
rawDiag gains advantages from being part of the Bioconductor ecosystem, such as its ability to utilize the rawrr package and potentially extend its functionality through interaction with the Spectra infrastructure, particularly with the MsBackendRawFileReader.
rawDiag provides a wrapper function, readRaw, which uses the rawrr methods rawrr::readIndex, rawrr::readTrailer, and rawrr::readChromatogram to read proprietary mass spectrometer-generated data by invoking third-party managed methods through a system2 text connection. See the stack below:
R> |
text connection |
system2 |
Mono Runtime |
Managed Assembly (CIL/.NET code) |
rawrr.exe |
ThermoFisher.CommonCore.*.dll |
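The top three layers of this stack can be illustrated with base R alone. The following is a minimal sketch, not rawrr's actual implementation: it assumes the external program prints a comma-separated table to standard output, and it uses the Rscript binary of the running R installation as a stand-in for rawrr.exe. The column names `scan` and `rtinseconds` are made up for the illustration.

```r
## Path to the Rscript binary of the running R installation; it stands in
## for the external reader executable in this sketch.
rscript <- file.path(R.home("bin"), "Rscript")

## The child process prints a small CSV table to standard output.
expr <- 'writeLines(c("scan,rtinseconds","1,0.12","2,0.25"))'

## system2(..., stdout = TRUE) captures the child's standard output as a
## character vector with one element per line.
out <- system2(rscript,
               args = c("--vanilla", "-e", shQuote(expr)),
               stdout = TRUE)

## A text connection lets the captured lines be parsed like a file.
con <- textConnection(out)
idx <- read.csv(con)
close(con)

idx
```

The design keeps the managed code in a separate process: R only sees text, so no binding to the .NET runtime is needed on the R side.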
In case you prefer to compile rawrr.exe from C# source code, please install the Mono compiler and xbuild by installing the following Linux packages:
sudo apt-get install mono-mcs mono-xbuild
Otherwise, to execute the precompiled code, the following Linux packages are sufficient:
sudo apt-get install mono-runtime libmono-system-data4.0-cil -y
rawrr.exe then runs out of the box.
If the native C# compiler is not available, install Mono.
The required assemblies (Common Intermediate Language bytecode) can be downloaded and installed on all platforms using the command:
rawDiag::checkRawrr
## function ()
## {
## if (isFALSE(requireNamespace("BiocManager", quietly = TRUE)))
## stop("exec", "install.packages('BiocManager')")
## if (isFALSE(requireNamespace("rawrr", quietly = TRUE)))
## stop("exec", "BiocManager::install('rawrr')")
## if (isFALSE(rawrr:::.checkRawFileReaderDLLs()))
## rawrr::installRawFileReaderDLLs()
## if (isFALSE(file.exists(rawrr:::.rawrrAssembly())))
## rawrr::installRawrrExe()
## TRUE
## }
## <bytecode: 0x5589a5346d28>
## <environment: namespace:rawDiag>
rawDiag::checkRawrr()
## [1] TRUE
For more information, please read the INSTALL file in the rawrr package.
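Before reading any data, it can help to confirm from within R that the Mono runtime named in the stack above is visible to the R session. The sketch below is a convenience check only; `monoAvailable` is a hypothetical helper, not part of rawDiag or rawrr, and it assumes the runtime is started through an executable called `mono` on the search path.

```r
## TRUE if an executable called "mono" is found on the PATH of this session.
## Sys.which() returns "" for programs it cannot locate.
monoAvailable <- function() {
  unname(nzchar(Sys.which("mono")))
}

## Only non-Windows systems are assumed to need the Mono runtime here.
if (.Platform$OS.type != "windows" && !monoAvailable()) {
  message("mono not found on PATH; install the Mono runtime ",
          "before reading raw files")
}
```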
Fetch an example Orbitrap raw file from ExperimentHub’s tartare package.
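The chunk that performs this download is not shown here, so the following is a hedged sketch of how it could look rather than the vignette's own code. `ensureRawExtension` and `fetchTartareRaw` are hypothetical helper names, and the record identifier `id` is deliberately left to the caller. The sketch assumes the ExperimentHub and tartare packages are installed, that `eh[[id]]` returns the path of the cached file, and that the vendor reader expects a file name ending in `.raw`.

```r
## Return a path ending in ".raw" for an existing file, copying the file to
## "<path>.raw" if it does not carry that extension yet.
ensureRawExtension <- function(path) {
  stopifnot(file.exists(path))
  if (grepl("\\.raw$", path, ignore.case = TRUE)) {
    return(path)
  }
  target <- paste0(path, ".raw")
  if (!file.exists(target)) {
    file.copy(path, target)
  }
  target
}

## Download (or load from cache) one ExperimentHub record and make sure the
## local copy has a ".raw" extension. `id` is the record identifier.
fetchTartareRaw <- function(id) {
  if (!requireNamespace("ExperimentHub", quietly = TRUE)) {
    stop("run BiocManager::install('ExperimentHub') first")
  }
  eh <- ExperimentHub::ExperimentHub()
  ensureRawExtension(normalizePath(eh[[id]]))
}
```

With these helpers, `rawfile <- fetchTartareRaw(id)` would define the `rawfile` object used in the next step; suitable values for `id` can be looked up with `AnnotationHub::query(ExperimentHub::ExperimentHub(), "tartare")`.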
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: AnnotationHub
## Loading required package: BiocFileCache
## Loading required package: dbplyr
## see ?tartare and browseVignettes('tartare') for documentation
## loading from cache
## [1] "/home/pkgbuild/.cache/R/ExperimentHub/f9b8a45f5902f_4590.raw"
readRaw - read Orbitrap raw file
Read the instrument raw data by using the rawrr package.
## read Orbitrap meta data
rawfile |>
  rawDiag::readRaw() -> x
## reading index for f9b8a45f5902f_4590.raw...
## determining ElapsedScanTimesec ...
## reading trailer LM m/z-Correction (ppm) ...
## reading trailer AGC ...
## reading trailer AGC PS Mode ...
## reading trailer FT Resolution ...
## reading trailer Ion Injection Time (ms) ...
## reading TIC ...
## reading BasePeakIntensity ...
## reading took23.293seconds
Query and list all plot methods of the rawDiag package.
library(rawDiag)
ls("package:rawDiag") |>
  grep(pattern = '^plot', value = TRUE) -> pm
pm |>
  knitr::kable(col.names = "package:rawDiag plot functions")
package:rawDiag plot functions |
---|
plotChargeState |
plotCycleLoad |
plotCycleTime |
plotInjectionTime |
plotLockMassCorrection |
plotMassDistribution |
plotMzDistribution |
plotPrecursorHeatmap |
plotScanTime |
plotTicBasepeak |
Apply all plot methods to the example data.
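Calling every function in a character vector of names is a small dispatch-by-name loop. The sketch below shows the pattern under stated assumptions: `applyPlotMethods` is a hypothetical helper, not a rawDiag function, and it assumes each plot function takes the data as its first argument and a method name such as 'trellis' as its second. Two stand-in functions make the example runnable without rawDiag installed.

```r
## Look up each function by name in `envir` and call it as f(data, method).
## Returns a list of results named after the functions.
applyPlotMethods <- function(funNames, data, method = "trellis",
                             envir = parent.frame()) {
  stopifnot(is.character(funNames))
  force(envir)
  res <- lapply(funNames, function(f) {
    do.call(get(f, mode = "function", envir = envir), list(data, method))
  })
  names(res) <- funNames
  res
}

## Stand-in "plot" functions; real plot functions would return plot objects.
plotA <- function(x, method = "trellis") paste("A", nrow(x), method)
plotB <- function(x, method = "trellis") paste("B", nrow(x), method)

demo <- applyPlotMethods(c("plotA", "plotB"), data.frame(scan = 1:3))
```

With rawDiag installed, the same helper could be called as `applyPlotMethods(pm, x, envir = asNamespace("rawDiag"))`, where `pm` and `x` are the objects created earlier.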
## plottingplotChargeStateusing methodtrellis...
## plottingplotCycleLoadusing methodtrellis...
## plottingplotCycleTimeusing methodtrellis...
## plottingplotInjectionTimeusing methodtrellis...
## plottingplotLockMassCorrectionusing methodtrellis...
## plottingplotMassDistributionusing methodtrellis...
## plottingplotMzDistributionusing methodtrellis...
## plottingplotPrecursorHeatmapusing methodtrellis...
## plottingplotScanTimeusing methodtrellis...
## plottingplotTicBasepeakusing methodtrellis...
## [[1]] ... [[10]]: one plot per function, in the order listed above (images not reproduced here).
## [[2]] (plotCycleLoad) additionally reports: `geom_smooth()` using formula = 'y ~ x'
For more information on the plot methods and their application, please read the package man pages and the application examples in the manuscript (Trachsel et al. 2018).
Aebersold, Ruedi, and Matthias Mann. 2003. “Mass Spectrometry-Based Proteomics.” Nature 422 (6928): 198–207. https://doi.org/10.1038/nature01511.
Cox, Jürgen, and Matthias Mann. 2011. “Quantitative, High-Resolution Proteomics for Data-Driven Systems Biology.” Annual Review of Biochemistry 80 (1): 273–99. https://doi.org/10.1146/annurev-biochem-061308-093216.
Kockmann, Tobias, and Christian Panse. 2021. “The rawrr R Package: Direct Access to Orbitrap Data and Beyond.” Journal of Proteome Research. https://doi.org/10.1021/acs.jproteome.0c00866.
Mallick, Parag, and Bernhard Kuster. 2010. “Proteomics: A Pragmatic Perspective.” Nature Biotechnology 28 (7): 695–709. https://doi.org/10.1038/nbt.1658.
Savaryn, John P., Timothy K. Toby, and Neil L. Kelleher. 2016. “A Researcher’s Guide to Mass Spectrometry‐based Proteomics.” PROTEOMICS 16 (18): 2435–43. https://doi.org/10.1002/pmic.201600113.
Trachsel, Christian, Christian Panse, Tobias Kockmann, Witold E. Wolski, Jonas Grossmann, and Ralph Schlapbach. 2018. “rawDiag: An R Package Supporting Rational LCMS Method Optimization for Bottom-up Proteomics.” Journal of Proteome Research 17 (8): 2908–14. https://doi.org/10.1021/acs.jproteome.8b00173.
Zubarev, Roman A., and Alexander Makarov. 2013. “Orbitrap Mass Spectrometry.” Analytical Chemistry 85 (11): 5288–96. https://doi.org/10.1021/ac4001223.
## R Under development (unstable) (2024-01-16 r85808)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rawDiag_0.99.21 tartare_1.17.0 ExperimentHub_2.11.1
## [4] AnnotationHub_3.11.1 BiocFileCache_2.11.1 dbplyr_2.4.0
## [7] BiocGenerics_0.49.1 BiocStyle_2.31.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 dplyr_1.1.4 farver_2.1.1
## [4] blob_1.2.4 filelock_1.0.3 Biostrings_2.71.2
## [7] bitops_1.0-7 fastmap_1.1.1 RCurl_1.98-1.14
## [10] promises_1.2.1 digest_0.6.34 mime_0.12
## [13] lifecycle_1.0.4 ellipsis_0.3.2 KEGGREST_1.43.0
## [16] RSQLite_2.3.5 magrittr_2.0.3 compiler_4.4.0
## [19] rlang_1.1.3 sass_0.4.8 tools_4.4.0
## [22] utf8_1.2.4 yaml_2.3.8 knitr_1.45
## [25] labeling_0.4.3 bit_4.0.5 curl_5.2.0
## [28] plyr_1.8.9 BiocParallel_1.37.0 withr_3.0.0
## [31] purrr_1.0.2 grid_4.4.0 stats4_4.4.0
## [34] fansi_1.0.6 xtable_1.8-4 colorspace_2.1-0
## [37] ggplot2_3.4.4 scales_1.3.0 cli_3.6.2
## [40] rmarkdown_2.25 crayon_1.5.2 generics_0.1.3
## [43] httr_1.4.7 reshape2_1.4.4 rawrr_1.11.11
## [46] DBI_1.2.1 cachem_1.0.8 stringr_1.5.1
## [49] splines_4.4.0 zlibbioc_1.49.0 parallel_4.4.0
## [52] AnnotationDbi_1.65.2 BiocManager_1.30.22 XVector_0.43.1
## [55] vctrs_0.6.5 Matrix_1.6-5 jsonlite_1.8.8
## [58] bookdown_0.37 IRanges_2.37.1 S4Vectors_0.41.3
## [61] bit64_4.0.5 magick_2.8.2 hexbin_1.28.3
## [64] jquerylib_0.1.4 glue_1.7.0 codetools_0.2-19
## [67] stringi_1.8.3 gtable_0.3.4 BiocVersion_3.19.1
## [70] later_1.3.2 GenomeInfoDb_1.39.6 munsell_0.5.0
## [73] tibble_3.2.1 pillar_1.9.0 rappdirs_0.3.3
## [76] htmltools_0.5.7 GenomeInfoDbData_1.2.11 R6_2.5.1
## [79] lattice_0.22-5 evaluate_0.23 shiny_1.8.0
## [82] Biobase_2.63.0 highr_0.10 png_0.1-8
## [85] memoise_2.0.1 httpuv_1.6.14 bslib_0.6.1
## [88] Rcpp_1.0.12 nlme_3.1-164 mgcv_1.9-1
## [91] xfun_0.42 pkgconfig_2.0.3