--- title: "Grouping FTICR-MS data with xcms" author: - name: Joachim Bargsten - name: Johannes Rainer package: xcms output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{Grouping FTICR-MS data with xcms} %\VignetteEngine{knitr::rmarkdown} %\VignetteKeywords{Mass Spectrometry, MS, Metabolomics, Bioinformatics} %\VignetteEncoding{UTF-8} %\VignetteDepends{xcms,msdata,MassSpecWavelet,BiocStyle,signal} --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` # Introduction ```{r echo = FALSE, results = "hide", message = FALSE} library(BiocStyle) ``` This document describes how to use `r Biocpkg("xcms")` for the analysis of direct injection mass spec data, including peak detection, calibration and correspondence (grouping of peaks across samples). # Peak detection Prior to any other analysis step, peaks have to be identified in the mass spec data. In contrast to the typical metabolomics workflow, in which peaks are identified in the chromatographic (time) dimension, in direct injection mass spec data sets peaks are identified in the m/z dimension. `r Biocpkg("xcms")` uses functionality from the *MassSpecWavelet* package to identify such peaks. Below we load the required packages. For information on the parallel processing setup please see the *BiocParallel* vignette. ```{r load-libs, message = FALSE, results = "hide"} library(xcms) library(MassSpecWavelet) register(SerialParam()) ``` In this documentation we use an example data set from the `r Biocpkg("msdata")` package. Assuming that `r Biocpkg("msdata")` is installed, we locate the path of the package and load the data set. We create also a `data.frame` describing the experimental setup based on the file names. ```{r load-data, message = FALSE, results = "hide"} mzML_path <- system.file("fticr-mzML", package = "msdata") mzML_files <- list.files(mzML_path, recursive = TRUE, full.names = TRUE) ## We're subsetting to 2 samples per condition mzML_files <- mzML_files[c(1, 2, 6, 7)] ## Create a data.frame assigning samples to sample groups, i.e. ham4 and ham5. grp <- rep("ham4", length(mzML_files)) grp[grep(basename(mzML_files), pattern = "^HAM005")] <- "ham5" pd <- data.frame(filename = basename(mzML_files), sample_group = grp) ## Load the data. ham_raw <- readMSData(files = mzML_files, pdata = new("NAnnotatedDataFrame", pd), mode = "onDisk") ``` The data files are from *direct injection* mass spectrometry experiments, i.e. we have only a single spectrum available for each sample and no retention times. ```{r} ## Only a single spectrum with an *artificial* retention time is available ## for each sample rtime(ham_raw) ``` Peaks are identified within each spectrum using the *mass spec wavelet* method. ```{r msw} ## Define the parameters for the peak detection msw <- MSWParam(scales = c(1, 4, 9), nearbyPeak = TRUE, winSize.noise = 500, SNR.method = "data.mean", snthresh = 10) ham_prep <- findChromPeaks(ham_raw, param = msw) head(chromPeaks(ham_prep)) ``` # Calibration The `calibrate` method can be used to correct the m/z values of identified peaks. The currently implemented method requires identified peaks and a list of m/z values for known calibrants. The identified peaks m/z values are then adjusted based on the differences between the calibrants' m/z values and the m/z values of the closest peaks (within a user defined permitted maximal distance). Note that this method does presently only calibrate identified peaks, but not the original m/z values in the spectra. Below we demonstrate the `calibrate` method on one of the data files with artificially defined calibration m/z values. We first subset the data set to the first data file, extract the m/z values of 3 peaks and modify the values slightly. ```{r message = FALSE} ## Subset to the first file. first_file <- filterFile(ham_prep, file = 1) ## Extract 3 m/z values calib_mz <- chromPeaks(first_file)[c(1, 4, 7), "mz"] calib_mz <- calib_mz + 0.00001 * runif(1, 0, 0.4) * calib_mz + 0.0001 ``` Next we calibrate the data set using the previously defined *artificial* calibrants. We are using the `"edgeshift"` method for calibration that adjusts all peaks within the range of the m/z values of the calibrants using a linear interpolation and shifts all chromatographic peaks outside of that range by a constant factor (the difference between the lowest respectively largest calibrant m/z with the closest peak's m/z). Note that in a *real* use case, the m/z values would obviously represent known m/z of calibrants and would not be defined on the actual data. ```{r message = FALSE} ## Set-up the parameter class for the calibration prm <- CalibrantMassParam(mz = calib_mz, method = "edgeshift", mzabs = 0.0001, mzppm = 5) first_file_calibrated <- calibrate(first_file, param = prm) ``` To evaluate the calibration we plot below the difference between the adjusted and raw m/z values (y-axis) against the raw m/z values. ```{r calibrationresult, fig = TRUE, fig.width = 6, fig.height = 5, fig.align = "center"} diffs <- chromPeaks(first_file_calibrated)[, "mz"] - chromPeaks(first_file)[, "mz"] plot(x = chromPeaks(first_file)[, "mz"], xlab = expression(m/z[raw]), y = diffs, ylab = expression(m/z[calibrated] - m/z[raw])) ``` # Correspondence Correspondence aims to group peaks across samples to define the *features* (ions with the same m/z values). Peaks from single spectrum, direct injection MS experiments can be grouped with the *MZclust* method. Below we perform the correspondence analysis with the `groupChromPeaks` method using default settings. ```{r correspondence, message = FALSE, results = "hide"} ## Using default settings but define sample group assignment mzc_prm <- MzClustParam(sampleGroups = ham_prep$sample_group) ham_prep <- groupChromPeaks(ham_prep, param = mzc_prm) ``` Getting an overview of the performed processings: ```{r} ham_prep ``` The peak group information, i.e. the *feature* definitions can be accessed with the `featureDefinitions` method. ```{r} featureDefinitions(ham_prep) ``` Plotting the raw data for direct injection samples involves a little more processing than for LC/GC-MS data in which we can simply use the `chromatogram` method to extract the data. Below we extract the m/z-intensity pairs for the peaks associated with the first feature. We thus first identify the peaks for that feature and define their m/z values range. Using this range we can subsequently use the `filterMz` function to sub-set the full data set to the signal associated with the feature's peaks. On that object we can then call the `mz` and `intensity` functions to extract the data. ```{r feature1, fig = TRUE, fig.width = 6, fig.height = 4, fig.align = "center"} ## Get the peaks belonging to the first feature pks <- chromPeaks(ham_prep)[featureDefinitions(ham_prep)$peakidx[[1]], ] ## Define the m/z range mzr <- c(min(pks[, "mzmin"]) - 0.001, max(pks[, "mzmax"]) + 0.001) ## Subset the object to the m/z range ham_prep_sub <- filterMz(ham_prep, mz = mzr) ## Extract the mz and intensity values mzs <- mz(ham_prep_sub, bySample = TRUE) ints <- intensity(ham_prep_sub, bySample = TRUE) ## Plot the data plot(3, 3, pch = NA, xlim = range(mzs), ylim = range(ints), main = "FT01", xlab = "m/z", ylab = "intensity") ## Define colors cols <- rep("#ff000080", length(mzs)) cols[ham_prep_sub$sample_group == "ham5"] <- "#0000ff80" tmp <- mapply(mzs, ints, cols, FUN = function(x, y, col) { points(x, y, col = col, type = "l") }) ``` To access the actual intensity values of each feature in each sample the `featureValue` method can be used. The setting `value = "into"` tells the function to return the integrated signal for each peak (one representative peak) per sample. ```{r} feat_vals <- featureValues(ham_prep, value = "into") head(feat_vals) ``` `NA` is reported for features in samples for which no peak was identified at the feature's m/z value. In some instances there might still be a signal at the feature's position in the raw data files, but the peak detection failed to identify a peak. For these cases signal can be recovered using the `fillChromPeaks` method that integrates all raw signal at the feature's location. If there is no signal at that location an `NA` is reported. ```{r fillpeaks, message = FALSE} ham_prep <- fillChromPeaks(ham_prep, param = FillChromPeaksParam()) head(featureValues(ham_prep, value = "into")) ``` # Further analysis Further analysis, i.e. detection of features/metabolites with significantly different abundances, or PCA analyses can be performed on the feature matrix using functionality from other R packages, such as `r Biocpkg("limma")`. # Session information ```{r sessionInfo} sessionInfo() ```