--- title: "MSstatsTMTPTM : A package for post translational modification (PTM) significance analysis in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling" author: "Devon Kohler ()" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{MSstatsTMTPTM : A package for post translational modification (PTM) significance analysis in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling"} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=6, fig.height=6 ) ``` ```{r, message=FALSE, warning=FALSE} library(MSstatsTMTPTM) library(MSstatsTMT) ``` This vignette summarizes the functionalities and options of MSstastTMTPTM and provides a workflow example. - A set of tools for detecting differentially abundant post translational modifications (PTMs) and proteins in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling. - The types of experiment that MSstatsTMTPTM supports for metabolic labeling or iTRAQ experiments. LC-MS, SRM, DIA(SWATH) with label-free or labeled synthetic peptides can be analyzed with other R package, MSstatsPTM and MSstats. MSstatsTMTPTM includes the following two functions for data visualization and statistical testing: 1. Data visualization of PTM and global protein levels: `dataProcessPlotsTMTPMT` 2. Group comparison on PTM/protein quantification data: `groupComparisonTMTPTM` ## Installation To install this package, start R (version "4.0") and enter: ``` {r, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("MSstatsTMTPTM") ``` ## 1. dataProcessPlotsTMTPTM() To illustrate the quantitative data and quality control of MS runs, dataProcessPlotsTMT takes the quantitative data from MSstatsTMT converter functions as input and generate two types of figures in pdf files as output : 1. Profile plot (specify "ProfilePlot" in option type), to identify the potential sources of variation for each protein; 2. Quality control plot (specify "QCPlot" in option type), to evaluate the systematic bias between MS runs. ### Arguments * `data.ptm` name of the data with PTM sites in protein name, which can be the output of MSstatsTMT converter functions. * `data.protein` name of the data with peptide level, which can be the output of MSstatsTMT converter functions. * `data.ptm.summarization` name of the data with ptm sites in protein-level name , which can be the output of the MSstatsTMT \code{\link{proteinSummarization}} function. * `data.protein.summarization` name of the data with protein-level, which can be the output of the MSstatsTMT \code{\link{proteinSummarization}} function. * `type` choice of visualization. "ProfilePlot" represents profile plot of log intensities across MS runs. "QCPlot" represents box plots of log intensities across channels and MS runs. * `ylimUp` upper limit for y-axis in the log scale. FALSE(Default) for Profile Plot and QC Plot uses the upper limit as rounded off maximum of log2(intensities) after normalization + 3.. * `ylimDown` lower limit for y-axis in the log scale. FALSE(Default) for Profile Plot and QC Plot uses 0.. * `x.axis.size` size of x-axis labeling for "Run" and "channel in Profile Plot and QC Plot. * `y.axis.size` size of y-axis labels. Default is 10. * `text.size` size of labels represented each condition at the top of Profile plot and QC plot. Default is 4. * `text.angle` angle of labels represented each condition at the top of Profile plot and QC plot. Default is 0. * `legend.size` size of legend above Profile plot. Default is 7. * `dot.size.profile` size of dots in Profile plot. Default is 2. * `ncol.guide` number of columns for legends at the top of plot. Default is 5. * `width` width of the saved pdf file. Default is 10. * `height` height of the saved pdf file. Default is 10. * `which.Protein` Protein list to draw plots. List can be names of Proteins or order numbers of Proteins. Default is "all", which generates all plots for each protein. For QC plot, "allonly" will generate one QC plot with all proteins. * `originalPlot` TRUE(default) draws original profile plots, without normalization. * `summaryPlot` TRUE(default) draws profile plots with protein summarization for each channel and MS run. * `address` the name of folder that will store the results. Default folder is the current working directory. The other assigned folder has to be existed under the current working directory. An output pdf file is automatically created with the default name of "ProfilePlot.pdf" or "QCplot.pdf". The command address can help to specify where to store the file as well as how to modify the beginning of the file name. If address=FALSE, plot will be not saved as pdf file but showed in window. ### Example The raw dataset for both the PTM and Protein datasets are required for the plotting function. This can be the output of the MSstatsTMT converter functions: `PDtoMSstatsTMTFormat`, `SpectroMinetoMSstatsTMTFormat`, and `OpenMStoMSstatsTMTFormat`. Both the PTM and protein datasets must include the following columns: `ProteinName`, `PeptideSequence`, `Charge`, `PSM`, `Mixture`, `TechRepMixture`, `Run`, `Channel`, `Condition`, `BioReplicate`, and `Intensity`. ``` {r} # read in raw data files # raw.ptm <- read.csv(file="raw.ptm.csv", header=TRUE) # raw.protein <- read.csv(file="raw.protein.csv", header=TRUE) head(raw.ptm) head(raw.protein) ``` ```{r, results='hide', message=FALSE, warning=FALSE} # Run MSstatsTMT proteinSummarization function quant.msstats.ptm <- proteinSummarization(raw.ptm, method = "msstats", global_norm = TRUE, reference_norm = FALSE, MBimpute = TRUE) quant.msstats.protein <- proteinSummarization(raw.protein, method = "msstats", global_norm = TRUE, reference_norm = FALSE, MBimpute = TRUE) ``` ``` {r} head(quant.msstats.ptm) head(quant.msstats.protein) # Profile Plot dataProcessPlotsTMTPTM(data.ptm=raw.ptm, data.protein=raw.protein, data.ptm.summarization=quant.msstats.ptm, data.protein.summarization=quant.msstats.protein, type='ProfilePlot' ) # Quality Control Plot # dataProcessPlotsTMTPTM(data.ptm=ptm.input.pd, # data.protein=protein.input.pd, # data.ptm.summarization=quant.msstats.ptm, # data.protein.summarization=quant.msstats.protein, # type='QCPlot') ``` ### 3. groupComparisonTMTPTM() Tests for significant changes in PTM abundance adjusted for global protein abundance across conditions based on a family of linear mixed-effects models in TMT experiment. Experimental design of case-control study (patients are not repeatedly measured) is automatically determined based on proper statistical model. ### Arguments * `data.ptm` : Name of the output of proteinSummarization function with PTM data. It should have columns named `Protein`, `TechRepMixture`, `Mixture`, `Run`, `Channel`, `Condition`, `BioReplicate`, `Abundance`. * `data.protein` : Name of the output of proteinSummarization function with Protein data. It should have columns named `Protein`, `TechRepMixture`, `Mixture`, `Run`, `Channel`, `Condition`, `BioReplicate`, `Abundance`. * `contrast.matrix` : Comparison between conditions of interests. 1) default is `pairwise`, which compare all possible pairs between two conditions. 2) Otherwise, users can specify the comparisons of interest. Based on the levels of conditions, specify 1 or -1 to the conditions of interests and 0 otherwise. The levels of conditions are sorted alphabetically. * `moderated` : If moderated = TRUE, then moderated t statistic will be calculated; otherwise, ordinary t statistic will be used. * `adj.method` : adjusted method for multiple comparison. 'BH` is default. ### Example ```{r message = FALSE, warning = FALSE} # test for all the possible pairs of conditions model.results.pairwise <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm, data.protein=quant.msstats.protein) names(model.results.pairwise) head(model.results.pairwise[[1]]) # Load specific contrast matrix #example.contrast.matrix <- read.csv(file="example.contrast.matrix.csv", header=TRUE) example.contrast.matrix # test for specified condition comparisons only model.results.contrast <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm, data.protein=quant.msstats.protein, contrast.matrix = example.contrast.matrix) names(model.results.contrast) head(model.results.contrast[[1]]) ``` ## Session information ```{r session} sessionInfo() ```