---
title: "phosphonormalizer: Pairwise normalization of phosphoproteomics data"
author: "Sohrab Saraei, Tomi Suomi, Otto Kauko, Laura L. Elo"
date: "October 11, 2016"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Pairwise normalization of phosphoproteomics data}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Introduction

Global centering-based normalization is a commonly used normalization approach in mass spectrometry (MS)-based label-free proteomics. It scales the peptide abundances so that all samples have the same median intensity, based on the assumption that the majority of abundances remain the same across the samples. However, especially in phosphoproteomics experiments, this assumption can introduce bias, as the enrichment of phosphopeptides during sample preparation can mask large unidirectional biological changes. Therefore, a novel method called pairwise normalization has been introduced that addresses this possible bias by utilizing phosphopeptides quantified in both enriched and non-enriched samples to calculate factors that mitigate the bias (Kauko et al. 2015).

The phosphonormalizer package (Saraei et al., under review) implements pairwise normalization. It normalizes the enriched samples in label-free MS-based phosphoproteomics using phosphopeptides that are present in both the enriched and non-enriched data of the same samples. If there are no common phosphopeptides between the enriched and non-enriched data, normalization is not possible and an error is generated.

## Input data

To use the phosphonormalizer package, we assume that the experiment has been conducted on both enriched and non-enriched samples. Both datasets must contain sequence, modification and abundance columns.
The sequence and modification columns in the data frame must be in character format and the abundance columns must be numeric. The algorithm expects that the abundances are pre-normalized with median normalization (Kauko et al. 2015). This package also supports the MSnSet data type from the MSnbase package, which is used in the data preprocessing step of the Bioconductor mass spectrometry proteomics workflow (see more: https://www.bioconductor.org/help/workflows/proteomics/).

```{r eval=TRUE}
#Load the library
library(phosphonormalizer)
#Enriched data overview
head(enriched.rd)
#Non-enriched data overview
head(non.enriched.rd)
```

## Pairwise normalization

The normalization begins by loading the phosphonormalizer package. For demonstration, we use the datasets `enriched.rd` and `non.enriched.rd`, which are available with the package. A boxplot of the fold change distribution before and after pairwise normalization can also be generated by setting the `plot.fc` parameter (see the example below).

## Installation

To install this package, start R and enter:

```{r eval=FALSE}
## biocLite() has been retired; Bioconductor packages are now installed with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("phosphonormalizer")
```

## Example

```{r eval=TRUE, fig.height = 4, fig.width = 6, fig.align = "center"}
#Load the library
library(phosphonormalizer)
#Specify the column numbers of abundances in the original data.frame,
#from both enriched and non-enriched runs
samplesCols <- data.frame(enriched = 3:17, non.enriched = 3:17)
#Specify the column numbers of sequence and modification in the original data.frame,
#from both enriched and non-enriched runs
modseqCols <- data.frame(enriched = 1:2, non.enriched = 1:2)
#The samples and their technical replicates
techRep <- factor(x = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5))
#If the parameter plot.fc is set, the corresponding plots of sample fold changes are produced
#Here, for demonstration, the fold change distributions are shown for sample 3 vs sample 1
plot.param <- list(control = c(1), samples = c(3))
#Call the function
norm <- normalizePhospho(enriched = enriched.rd, non.enriched = non.enriched.rd,
                         samplesCols = samplesCols, modseqCols = modseqCols,
                         techRep = techRep, plot.fc = plot.param)
```
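
Before calling `normalizePhospho` on your own data, it can help to confirm that the tables meet the requirements listed under Input data. The following is a minimal sketch in base R, not part of the phosphonormalizer package: the helpers `checkPhosphoInput` and `countSharedPeptides` are hypothetical names, the toy data frames are invented for illustration, and matching peptides on the combined sequence and modification strings is an assumption about how "common phosphopeptides" are identified.

```r
# Hypothetical helper (not part of phosphonormalizer): returns TRUE when the
# sequence/modification columns are character and the abundance columns numeric.
checkPhosphoInput <- function(df, modseqCols, samplesCols) {
  stopifnot(is.data.frame(df))
  modseqOk <- all(vapply(df[modseqCols], is.character, logical(1)))
  abundOk <- all(vapply(df[samplesCols], is.numeric, logical(1)))
  modseqOk && abundOk
}

# Hypothetical helper: counts distinct peptides present in both tables.
# Assumption: a peptide is identified by its sequence plus modification string,
# and modseqCols holds exactly two column indices (sequence, modification).
countSharedPeptides <- function(enriched, non.enriched, modseqCols) {
  keyE <- paste(enriched[[modseqCols[1]]], enriched[[modseqCols[2]]], sep = "|")
  keyN <- paste(non.enriched[[modseqCols[1]]], non.enriched[[modseqCols[2]]], sep = "|")
  length(intersect(keyE, keyN))
}

# Toy data invented for illustration only
toyEnriched <- data.frame(
  seq = c("AAAK", "SSPR", "TTYK"),
  mod = c("Phospho (S2)", "Phospho (S1)", "Phospho (Y3)"),
  s1 = c(10.1, 12.3, 9.8),
  s2 = c(10.4, 12.0, 9.5),
  stringsAsFactors = FALSE
)
toyNonEnriched <- data.frame(
  seq = c("SSPR", "GGLK"),
  mod = c("Phospho (S1)", ""),
  s1 = c(8.2, 11.0),
  s2 = c(8.0, 11.3),
  stringsAsFactors = FALSE
)

# Column types are as required in both tables
checkPhosphoInput(toyEnriched, modseqCols = 1:2, samplesCols = 3:4)
checkPhosphoInput(toyNonEnriched, modseqCols = 1:2, samplesCols = 3:4)

# At least one shared phosphopeptide is needed for pairwise normalization
countSharedPeptides(toyEnriched, toyNonEnriched, modseqCols = 1:2)
```

If the type check returns `FALSE`, converting factor columns with `as.character` usually resolves it. A shared-peptide count of zero corresponds to the case in which the package reports an error.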