%\VignetteIndexEntry{PECA: Probe-level Expression Change Averaging} %\VignetteDepends{PECA} %\VignetteKeywords{Preprocessing, statistics} \documentclass{article} \usepackage{cite, hyperref} \usepackage{graphicx} \parindent = 0pt \parskip = 6pt \title{PECA: Probe-level Expression Change Averaging} \author{Tomi Suomi} \begin{document} \setkeys{Gin}{width=1.0\textwidth, height=1.1\textwidth} \maketitle \begin{center} {\tt tomi.suomi@utu.fi} \end{center} \textnormal{\normalfont} \tableofcontents \newpage \section{Introduction} PECA determines differential gene/protein expression using directly the probe/peptide-level measurements from Affymetrix gene expression microarrays or proteomic datasets, instead of the common practice of using pre-calculated gene/protein-level values. An expression change between two groups of samples is first calculated for each measured probe/peptide. The gene/protein-level expression changes are then defined as medians over the probe/peptide-level changes. This is illustrated in fig \ref{fig:peca}. For more details about the probe-level expression change averaging (PECA) procedure, see Elo et al. (2005), Laajala et al. (2009) and Suomi et al. (2015). \begin{figure}[h]\centering \includegraphics[width=64mm,height=64mm]{diagram_peca.pdf} \caption{Probe-level expression change averaging.} \label{fig:peca} \end{figure} PECA calculates the probe/peptide-level expression changes using the ordinary or modified t-statistic. The ordinary t-statistic is calculated using the function rowttests in the Bioconductor genefilter package. The modified t-statistic is calculated using the linear modeling approach in the Bioconductor limma package. Both paired and unpaired tests are supported. The significance of an expression change is determined based on the analytical p-value of the gene-level test statistic. Unadjusted p-values are reported along with the corresponding p-values looked up from beta distribution taking into account the number of probes/peptide per gene/protein. The quality control and filtering of the data (e.g. based on low intensity or probe specificity) is left to the user. \section{Proteomics data} This is an example of tab-separated text file being used as an input. First row of the file contains the column names, which can be later used to select the sample groups for comparison (here from A1 to B3). First column should always contain the id (e.g. UniProt Accession) for which the results are summarized for. Also the column names should not start with a number or contain any special letters. <>= filePath <- system.file("extdata", "peptides.tsv", package="PECA") data <- read.csv(file=filePath, sep="\t") head(data) @ PECA can be run by loading the package and setting the sample names of groups to be compared. These should match the column names in the input text file. After this the PECA\_tsv function can be called using the input file path and groups as parameters. Note that the sample names provided by user are used to subset the data so that the input file may contain more columns (e.g. samples or other information) than those used for comparison. <>= library(PECA) group1 <- c("A1", "A2", "A3") group2 <- c("B1", "B2", "B3") results <- PECA_tsv(filePath, group1, group2) @ Results can then be viewed, stored to disk, or processed further using R. <>= head(results) @ \section{Affymetrix microarray data} First we load the example library SpikeIn containing the SpikeIn133 dataset that we can use for input. This package needs to be installed separately from Bioconductor. <>= library(SpikeIn) data(SpikeIn133) @ We subset the original dataset for our purposes, which are two groups with three replicates. First group contains indexes 1, 15, and 29. Second group contains indexes 2, 16, and 30. This subset is then used as input for PECA. <>= data <- SpikeIn133[,c(1,15,29,2,16,30)] results <- PECA_AffyBatch(normalize="true", affy=data) @ Results can then be viewed, stored to disk, or processed further using R. <>= head(results) @ \end{document}