```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```
```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=10, fig.height=7, warning=FALSE, message=FALSE)
options(width=110)
```

<!--
%\VignetteIndexEntry{MSstatsBioData: Datasets of published biological studies with DDA or SRM experiments}
%\VignetteEngine{knitr::knitr}
-->

# MSstatsBioData: Datasets of published biological studies with DDA or SRM experiments

Package: MSstatsBioData   
Author: Meena Choi   
Date: October 2, 2017   

## Overview
This package provides the peak intensities datasets from seven biological investigations in DDA or SRM experiments. Experimental design, data acqusition and processing steps by spectral processing tools are as described in publications [1-5]. The datasets are formatted as MSstats required format (package `r Biocpkg("MSstats")`). For usage of the data for differential abundance analysis, please refer to the `r Biocpkg("MSstats")` vignette.

Overview of the datasets included in MSstatsBioData:

object | description
-------- | ----------
`DDA_cardio` | DDA dataset for cardiovascular disease study [1]
`SRM_crc_training` | The training set from a study for subjects with colorectal cancer [2]
`SRM_crc_validation` | The validation set from a study for subjects with colorectal cancer [2]
`SRM_mpm_training` | The training set from a study of subjects with malignant pleural mesothelioma(MPM) [3]
`SRM_mpm_validation` | The validation set from a study of subjects with malignant pleural mesothelioma(MPM) [3]
`SRM_ovarian` | SRM dataset for a study of subjects with ovarian cancer [4]
`SRM_yeast` | Time course investigation of central carbon metabolism of \emph{S. cerevisiae} [5]


## Data structure
All datasets in the package are represented in a data.frame with 10 columns, as `MSstats` rsequired format : 

column | description
-------- | ----------
`ProteinName` | Protein id.
`PeptideSequence` | Peptide sequence or modified peptide sequence
`PrecursorCharge` | Precursor charge
`FragmentIon` | fragment
`ProductCharge` | normalized microarray data, RNA-seq data or developmental effect score 
`IsotopeLabelType` | endogenous peptides (use L) or labeled reference peptides (use H)
`Condition` | indicates groups of interest
`BioReplicate` | a unique identifier for each biological replicate in the experiment
`Run` | identifier of a mass spectrometry run
`Intensity` | the quantified signal of a feature in a run without any transformation (the peak height or the peak of area under curve)


## Data manipulation 

#### Intensities are normalized in `SRM_crc_training` and `SRM_crc_validation` 
Two steps-normalization in log2 transformed intensity were already performed for these two datsets as described in manuscript. First they are normalized by equalized median using heavy reference peptides. Second, endogenous peptides are normalized by two standard proteins (AIAG and FETUA). After normalization, intensity value was came back to original scale from log2 transformation. These normalizations were performed by each dataset. User do not need extra normalization.


## Comments for some datasets

#### no missing in `DDA_cardio`
This DDA dataset had no missing values, unusally, because the procedure reported the background signal if a feature in a run was not detected.


## How to use dataset

Here one example (SRM_yeast) of differential abundance analysis by MSstats will be shown. Other example will be available in help file for each datasets.

### 1.Read the data.
```{r}
## load package
require(MSstatsBioData)

## require SRM_yeast data
data(SRM_yeast)
## look at first lines
head(SRM_yeast)
```

### 2. Load MSstats package and process the data
```{r}
## Example of using MSstats for differential abundance analysis.
require(MSstats)
input.proposed <- dataProcess(SRM_yeast, 
                            summaryMethod="TMP",
                            cutoffCensored="minFeature", 
                            censoredInt="0", 
                            MBimpute=TRUE,
                            maxQuantileforCensored=0.999)
```

```{r}
## set up the comparison that you want.
comparison1<-matrix(c(-1,1,0,0,0,0,0,0,0,0),nrow=1)
comparison2<-matrix(c(-1,0,1,0,0,0,0,0,0,0),nrow=1)
comparison3<-matrix(c(-1,0,0,1,0,0,0,0,0,0),nrow=1)
comparison4<-matrix(c(-1,0,0,0,1,0,0,0,0,0),nrow=1)
comparison5<-matrix(c(-1,0,0,0,0,1,0,0,0,0),nrow=1)
comparison6<-matrix(c(-1,0,0,0,0,0,1,0,0,0),nrow=1)
comparison7<-matrix(c(-1,0,0,0,0,0,0,1,0,0),nrow=1)
comparison8<-matrix(c(-1,0,0,0,0,0,0,0,1,0),nrow=1)
comparison9<-matrix(c(-1,0,0,0,0,0,0,0,0,1),nrow=1)

comparison <- rbind(comparison1,comparison2,comparison3,
                    comparison4,comparison5,comparison6,
                    comparison7,comparison8,comparison9)
row.names(comparison) <- c("T2-T1","T3-T1","T4-T1",
                            "T5-T1","T6-T1","T7-T1",
                            "T8-T1","T9-T1","T10-T1")
```

```{r}
## test between comparison you set up.
output.comparison <- groupComparison(contrast.matrix=comparison, 
                            data=input.proposed)

## output.comparison$ComparisonResult include the result of testing.
head(output.comparison$ComparisonResult)
```


## References

[1] Clough, T. et al. (2009) Protein quantification in label-free LC-MS experiments. \emph{J. Proteome Res.}, 8, 5275–5284.

[2] Surinova, S. et al. (2015) Prediction of colorectal cancer diagnosis based on circulating plasma proteins. \emph{EMBO Mol. Med.}, 7, 1166–1178.

[3] Cerciello, F. et al. (2013) Identification of a seven glycopeptide signature for malignant pleural mesothelioma in human serum by selected reaction monitoring. \emph{Clin. Proteomics}, 10, 16.

[4] Huttenhain, R. et al. (2012) Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics. \emph{Sci. Tansl. Med.}, 4, 142ra94.

[5] Picotti, P. et ak. (2009) Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. \emph{Cell}, 138, 795–806.