--- title: "YARN: Robust Multi-Tissue RNA-Seq Preprocessing and Normalization" author: "Joseph N. Paulson & John Quackenbush" date: "`r Sys.Date()`" output: rmarkdown::pdf_document vignette: > %\VignetteIndexEntry{YARN: Robust Multi-Tissue RNA-Seq Preprocessing and Normalization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## YARN - Yet Another RNa-seq package The goal of yarn is to expedite large RNA-seq analyses using a combination of previously developed tools. Yarn is meant to make it easier for the user to perform accurate comparison of conditions by leveraging many Bioconductor tools and various statistical and normalization techniques while accounting for the large heterogeneity and sparsity found in very large RNA-seq experiments. ## Installation You can install yarn from github with: ```R if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("yarn") ``` ## Quick Introduction If you're here to grab the GTEx version 6.0 data then look no further than this [gist](https://gist.github.com/jnpaulson/8c2ccfb0185dc490ff72e51aef86678c) that uses yarn to download all the data and preprocess it for you. ## Preprocessing Below are a few of the functions we can use to preprocess a large RNA-seq experiment. We follow a particular procedure where we: 1. Filter poor quality samples 2. Merge samples of similar conditions for increased power 3. Filter genes while preserving tissue or group specificity 4. Normalize while accounting for global differences in tissue distribution We will make use of the `skin` dataset for examples. The `skin` dataset is a small sample of the full GTEx data that can be downloaded using the `downloadGTEx` function. The `skin` dataset looks like this: ```{r,echo=FALSE,include=FALSE} library(yarn) data(skin) ``` ```{r} skin ``` This is a basic workflow. Details will be fleshed out: 0. First always remember to have the library loaded. ```R library(yarn) ``` 1. Download the GTEx gene count data as an ExpressionSet object or load the sample skin dataset. For computational reasons we load the sample skin data instead of having the user download the ```R library(yarn) data(skin) ``` 2. Check mis-annotation of gender or other phenotypes using group-specific genes ```{r checkMisAnnotation} checkMisAnnotation(skin,"GENDER",controlGenes="Y",legendPosition="topleft") ``` 3. Decide what sub-groups should be merged ```{r checkTissuesToMerge} checkTissuesToMerge(skin,"SMTS","SMTSD") ``` 4. Filter lowly expressed genes ```{r filterGenes} skin_filtered = filterLowGenes(skin,"SMTSD") dim(skin) dim(skin_filtered) ``` Or group specific genes ```{r filter} tmp = filterGenes(skin,labels=c("X","Y","MT"),featureName = "chromosome_name") # Keep only the sex names tmp = filterGenes(skin,labels=c("X","Y","MT"),featureName = "chromosome_name",keepOnly=TRUE) ``` 5. Normalize in a tissue or group-aware manner ```{r density} plotDensity(skin_filtered,"SMTSD",main=expression('log'[2]*' raw expression')) skin_filtered = normalizeTissueAware(skin_filtered,"SMTSD") plotDensity(skin_filtered,"SMTSD",normalized=TRUE,main="Normalized") ``` ## Helper functions Other than `checkMisAnnotation` and `checkTissuesToMerge` we provide a few plotting function. We include, `plotCMDS`, `plotDensity`, `plotHeatmap`. `plotCMDS` - PCoA / Classical Multi-Dimensional Scaling of the most variable genes. ```{r} data(skin) res = plotCMDS(skin,pch=21,bg=factor(pData(skin)$SMTSD)) ``` `plotDensity` - Density plots colored by phenotype of choosing. Allows for inspection of global trend differences. ```{r} filtData = filterLowGenes(skin,"SMTSD") plotDensity(filtData,groups="SMTSD",legendPos="topleft") ``` `plotHeatmap` - Heatmap of the most variable genes. ```{r} library(RColorBrewer) tissues = pData(skin)$SMTSD heatmapColColors=brewer.pal(12,"Set3")[as.integer(factor(tissues))] heatmapCols = colorRampPalette(brewer.pal(9, "RdBu"))(50) plotHeatmap(skin,normalized=FALSE,log=TRUE,trace="none",n=10, col = heatmapCols,ColSideColors = heatmapColColors,cexRow = 0.25,cexCol = 0.25) ``` ## Information ```{r} sessionInfo() ```