%\VignetteIndexEntry{DEGreport} %\VignetteKeywords{DifferentialExpression, Visualization, RNASeq, ReportWriting} %\VignetteEngine{knitr::knitr} \documentclass{article} \usepackage[utf8]{inputenc} <>= library("knitr") opts_chunk$set(tidy=FALSE, fig.width=9,fig.height=5, message=FALSE) @ <>= BiocStyle::latex() @ \title{DEGreport } \author{Lorena Pantano} \date{Modified: 2 July, 2016. Compiled: \today} \begin{document} maketitle \tableofcontents \newpage <>= library(DEGreport) data(humanSexDEedgeR) library(edgeR) @ \section{General QC figures from DE analysis} We are going to do a differential expression analysis with edgeR. We have an object that is comming from the edgeR package. It countains a gene count matrix for 85 TSI HapMap individuals, and the gender information. With that, we are going to apply the `glmFit` function to get genes differentially expressed between males and females. <>= des<-humanSexDEedgeR$design fit <- glmFit(humanSexDEedgeR,des) lrt <- glmLRT(fit) tab<-cbind(lrt$table,p.adjust(lrt$table$PValue,method="BH")) detags <- rownames(tab[tab[,5]<=0.1,]) plotSmear(humanSexDEedgeR, de.tags=detags) @ We need to extract the experiment design data.frame where the condition is Male or Female. <>= counts<-cpm(humanSexDEedgeR,log=FALSE) g1<-colnames(counts)[1:41] g2<-colnames(counts)[42:85] design<-data.frame(condition=sub("1","Male",sub("0","Female",des[,2])), other=1, row.names=colnames(counts)) @ We are getting the chromosome information for each gene. This way we can colour genes according autosomic,X or Y chromosomes. <>= data(geneInfo) @ p-value distribution gives an idea on how well you model is capturing the input data and as well whether it could be some problem for some set of genes. In general, you expect to have a flat distribution with peaks at 0 and 1. In this case, we add the mean count information to check if any set of genes are enriched in any specific p-value range. <>= degMean(lrt$table$PValue, counts) @ The same idea can be done with the gene variation. <>= degVar(lrt$table$PValue, counts) @ Variation (dispersion) and average expresion relationship shouldn't be a factor among the differentialy expressed genes. When plotting average mean and standard desviation, significant genes should be randomly distributed. <>= degMV(humanSexDEedgeR$samples$group, lrt$table$PValue, counts) @ In this case, it would be good to look at the ones that are totally outside the expected correlation. You can put this tree plots together using \Rcode{degQC}. <>= degQC(lrt$table$PValue, counts, humanSexDEedgeR$samples$group) @ Other way to look at this is showing the mean count distribution among groups. <>= degMB(detags,g1,g2,counts) @ The same idea can be applied to gene variation. <>= degVB(detags,g1,g2,counts) @ Browsing gene expression can help to validate results or select some gene for donwstream analysis. Run the fowlling lines if you want to visualize your expression values by condition: <>= degObj(counts,design,"degObj.rda") library(shiny) shiny::runGitHub("lpantano/shiny", subdir="expression") @ \section{Report from DESeq2 analysis} In this section, we show how to use DESeq2 output to create a full report, including figures and tbale with top de-regulated genes, GO enrichment analysis and heatmaps and PCA plots. If you set \Rcode{path\_results}, different files will be saved there. <>= data(humanSexDEedgeR) library(DESeq2) idx <- c(1:10, 75:85) dse <- DESeqDataSetFromMatrix(humanSexDEedgeR$counts[1:5000, idx], humanSexDEedgeR$samples[idx,], design=~group) dse <- DESeq(dse) res <- results(dse) resreport <- degResults(dds=dse, name="test", org=NULL, do_go=FALSE, group="group", xs="group", path_results = NULL) @ Volcano plot using the output of DESeq2. It mainly needs data.frame with two columns (logFC and pVal). Specific genes can be plot using the option \Rcode{plot\_text} (subset of the previous data.frame with a 3rd column to be used to plot the gene name). <>= res$id <- row.names(res) show = as.data.frame(res[1:2, c("log2FoldChange", "padj", "id")]) degVolcano(as.data.frame(res[,c("log2FoldChange", "padj")]), plot_text = show) @ Plot top genes coloring by group. Very useful for experiments with nested groups. `xs` can be `time` or `WT`/`KO`, and `group` can be `treated`/`untreated`. Another classification can be added, like `batch` that will plot points with different shapes. <>= degPlot(dds=dse, res = res, n = 6, xs = "group", group = "group") @ \section{Detect patterns of expression} In this section, we show how to detect pattern of expression. Mainly useful when data is a time course experiment. \Rfunction{degPatterns} needs a expression matrix, the design experiment and the column used to group samples. <>= ma = assay(rlog(dse))[row.names(res)[1:100],] res <- degPatterns(ma, as.data.frame(colData(dse)), time="group", col=NULL) @ \end{document}