--- title: "Quick Start" author: "Jianhong Ou, Jun Yu, Lihua Julie Zhu" output: html_document: theme: simplex toc: true toc_float: true toc_depth: 4 fig_caption: true date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('ChIPpeakAnno')`" vignette: > %\VignetteIndexEntry{ChIPpeakAnno Quick Start} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo=FALSE, results="hide", warning=FALSE} suppressPackageStartupMessages({ library(ChIPpeakAnno) library(EnsDb.Hsapiens.v75) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(org.Hs.eg.db) }) knitr::opts_chunk$set(warning=FALSE, message=FALSE) ``` # Four steps for peak annotation The purpose of this quick start is to introduce the four newly implemented functions, `toRanges`, `annoGO`, `annotatePeakInBatch`, and `addGeneIDs` in the new version of the **ChIPpeakAnno**. With those wrapper functions, the annotation of ChIP-Seq peaks becomes streamlined into four major steps: 1 Read peak data with `toGRanges` 2 Generate annotation data with `toGRanges` 3 Annotate peaks with `annotatePeakInBatch` 4 Add additional information with `addGeneIDs` Most of time user can use the default settings of the arguments of those functions. This makes the annotation pipeline straightforward and easy to use. Note that the version of the annotation data must match with the genome used for mapping because the coordinates may differ for different genome releases. For example, if you are using Mus_musculus.v103 for mapping, you'd best also use EnsDb.Mmusculus.v103 for annotation. For more information about how to prepare the annotation data, please refer ?getAnnotation. # Use case 1: Four steps to annotate peak data with **EnsDb** ## Step 1: Convert the peak data to `GRanges` with `toGRanges` ```{r quickStart} ## First, load the ChIPpeakAnno package library(ChIPpeakAnno) ``` ```{r import} path <- system.file("extdata", "Tead4.broadPeak", package="ChIPpeakAnno") peaks <- toGRanges(path, format="broadPeak") peaks[1:2] ``` ## Step 2: Prepare annotation data with `toGRanges` ```{r annotationData} library(EnsDb.Hsapiens.v75) annoData <- toGRanges(EnsDb.Hsapiens.v75) annoData[1:2] ``` ## Step 3: Annotate the peaks with `annotatePeakInBatch` ```{r annotate} ## keep the seqnames in the same style if needed seqlevelsStyle(peaks) <- seqlevelsStyle(annoData) ## do annotation by nearest TSS anno <- annotatePeakInBatch(peaks, AnnotationData=annoData) anno[1:2] # A pie chart can be used to demonstrate the overlap features of the peaks. pie1(table(anno$insideFeature)) ``` ## Step 4: Add additional annotation with `addGeneIDs` ```{r addIDs} library(org.Hs.eg.db) anno <- addGeneIDs(anno, orgAnn="org.Hs.eg.db", feature_id_type="ensembl_gene_id", IDs2Add=c("symbol")) head(anno) ``` # Use case 2: Annotate the peaks with promoters provided by **TxDb** This section demonstrates how to annotate the same peak data as in quick start 1 using a new annotation based on **TxDb** with `toGRanges`. ```{r} library(TxDb.Hsapiens.UCSC.hg19.knownGene) annoData <- toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene) annoData[1:2] seqlevelsStyle(peaks) <- seqlevelsStyle(annoData) ``` The same `annotatePeakInBatch` function is used to annotate the peaks using annotation data just created. This time we want the peaks within 2kb upstream and up to 300bp downstream of TSS within the gene body. ```{r} anno <- annotatePeakInBatch(peaks, AnnotationData=annoData, output="overlapping", FeatureLocForDistance="TSS", bindingRegion=c(-2000, 300)) anno$symbol <- xget(anno$feature, org.Hs.egSYMBOL) head(anno) ``` # Use case 3: Annotate the peaks in both sides with nearest transcription start sites within 5K bps. This section demonstrates the flexibility of the annotaition functions in the ChIPpeakAnno. Instead of building a new annotation data, the argument _bindingTypes_ and _bindingRegion_ in `annoPeak` function can be set to find the peaks within 5000 bp upstream and downstream of the TSS, which could be the user defined promoter region. ```{r, fig.height=3, fig.width=8} anno <- annotatePeakInBatch(peaks, AnnotationData=annoData, output="nearestBiDirectionalPromoters", bindingRegion=c(-5000, 500)) anno$symbol <- xget(anno$feature, org.Hs.egSYMBOL) anno[anno$peak=="peak12725"] ``` The annotated peaks can be visualized with R/Bioconductor package **trackViewer** developed by our group. ```{r trackViewer} library(trackViewer) gr <- peak <- peaks["peak12725"] start(gr) <- start(gr) - 5000 end(gr) <- end(gr) + 5000 if(.Platform$OS.type != "windows"){ peak12725 <- importScore(file=system.file("extdata", "Tead4.bigWig", package="ChIPpeakAnno"), ranges=peak, format = "BigWig") }else{## rtracklayer can not import bigWig files on Windows load(file.path(dirname(path), "cvglist.rds")) peak12725 <- Views(cvglists[["Tead4"]][[as.character(seqnames(peak))]], start(peak), end(peak)) peak12725 <- viewApply(peak12725, as.numeric) tmp <- rep(peak, width(peak)) width(tmp) <- 1 tmp <- shift(tmp, shift=0:(width(peak)-1)) mcols(tmp) <- peak12725 colnames(mcols(tmp)) <- "score" peak12725 <- new("track", dat=tmp, name="peak12725", type="data", format="BED") } trs <- geneModelFromTxdb(TxDb.Hsapiens.UCSC.hg19.knownGene, org.Hs.eg.db, gr) names(trs) <- paste(sapply(trs, function(.ele) .ele@name), names(trs), sep=":") optSty <- optimizeStyle(trackList(peak12725, trs, heightDist = c(.3, .7)), theme="bw") viewTracks(optSty$tracks, gr=gr, viewerStyle=optSty$style) ```