%\VignetteIndexEntry{An introduction to exomePeak} %\VignetteDepends{} %\VignetteKeywords{peak detection, differential methylation} %\VignettePackage{exomePeak} \documentclass[]{article} \usepackage{times} \usepackage{hyperref} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\software}[1]{\textsf{#1}} \newcommand{\R}{\software{R}} \newcommand{\exomePeak}{\Rpackage{exomePeak}} \newcommand{\bam}{\texttt{BAM}} \title{An Introduction to \Rpackage{exomePeak}} \author{Jia Meng, PhD} \date{Modified: 18 August, 2013. Compiled: \today} \begin{document} \SweaveOpts{concordance=TRUE} \maketitle <>= options(width=60) @ \section{Introduction} The \exomePeak{} R-package has been developed based on the MATLAB ``exomePeak" package, for the analysis of RNA epitranscriptome sequencing data with affinity-based shotgun sequencing approach, such as MeRIP-Seq or m6A-Seq. \textbf{The exomePeak package is under active development, please don't hesitate to contact me @ jia.meng@hotmail if you have any questions.} The inputs of the main function ``exomepeak" are the IP BAM files and input control BAM files: \begin{itemize} \item From one experiment condition: for peak calling to identify the RNA methylation sites \item From two experimental conditions: for peak calling and differential analysis to unveil the post-transcriptional regulation of RNA modifications. \end{itemize} Gene annotation can be provided as a GTF file, a transcriptDb object, or automatically downloaded from UCSC through the internet. Let us firstly load the package and get the toy data (came with the package) ready. <>= library("exomePeak") gtf=system.file("extdata", "example.gtf", package="exomePeak") f1=system.file("extdata", "IP1.bam", package="exomePeak") f2=system.file("extdata", "IP2.bam", package="exomePeak") f3=system.file("extdata", "IP3.bam", package="exomePeak") f4=system.file("extdata", "IP4.bam", package="exomePeak") f5=system.file("extdata", "Input1.bam", package="exomePeak") f6=system.file("extdata", "Input2.bam", package="exomePeak") f7=system.file("extdata", "Input3.bam", package="exomePeak") f8=system.file("extdata", "treated_IP1.bam", package="exomePeak") f9=system.file("extdata", "treated_Input1.bam", package="exomePeak") @ We will in the next see how the two main functions can be accomplished in a single command. The first main function of ``exomePeak" R-package is to call peaks (enriched binding sites) to detect RNA methylation sites on the exome. Inputs are the gene annotation GTF file, IP and Input control samples in BAM format. This function is used when data from only one condition is available. <>= result = exomepeak(GENE_ANNO_GTF=gtf, IP_BAM=c(f1,f2,f3,f4), INPUT_BAM=c(f5,f6,f7)) names(result) @ The results will be saved in the specified output directory, including the identified (consistent) peaks in BED/table format. The BED format can be visualized in genome browser directly and the peaks may span one or multiple introns. The function also returns two GRangesList objects, in which there are called peaks and consistent peaks. The consistent peaks in the latter appear on all the IP replicates compared with the merged Input control sample, and is thus recommended. The log p-value, log fdr and fold enrichment of the identified peaks are stored as metadata, which can be extracted with command \emph{mcols}. <>= recommended_peaks = result$con_peaks # consistent peaks (Recommended!) peaks_info = mcols(recommended_peaks) # information of the consistent peaks head(peaks_info) @ or to get all the peak detected (some of them do not consistently appear on all replicates.): <>= all_peaks = result$all_peaks # get all peaks peaks_info = mcols(all_peaks) # information of all peaks head(peaks_info) @ When there are MeRIP-Seq data available from two experimental conditions, the ``exomepeak" function may can unveil the dynamics in post-transcriptional regulation of the RNA methylome. In the following example, the function will report the sites that are post-transcriptional differentially methylated between the two tested conditions (TREATED vs. UNTREATED). <>= result = exomepeak(GENE_ANNO_GTF=gtf, IP_BAM=c(f1,f2,f3,f4), INPUT_BAM=c(f5,f6,f7), TREATED_IP_BAM=c(f8), TREATED_INPUT_BAM=c(f9)) @ The algorithm will firstly identify reads enriched binding sites or peaks, and then check whether the sites are differentially methylated between the two experimental conditions. The results will be saved in the specified output directory, including the identified (consistent) peaks in BED and table formats, along with the differential information indicating whether the site is hyper- or hypo-methylated under the treated condition. Similar to the peak calling case, the BED format can be visualized in genome browser directly and the peaks may span one or multiple introns. Similar to the peak calling case, the function will report a set of consistent differentially methylated peaks saved in the specified folder, which is the recommended set. The function also returns 3 GRangesList object, containing all the peaks, the differentially methylated peaks with the given threshold on the merged data, consistently differentially methylated peaks. The consistent differentially methylated peaks in the last appear to be differential for all the replicates and is thus recommended. The information of the identified peaks and the differential analysis are stored as metadata, which can be extracted. <>= names(result) is.na(result$con_sig_diff_peaks) # no reported consistent differnetial peaks @ Unfortunately, there is no reported consistent differnetial peaks on the toy data, to get the information of all the peaks and the differential analysis information: <>= diff_peaks = result$diff_peaks # consistent differential peaks (Recommended!) peaks_info = mcols(diff_peaks) # information of the consistent peaks head(peaks_info[,1:3]) # peak calling information head(peaks_info[,4:6]) # differential analysis information @ Gene annotation may be alternatively downloaded directly from internet, but will take a really long time due to the downloading time and huge transcriptome needed to be scanned. \end{document}