%\VignetteIndexEntry{Using PING with paired-end sequencing data}
%\VignetteDepends{PING,parallel}
%\VignetteKeywords{Preprocessing, ChIP-Seq, Sequencing}
%\VignettePackage{PING}
\documentclass[11pt]{article}
\usepackage{Sweave}
\usepackage{underscore}
\usepackage{hyperref}
%\usepackage{url}
%\usepackage{color, pdfcolmk}
%\usepackage[authoryear,round]{natbib}
%\bibliographystyle{plainnat}
%\usepackage[hmargin=2cm, vmargin=3cm]{geometry}

\SweaveOpts{keep.source=FALSE} %Introduce newlines automatically in R code

%\newcommand{\scscst}{\scriptscriptstyle}
%\newcommand{\scst}{\scriptstyle}

\title{Using PING with Paired-End sequencing data}
\author{Xuekui Zhang\footnote{ubcxzhang@gmail.com}, Sangsoon Woo\footnote{swoo@fhcrc.org}, Raphael Gottardo\footnote{rgottard@fhcrc.org} and Renan Sauteraud\footnote{rsautera@fhcrc.org}}

\begin{document}

<<>>=
options(continue=" ")
@

\maketitle

\textnormal {\normalfont}
This vignette presents a workflow for analyzing paired-end sequencing data with PING.

\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage

\section{Licensing and citing}

Under the Artistic License 2.0, you are free to use and redistribute this software.

If you use this package for a publication, please cite the following:
\begin{itemize}
\item[] Xuekui Zhang, Gordon Robertson, Sangsoon Woo, Brad G. Hoffman, and Raphael Gottardo. (2012). Probabilistic Inference for Nucleosome Positioning with MNase-based or Sonicated Short-read Data. PLoS ONE 7(2): e32095.
\end{itemize}

\section{Introduction}

For an introduction to the biological background and the \texttt{PING} method, please refer to the other vignette: `The \texttt{PING} user guide'.

Because the structure of paired-end sequencing data requires a slightly different treatment, this vignette separately presents how to use \texttt{PING} for these data.

\section{PING analysis steps}

A typical PING analysis consists of the following steps:
\begin{enumerate}
  \item Extract reads and chromosomes from BAM files into a \texttt{GRanges} object.
  \item Segment the genome into candidate regions that have sufficient aligned reads via `segmentPING'.
  \item Estimate nucleosome positions and other parameters with PING.
  \item Post-process the PING predictions to correct some of them.
\end{enumerate}

As with any R package, you should first load it with the following command:
<<>>=
library(PING)
@

\section{Data Input and Formatting}

As with single-end \texttt{PING}, the input used for the segmentation step is a \texttt{GRanges} object.
%Because Paired-End sequencing data often comes in the form of BAM files, we provide a function called \texttt{bam2gr} to convert these files into \texttt{GRanges} objects with all the appropriate information.
%A small BAM file including a region of yeast's chromosome I is provided to be used as an example in this vignette.
Because sequencing data often come in the form of BAM files, the \texttt{PICS} package provides a function called \texttt{bam2gr} to convert these files into \texttt{GRanges} objects with all the appropriate information.
A small BAM file covering a region of yeast chromosome I is provided as an example for this vignette.
<<>>=
# Path to the example BAM file shipped with PING
yeastBam <- system.file("extdata/yeastChrI.bam", package="PING")
@
<<>>=
library(PICS)
# Read the paired-end alignments into a GRanges object
gr <- bam2gr(bamFile=yeastBam, PE=TRUE)
@

\texttt{gr} is a \texttt{GRanges} object containing all the reads from the BAM file. Note that \texttt{bam2gr} also works for single-end sequencing data; the argument \texttt{PE} should be set to \texttt{TRUE} when dealing with paired-end data.

\section{PING analysis}
\subsection{Genome segmentation}

PING is used the same way for paired-end and single-end sequencing data. The function \texttt{segmentPING} decides which segmentation method to use based on the arguments provided.
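
Because \texttt{segmentPING} expects a \texttt{GRanges} object restricted to a single chromosome, it can be useful to check which chromosomes the reads cover before segmenting. The following is a minimal sketch and not part of the PING workflow itself: it assumes that the \texttt{gr} object created in the previous section is available, it uses only generic \texttt{GenomicRanges} accessors, and the name \texttt{readsPerChr} is introduced here purely for illustration.
<<eval=FALSE>>=
library(GenomicRanges)
# Chromosome names declared in the GRanges object
seqlevels(gr)
# Number of ranges found on each chromosome
# (readsPerChr is an illustrative name, not a PING object)
readsPerChr <- table(as.character(seqnames(gr)))
readsPerChr
# Distribution of the widths of the ranges
summary(width(gr))
@
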
When dealing with paired-end data, four additional arguments come into play: \texttt{PE}, which must be set to \texttt{TRUE}, and \texttt{islandDepth}, \texttt{min_cut} and \texttt{max_cut} for candidate region selection. The last three control the size and required coverage for a region to be considered as a candidate.

In order to run \texttt{segmentPING}, we have to subset our \texttt{GRanges} object so that it contains a single chromosome:
<<>>=
# Keep only the reads on chrI and drop the unused sequence levels
grI <- gr[seqnames(gr)=="chrI"]
seqlevels(grI) <- "chrI"
@
<<>>=
segPE <- segmentPING(grI, PE=TRUE)
@
It returns a \texttt{segReadsListPE} object.

\subsection{Parameter estimation}

Parallelisation also works with paired-end data. In what follows, we assume that \texttt{parallel} is installed on your machine. If it is not, the first line should be omitted and calculations will occur on a single CPU.
<<>>=
library(parallel)
@
<<>>=
ping <- PING(segPE, nCores=2)
@
The returned object is a \texttt{pingList}, which then goes through a post-processing step using the \texttt{postPING} function.

\section{Post-processing PING results}
%{sigmaB2=3600; rho2=15; alpha2=98; beta2=200000}
<<>>=
PS <- postPING(ping, segPE)
@
The output of \texttt{postPING} is a data frame that contains the estimated parameters of each nucleosome.

\section{Analyzing the prediction}

\texttt{PING} comes with a set of tools to export or visualize the prediction. Here, we only show how to export the results into BED format for further analysis and how to make a quick plot to summarize the nucleosome prediction.
For more information on how to export the results or make more complex figures, please refer to the section `Result output' of the PING vignette.

The function \texttt{makeRangedDataOutput} offers a simple way to convert the prediction results into a \texttt{RangedData} object that can be exported to a file using the \texttt{rtracklayer} package.
<<>>=
# Convert the predictions to BED-style ranges and write them to disk
rdBed <- makeRangedDataOutput(PS, type="bed")
library(rtracklayer)
export(rdBed, "nucPrediction.bed")
@
The exported file includes all information about the predicted nucleosomes, which are automatically ranked by their score.

\vspace{10pt}
For paired-end sequencing data, the built-in plotting function \texttt{plotSummary} can be used to visualize the predicted nucleosome positions obtained from the \texttt{postPING} function.
<<>>=
plotSummary(PS, ping, grI, chr="chrI", from=149000, to=153000)
@
%Note that the argument PE should be set to TRUE.
All the arguments of this function work for paired-end data as well. Refer to the PING vignette and the man page \texttt{?plotSummary} for more information.

\end{document}