%\VignetteIndexEntry{riboSeqR} %\VignettePackage{riboSeqR} \documentclass[a4paper]{article} \usepackage{subfig} \title{riboSeqR} \author{Thomas J. Hardcastle, Betty Y.W. Chung} <<>= BiocStyle::latex() @ \begin{document} \maketitle \section*{Introduction} Ribosome profiling extracts those parts of a coding sequence currently bound by a ribosome (and thus, are likely to be undergoing translation). Ribosomes typically cover between 20-30 bases of the mRNA (dependant on conformational changes) and move along the mRNA three bases at a time. Sequenced reads of a given length are thus likely to lie predominantly in a single frame relative to the start codon of the coding sequence. This package presents a set of methods for parsing ribosomal profiling data from multiple samples and aligned to coding sequences, inferring frameshifts, and plotting the average and transcript-specific behaviour of these data. Methods are also provided for extracting the data in a suitable form for differential translation analysis. For a fuller description of these methods and further examples of their use, see Chung \& Hardcastle \textsl{et al} (2015) \cite{chung}. \section*{Getting Data} \verb'riboSeqR' currently reads alignment data from BAM files, or from flat text files, although BAM format is recommended. \section*{Workflow Example} Begin by loading the riboSeqR library. <<>>= library(riboSeqR) @ Identify the data directory for the example data. <<>>= datadir <- system.file("extdata", package = "riboSeqR") @ The \Rfunction{fastaCDS} function can be used to guess at potential coding sequences from a (possibly compressed; see \Rfunction{base::file}) fasta file containing mRNA transcripts (note; do not use this on a genome!). These can also be loaded into a \Rclass{GRanges} object from an annotation file. <<>>= chlamyFasta <- paste(datadir, "/rsem_chlamy236_deNovo.transcripts.fa", sep = "") fastaCDS <- findCDS(fastaFile = chlamyFasta, startCodon = c("ATG"), stopCodon = c("TAG", "TAA", "TGA")) @ The ribosomal and RNA (if available) alignment files are specified. <<>>= ribofiles <- paste(datadir, "/chlamy236_plus_deNovo_plusOnly_Index", c(17,3,5,7), sep = "") rnafiles <- paste(datadir, "/chlamy236_plus_deNovo_plusOnly_Index", c(10,12,14,16), sep = "") @ The aligned ribosomal (and RNA) data can be read in using the \Rfunction{readRibodata} function. The columns can be specified as a parameter of the \Rfunction{readRibodata} function if the data in the alignment files are differently arranged. <<>>= riboDat <- readRibodata(ribofiles, replicates = c("WT", "WT", "M", "M")) @ The alignments can be assigned to frames relative to the coding coordinates with the \Rfunction{frameCounting} function. <<>>= fCs <- frameCounting(riboDat, fastaCDS) @ The predominant reading frame, relative to coding start, can be estimated from the frame calling (or from a set of coordinates and alignment data) for each n-mer. The weighting decribes the proportion of n-mers fitting with the most likely frameshift. The reading frame can also be readily visualised using the \Rfunction{plotFS} function. <<<>= fS <- readingFrame(rC = fCs); fS plotFS(fS) @ \begin{figure}[!ht] \begin{center} <>= <> @ \caption{Number of n-mers in each frame relative to coding start. 27-mers are predominantly in frame-1, while 28-mers are chiefly in frame-0.} \label{fig:frameshift} \end{center} \end{figure} These can be filtered on the mean number of hits and unique hits within replicate groups to give plausible candidates for coding. Filtering can be limited to given lengths and frames, which may be inferred from the output of the \Rfunction{readingFrame} function. <<>>= ffCs <- filterHits(fCs, lengths = c(27, 28), frames = list(1, 0), hitMean = 50, unqhitMean = 10, fS = fS) @ We can plot the total alignment at the 5' and 3' ends of coding sequences using the \Rfunction{plotCDS} function. The frames are colour coded; frame-0 is red, frame-1 is green, frame-2 is blue. <<<>= plotCDS(coordinates = ffCs@CDS, riboDat = riboDat, lengths = 27) @ Note the frameshift for 28-mers. <<<>= plotCDS(coordinates = ffCs@CDS, riboDat = riboDat, lengths = 28) @ \begin{figure}[!ht] \centering <>= <> @ \label{fig:cds27} \caption{Average alignment of 27-mers to 5' and 3' ends of coding sequences.} \end{figure} \begin{figure}[!ht] \centering <>= <> @ \label{fig:cds28} \caption{Average alignment of 28-mers to 5' and 3' ends of coding sequences.} \end{figure} We can plot the alignment over an individual transcript sequence using the \Rfunction{plotTranscript} function. Observe that one CDS (on the right) contains the 27s in the same phase as the CDS (they are both red) while the putative CDSes to the left are not in phase with the aligned reads, suggesting either a sequence error in the transcript or a misalignment. The coverage of RNA sequenced reads is shown as a black curve (axis on the right). <<<>= plotTranscript("CUFF.37930.1", coordinates = ffCs@CDS, riboData = riboDat, length = 27, cap = 200) @ \begin{figure}[!ht] \begin{center} <>= <> @ \caption{Alignment to individual transcript.} \label{fig:trans} \end{center} \end{figure} We can extract the counts from a \Rclass{riboCoding} object using the \Rfunction{sliceCounts} function <<>>= riboCounts <- sliceCounts(ffCs, lengths = c(27, 28), frames = list(0, 2)) @ Counts for RNA-sequencing can be extracted using from the riboData object and the coding coordinates using the \Rfunction{rnaCounts} function. This is a relatively crude counting function, and alternatives have been widely described in the literature on mRNA-Seq. <<>>= rnaCounts <- rnaCounts(riboDat, ffCs@CDS) @ These data may be used in an analysis of differential translation through comparison with the RNA-seq data. See the description of a beta-binomial analysis in the \Biocpkg{baySeq} vignettes for further details. <<>>= library(baySeq) pD <- new("countData", replicates = ffCs@replicates, data = list(riboCounts, rnaCounts), groups = list(NDT = c(1,1,1,1), DT = c("WT", "WT", "M", "M")), annotation = as.data.frame(ffCs@CDS), densityFunction = bbDensity) libsizes(pD) <- getLibsizes(pD) pD <- getPriors(pD, cl = NULL) pD <- getLikelihoods(pD, cl = NULL) topCounts(pD, "DT", normaliseData = TRUE) @ \section*{Session Info} <<>>= sessionInfo() @ \begin{thebibliography}{99} \bibitem{chung} BY Chung and TJ Hardcastle and JD Jones and N Irigoyen and AE Firth and DC Baulcombe and I Brierley \textsl{The use of duplex-specific nuclease in ribosome profiling and a user-friendly software package for Ribo-seq data analysis.} RNA (2015). \end{thebibliography} \end{document}