% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- %\VignetteIndexEntry{rnaSeqMap primer} %\VignetteKeywords{} %\VignetteDepends{} %\VignettePackage{rnaSeqMap} %documentclass[12pt, a4paper]{article} \documentclass[12pt]{article} \usepackage[utf8]{inputenc} \usepackage{amsmath,pstricks} %\usepackage{auto-pst-pdf} \usepackage{hyperref} \usepackage[numbers]{natbib} \usepackage{color} \usepackage[scaled]{helvet} \usepackage{auto-pst-pdf} \renewcommand*\familydefault{\sfdefault} \definecolor{NoteGrey}{rgb}{0.96,0.97,0.98} \textwidth=6.2in \textheight=9.5in @ \parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-1in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\code}[1]{{\texttt{#1}}} \author{Anna Lesniewska, Michal Okoniewski} \begin{document} \title{\code{rnaSeqMap}: RNASeq analyses using xmap database back-end} \maketitle \fcolorbox{black}{NoteGrey} { \begin{minipage}{13.5cm} \begin{center} \textbf{ Vignette for v.2.11.1 - no Xmap database needed} \end{center} \end{minipage} } \tableofcontents \newpage \section{Recent changes and updates} 8.09.2012 - the vignette contains mainly the material covered by the ECCB 2012 tutorial chunk 28.06.2012 - switched to GAlignments in RS class and used them to get coverage in ND - corrected all the rle into Rle 04.10.2011 - added data modification generators : generatorAddSquare(), generatorAdd(),generatorMultiply(), generatorTrunc(), generatorPeak(), generatorSynth() local coverage normalizations: standarizationNormalize(), densityNormalize(), min\_maxNormalize() local coverage difference measures: ks\_test(), diff\_area(), diff\_derivative\_area(),qq\_plot(), qq\_derivative\_plot(), pp\_plot(), pp\_derivative\_plot(), hump\_diff1(), hump\_diff2() 14.05.2011 - added parseGff3() \section{Introduction} \code{rnaSeqMap} is a "middleware" library for RNAseq secondary analyses. It constitutes an API for such operations as: \begin{itemize} \item{access to any the reads of the experiment in possibly fastest time, according to any chromosome coordinates} \item{accessing sets of reads according to genomic annotation in Ensembl} \item{calculation of coverage and number of reads and transformations of those values} \item{creating input for significance analysis algorithms - from edgeR and DESeq} \item{precisely finding significant and consistent regions of expression} \item{splicing analyses} \item{visualizations of genes and expression regions} \end{itemize} The library is independent from the sequencing technology and reads mappina software. It needs either reads described as genome coordinates in the extended xmap database, or can alternatively read data as big as they can fit in the operational memory. The use with modified xmap database is recommended, as it overcomes memory limitations - thus the library can be efficiently run on not very powerful machines. The internal features of \code{rnaSeqMap} distinctive for this piece of software are: \begin{itemize} \item{sequencing reads and annotations in one common database - extended XMAP \cite{Yates07}} \item{algorithm for finding irreducible regions of genomic expression - according to Aumann and Lindell \cite{Lindell03}} \item{nucleotide-level splicing analysis} \item{connectors for further gene- and region-level expression processing to DESeq \cite{Anders10} and edgeR \cite{Robinson09}} \item{the routines for coverage, splicing index and region mining algorithm have been implemented in C for speed} \end{itemize} %} \section{Using the SeqReadsand NucleotideDistr objects} The reads are provided into the objects built according to genome coordinates from BAM files described in the "covdesc" file. <>= rs <- newSeqReads(ch,st, en, str); rs <- getBamData(rs,idx.both, cvd=cvd) nd <- getCoverageFromRS(rs, idx.both) @ \section{Processing schema to get the coverage measures using the "camel wrapper"} To get the coverage difference measures described in \cite{Okoniewski2012} Encode the experimental design in the sample description/covdesc file The comparison may be done between any two group of samples (1+1, n+n, n+m) Get samples indices, eg: <>= idxT <- which(samples$condition=="T") idxC <- which(samples$condition=="C") @ Prepare the table of genome coordinates to query Encode them as GenomicRanges object, eg: <>= regions.gR <- rnaSeqMap:::.fiveCol2GRanges(tmp) @ Run the wrapper for all camel comparisons <>= regionsCamelMeasures <- gRanges2CamelMeasures(regions.gR,samples,idxT,idxC,sums=sums,progress=10) @ Run detection filtering by the density of coverage, eg: <>= idx <- which(regionsCamelMeasures[,"covDensC1"]>10 | regionsCamelMeasures[,"covDensC1"]>10) regionsCamelMeasures <- regionsCamelMeasures[idx, ] @ Order the regions by a selected measure: <>= o <- order(regionsCamelMeasures[,"QQ.mm"], decreasing=T) regionsCamelMeasures <- regionsCamelMeasures [o, ] @ \section{Using Aumann-Lindell two-sliding-window algorithm to find expressed genomic regions} The regions will be found as new object containing mindiff (second parameter value) for the nucleotides for which there are irreducible regions of coverage with given mindiff and minimal length - minsup. For the details of the algorithm see \cite{Lesniewska2011, Aumann2003} <>= nd.AL <- findRegionsAsND(nd, 15, minsup=5) @ \begin{thebibliography}{99} \bibitem{Lesniewska2011} Leśniewska, A., Okoniewski, M. J. (2011). rnaSeqMap: a Bioconductor package for RNA sequencing data exploration. BMC bioinformatics, 12, 200. doi:10.1186/1471-2105-12-200 \bibitem{Okoniewski2012} Okoniewski, M. J., Leśniewska, A., Szabelska, A., Zyprych-Walczak, J., Ryan, M., Wachtel, M., Morzy, T., et al. (2011). Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage. Nucleic acids research. doi:10.1093/nar/gkr1249 \bibitem{Aumann2003} Aumann, Y., Lindell, Y. (2003). A Statistical Theory for Quantitative Association Rules. J. Intell. Inf. Syst., 20(3), 255–283. \end{thebibliography} \end{document}