%% LyX 2.2.1 created this file. For more info, see http://www.lyx.org/. %% Do not edit unless you really know what you are doing. %\VignetteIndexEntry{Guitar} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::knitr} %\VignetteKeywords{Guitar,visualization,sequencing,RNA methylation,m6A, m5C} %\VignettePackage{Guitar} \documentclass{article} \usepackage[sc]{mathpazo} \usepackage[T1]{fontenc} \usepackage{geometry} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \geometry{verbose,tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm} \setcounter{secnumdepth}{2} \setcounter{tocdepth}{2} \usepackage{url} \usepackage[unicode=true,pdfusetitle, bookmarks=true,bookmarksnumbered=true,bookmarksopen=true,bookmarksopenlevel=2, breaklinks=false,pdfborder={0 0 1},backref=false,colorlinks=false] {hyperref} \hypersetup{ pdfstartview={XYZ null null 1}} \usepackage{breakurl} \begin{document} <>= library(knitr) # set global chunk options opts_chunk$set(concordance=TRUE) @ \title{An Introduction to \Rpackage{Guitar} Package} \author{Xiao Du} \date{Modified: 26 April, 2019. Compiled: \today} \maketitle <>= options(width=60) @ \section{Quick Start with Guitar} This is a manual for Guitar package. The Guitar package is aimed for RNA landmark-guided transcriptomic analysis of RNA-reated genomic features. The Guitar package enables the comparison of multiple genomic features, which need to be stored in a name list. Please see the following example, which reads 1000 RNA m6A methylation sites into R for detection. Of course, in actual data analysis, features may come from multiple sets of resources. <>= library(Guitar) # genomic features imported into named list stBedFiles <- list(system.file("extdata", "m6A_mm10_exomePeak_1000peaks_bed12.bed", package="Guitar")) @ With the following script, we may generate the transcriptomic distribution of genomic features to be tested, and the result will be automatically saved into a PDF file under the working directory with prefix "example". With the \Rfunction{GuitarPlot} function, the gene annotation can be downloaded from internet automatically with a genome assembly number provided; however, this feature requires working internet and might take a longer time. The toy Guitar coordinates generated internally should never be re-used in other real data analysis. <>= count <- GuitarPlot(txGenomeVer = "mm10", stBedFiles = stBedFiles, miscOutFilePrefix = NA) @ In a more efficent protocol, in order to re-use the gene annotation and \Rclass{Guitar coordinates}, you will have to build Guitar Coordiantes from a \Rclass{txdb} object in a separate step. The transcriptDb contains the gene annotation information and can be obtained in a number of ways, .e.g, download the complete gene annotation of species from UCSC automatically, which might takes a few minutes. In the following analysis, we load the \Rclass{Txdb} object from a toy dataset provided with the Guitar package. Please note that this is only a very small part of the complete hg19 transcriptome, and the \Rclass{Txdb} object provided with \Rpackage{Guitar} package should not be used in real data analysis. With a \Rclass{TxDb} object that contains gene annotation information, we in the next build \Rclass{Guitar coordiantes}, which is essentially a bridge connects the transcriptomic landmarks and genomic coordinates. <>= txdb_file <- system.file("extdata", "mm10_toy.sqlite", package="Guitar") txdb <- loadDb(txdb_file) guitarTxdb <- makeGuitarTxdb(txdb = txdb, txPrimaryOnly = FALSE) # Or use gff. file to generate guitarTxdb # Or use getTxdb() to download TxDb from internet: # txdb <- getTxdb(txGenomeVer="hg19") # guitarTxdb <- makeGuitarTxdb(txdb) @ You may now generate the Guitar plot from the named list of genome-based features. <>= GuitarPlot(txTxdb = txdb, stBedFiles = stBedFiles, miscOutFilePrefix = "example") @ Alternatively, you may also optionally include the promoter DNA region and tail DNA region on the 5' and 3' side of a transcript in the plot with parameter {headOrtail =TRUE}. <>= GuitarPlot(txTxdb = txdb, stBedFiles = stBedFiles, headOrtail = TRUE) @ Alternatively, you may also optionally include the Confidence Interval for guitar plot with parameter {enableCI = FALSE}. <>= GuitarPlot(txTxdb = txdb, stBedFiles = stBedFiles, headOrtail = TRUE, enableCI = FALSE) @ \section{Supported Data Format} Besides BED file, {Guitar} package also supports GRangesList and GRanges data structures. Please see the following examples. <>= # import different data formats into a named list object. # These genomic features are using mm10 genome assembly stBedFiles <- list(system.file("extdata", "m6A_mm10_exomePeak_1000peaks_bed12.bed", package="Guitar"), system.file("extdata", "m6A_mm10_exomePeak_1000peaks_bed6.bed", package="Guitar")) # Build Guitar Coordinates txdb_file <- system.file("extdata", "mm10_toy.sqlite", package="Guitar") txdb <- loadDb(txdb_file) # Guitar Plot GuitarPlot(txTxdb = txdb, stBedFiles = stBedFiles, headOrtail = TRUE, enableCI = FALSE, mapFilterTranscript = TRUE, pltTxType = c("mrna"), stGroupName = c("BED12","BED6")) @ \section{Processing of sampling sites information} We can select parameters for site sampling. <>= stGRangeLists = vector("list", length(stBedFiles)) sitesPoints <- list() for (i in seq_len(length(stBedFiles))) { stGRangeLists[[i]] <- blocks(import(stBedFiles[[i]])) } for (i in seq_len(length(stGRangeLists))) { sitesPoints[[i]] <- samplePoints(stGRangeLists[i], stSampleNum = 10, stAmblguity = 5, pltTxType = c("mrna"), stSampleModle = "Equidistance", mapFilterTranscript = FALSE, guitarTxdb = guitarTxdb) } @ \section{Guitar Coordinates - Transcriptomic Landmarks Projected on Genome} The \Robject{guitarTxdb} object contains the genome-projected transcriptome coordinates, which can be valuable for evaluating transcriptomic information related applications, such as checking the quality of MeRIP-Seq data. The \Robject{Guitar coordinates} are essentially the genomic projection of standardized transcript-based coordiantes, making a viable bridge beween the landmarks on transcript and genome-based coordinates. It is based on the \Rclass{txdb} object input, extracts the transcript information in txdb, selects the transcripts that match the parameters according to the component parameters set by the user, and saves according to the transcript type (tx, mrna, ncrna). <>= guitarTxdb <- makeGuitarTxdb(txdb = txdb, txAmblguity = 5, txMrnaComponentProp = c(0.1,0.15,0.6,0.05,0.1), txLncrnaComponentProp = c(0.2,0.6,0.2), pltTxType = c("tx","mrna","ncrna"), txPrimaryOnly = FALSE) @ \section{Check the Overlapping between Different Components} We can also check the distribution of the Guitar coordinates built. <>= gcl <- list(guitarTxdb$tx$tx) GuitarPlot(txTxdb = txdb, stGRangeLists = gcl, stSampleNum = 200, enableCI = TRUE, pltTxType = c("tx"), txPrimaryOnly = FALSE ) @ Alternatively, we can extract the RNA components, check the distribution of tx components in the transcriptome <>= GuitarCoords <- guitarTxdb$tx$txComponentGRange type <- paste(mcols(GuitarCoords)$componentType,mcols(GuitarCoords)$txType) key <- unique(type) landmark <- list(1,2,3,4,5,6,7,8,9,10,11) names(landmark) <- key for (i in 1:length(key)) { landmark[[i]] <- GuitarCoords[type==key[i]] } GuitarPlot(txTxdb = txdb , stGRangeLists = landmark[1:3], pltTxType = c("tx"), enableCI = FALSE ) @ Check the distribution of mRNA components in the transcriptome <>= GuitarPlot(txTxdb = txdb , stGRangeLists = landmark[4:8], pltTxType = c("mrna"), enableCI = FALSE ) @ Check the distribution of lncRNA components in the transcriptome <>= GuitarPlot(txTxdb = txdb , stGRangeLists = landmark[9:11], pltTxType = c("ncrna"), enableCI = FALSE ) @ \section{Session Information} <>= sessionInfo() @ \end{document}