--- title: 'RNAmodR: AlkAnilineSeq' author: "Felix G.M. Ernst and Denis L.J. Lafontaine" date: "`r Sys.Date()`" abstract: > detection of m7G, m3C and D modification by AlkAnilineSeq output: BiocStyle::html_document: toc: true toc_float: true df_print: paged vignette: > %\VignetteIndexEntry{RNAmodR.AlkAnilineSeq} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown(css.files = c('custom.css')) ``` # Introduction 7-methyl guanosine (m7G), 3-methyl cytidine (m3C) and Dihydrouridine (D) are commonly found in rRNA and tRNA and can be detected classically by primer extension analysis. However, since the modifications do not interfere with Watson-Crick base pairing, a specific chemical treatment needs to be employed to cause strand breaks specifically at the modified positions. Initially, this involved a sodium borhydride treatment to create abasic sites and cleaving the RNA at abasic sites with aniline. This classical protocol was converted to a high throughput sequencing method call AlkAnilineSeq and allows modified position be detected by an accumulation of 5'-ends at the N+1 position [[@Marchand.2018]](#References). It was found, that m3C is susceptible to this treatment, which allows m7G, m3C and D to be detected by the same method from the same data sets, since the identify of the unmodified nucleotide informs about the three modified nucleotides. The `ModAlkAnilineSeq` class uses the the `NormEnd5SequenceData` class to store and aggregate data along the transcripts. The calculated scores follow the nomenclature of [[@Marchand.2018]](#References) with the names `scoreNC` (default) and `scoreSR`. ```{r, echo = FALSE} suppressPackageStartupMessages({ library(rtracklayer) library(GenomicRanges) library(RNAmodR.AlkAnilineSeq) library(RNAmodR.Data) }) ``` ```{r, eval = FALSE} library(rtracklayer) library(GenomicRanges) library(RNAmodR.AlkAnilineSeq) library(RNAmodR.Data) ``` # Example workflow The example workflow is limited to 18S rRNA and some tRNA from *S.cerevisiae*. As annotation data either a gff file or a `TxDb` object and for sequence data a fasta file or a `BSgenome` object can be used. The data is provided as bam files. ```{r, message=FALSE, results='hide'} annotation <- GFF3File(RNAmodR.Data.example.AAS.gff3()) sequences <- RNAmodR.Data.example.AAS.fasta() files <- list("wt" = c(treated = RNAmodR.Data.example.wt.1(), treated = RNAmodR.Data.example.wt.2(), treated = RNAmodR.Data.example.wt.3()), "Bud23del" = c(treated = RNAmodR.Data.example.bud23.1(), treated = RNAmodR.Data.example.bud23.2()), "Trm8del" = c(treated = RNAmodR.Data.example.trm8.1(), treated = RNAmodR.Data.example.trm8.2())) ``` The analysis is triggered by the construction of a `ModSetAlkAnilineSeq` object. Internally parallelization is used via the `BiocParallel` package, which would allow optimization depending on number/size of input files (number of samples, number of replicates, number of transcripts, etc). ```{r} msaas <- ModSetAlkAnilineSeq(files, annotation = annotation, sequences = sequences) msaas ``` As expected the m7G1575 is missing from the Bud23del samples. ```{r} mod <- modifications(msaas) lapply(mod,head, n = 2L) ``` # Visualizing the results As outlined in the `RNAmodR` package we can compare the samples using the `plotCompareByCoord` to prepare a heatmap. For this we select some position from the found modifications. In addition we prepare an alias table. ```{r} coord <- mod[[1L]] alias <- data.frame(tx_id = c(1L,3L,5L,6L,7L,8L,10L,11L), name = c("18S rRNA","tF(GAA)B","tG(GCC)B","tT(AGT)B", "tQ(TTG)B","tC(GCA)B","tS(CGA)C","tV(AAC)E1"), stringsAsFactors = FALSE) ``` ```{r plot1, fig.cap="Heatmap showing Stop ratio scores for detected m7G, m3C and D positions.", fig.asp=1} plotCompareByCoord(msaas, coord, score = "scoreSR", alias = alias, normalize = TRUE) ``` ```{r plot2, fig.cap="Heatmap showing Stop ratio scores for detected m7G1575 on the 18S rRNA.", fig.asp=0.5} plotCompareByCoord(msaas, coord[1L], score = "scoreSR", alias = alias) ``` In addition, the aggregate data along the transcript visualized as well. ```{r plot3, fig.cap="Stop ratio scores for detected m7G1575 on the 18S rRNA plotted as bar plots along the sequence.", fig.asp=1} plotData(msaas, "1", from = 1550L, to = 1600L) ``` This includes raw data as well. ```{r plot4, fig.cap="Stop ratio scores for detected m7G1575 on the 18S rRNA plotted as bar plots along the sequence. The raw sequence data is shown by setting `showSequenceData = TRUE.", fig.asp=1} plotData(msaas[1L:2L], "1", from = 1550L, to = 1600L, showSequenceData = TRUE) ``` # Session info ```{r, sessioninfo} sessionInfo() ``` # References