---
title: "Importing tRNAscan-SE output as GRanges"
author: "Felix G.M. Ernst"
date: "`r Sys.Date()`"
package: tRNAscanImport
abstract: >
  Example of importing a tRNAscan-SE output for sacCer3 as a GRanges object
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    df_print: paged
vignette: >
  %\VignetteIndexEntry{tRNAscanImport}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown(css.files = c('custom.css'))
```

# Introduction

tRNAscan-SE [@Lowe.1997] can be used to predict tRNA genes in whole genomes
based on sequence context and calculated structural features. Many tRNA
annotations in genomes, for example the current SGD reference genome sacCer3 of
*Saccharomyces cerevisiae*, contain or are based on information generated by
tRNAscan-SE. However, not all of the information produced by tRNAscan-SE ends up
in the genome annotation. Examples of omitted information are the structural
information, additional scores, and whether the conserved CCA-end is encoded in
the genomic DNA. To work with the complete set of information, the tRNAscan-SE
output can be parsed into a more accessible GRanges object using
`tRNAscanImport`.

# Getting started

The default tRNAscan-SE output, obtained either by running tRNAscan-SE
[@Lowe.1997] locally or by retrieving the output from GtRNAdb [@Chan.2016],
consists of a formatted text document containing one text block per tRNA, with
blocks delimited by an empty line.

```{r, echo=FALSE}
suppressPackageStartupMessages({
  library(tRNAscanImport)
})
```
```{r}
library(tRNAscanImport)
sacCer3_file <- system.file("extdata",
                            file = "sacCer3-tRNAs.ss.sort",
                            package = "tRNAscanImport")

# output for sacCer3
# Before
readLines(con = sacCer3_file, n = 7L)
```

# Importing as GRanges

To access the information in a Bioconductor context, importing it as a GRanges
object is a natural choice.
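
Because the output is organised in blocks delimited by empty lines, it lends itself to a split-and-match approach. The following base R sketch illustrates the general idea on two invented records that only loosely imitate the layout of the file. It is not the implementation used by `tRNAscanImport`; the helper `extract_field()`, the patterns, and all values are hypothetical and serve only as an illustration.

```r
# Toy input: two made-up records, each terminated by an empty line.
# The layout only loosely imitates tRNAscan-SE output; all values are invented.
lines <- c(
  "chrI.trna1 (100-172)\tLength: 73 bp",
  "Type: Ala\tAnticodon: AGC at 34-36 (133-135)\tScore: 70.5",
  "",
  "chrII.trna2 (500-571)\tLength: 72 bp",
  "Type: Gly\tAnticodon: GCC at 33-35 (532-534)\tScore: 65.2",
  ""
)

# Give every non-empty line a block id that increases after each empty line,
# then split the lines into one character vector per block.
is_empty <- lines == ""
block_id <- cumsum(is_empty)[!is_empty]
blocks <- split(lines[!is_empty], block_id)

# Hypothetical helper: return the first capture group of `pattern` in a block,
# or NA if the pattern does not match.
extract_field <- function(block, pattern) {
  text <- paste(block, collapse = "\n")
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) < 2L) NA_character_ else m[2L]
}

# Evaluate each block with regular expressions and collect one row per tRNA.
parsed <- do.call(rbind, lapply(blocks, function(b) {
  data.frame(
    name      = extract_field(b, "^([^ ]+) \\("),
    start     = as.integer(extract_field(b, "\\(([0-9]+)-[0-9]+\\)")),
    end       = as.integer(extract_field(b, "\\([0-9]+-([0-9]+)\\)")),
    type      = extract_field(b, "Type: ([A-Za-z]+)"),
    anticodon = extract_field(b, "Anticodon: ([ACGTN]+)"),
    score     = as.numeric(extract_field(b, "Score: ([0-9.]+)")),
    stringsAsFactors = FALSE
  )
}))
rownames(parsed) <- NULL
parsed
```

A table like `parsed` holds the kind of per-tRNA fields that a GRanges object can carry as ranges and metadata columns. The real output contains more fields per block than this toy example covers.
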
`import.tRNAscanAsGRanges()` performs the import by evaluating each text block
using regular expressions.

```{r}
# output for sacCer3
# After
gr <- import.tRNAscanAsGRanges(sacCer3_file)
head(gr, 2)
```

The result can be used directly in R or saved as a GFF3 or FASTA file for
further use, for example to process the sequences for HTS read mapping or to
run a statistical analysis of the tRNA content of the analyzed genome.

```{r, echo=FALSE}
suppressPackageStartupMessages({
  library(Biostrings)
  library(rtracklayer)
})
```
```{r}
library(Biostrings)
library(rtracklayer)
# suppressMessages(library(rtracklayer, quietly = TRUE))
# Save tRNA sequences
writeXStringSet(gr$tRNA_seq, filepath = tempfile())
# to be GFF3 compliant use tRNAscan2GFF
gff <- tRNAscan2GFF(gr)
export.gff3(gff, con = tempfile())
```

# Visualization

The tRNAscan-SE information can be visualized using the `gettRNAscanPlots()`
function, which returns a named list of ggplot2 plots that can be plotted or
modified further. `gettRNAscanPlots()` requires ggplot2 to be installed.
Alternatively, `gettRNAscanSummary()` returns the aggregated information for
further use, and `plottRNAscan()` draws the plots produced by
`gettRNAscanPlots()` directly.

```{r, echo=FALSE}
suppressPackageStartupMessages({
  library(GenomicRanges)
})
```
```{r}
library(GenomicRanges)
# tRNAscan-SE output for hg38
hg38_file <- system.file("extdata",
                         file = "hg38-tRNAs.ss.sort",
                         package = "tRNAscanImport")
# tRNAscan-SE output for E. coli MG1655
eco_file <- system.file("extdata",
                        file = "eschColi_K_12_MG1655-tRNAs.ss.sort",
                        package = "tRNAscanImport")
# import tRNAscan-SE files
gr_hg <- import.tRNAscanAsGRanges(hg38_file)
gr_eco <- import.tRNAscanAsGRanges(eco_file)

# get summary plots if ggplot2 is installed
grl <- GRangesList(Sce = gr,
                   Hsa = gr_hg,
                   Eco = gr_eco)
plots <- gettRNAscanPlots(grl)
```

```{r, fig.cap = "tRNA length."}
plots$length
```
```{r, fig.cap = "tRNAscan-SE scores."}
plots$tRNAscan_score
```
```{r, fig.cap = "tRNA GC content."}
plots$gc
```
```{r, fig.cap = "tRNAs with introns."}
plots$introns
```

# Session info

```{r}
sessionInfo()
```

# References