---
title: "Importing tRNAscan-SE output as `GRanges`"
author: "Felix G.M. Ernst"
date: "`r Sys.Date()`"
package: tRNAscanImport
abstract: >
  Example of importing a tRNAscan-SE output for sacCer3 as a GRanges object
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    df_print: paged
vignette: >
  %\VignetteIndexEntry{tRNAscanImport}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown(css.files = c('custom.css'))
```

# Introduction

tRNAscan-SE [@Lowe.1997] can be used for prediction of tRNA genes in whole
genomes based on sequence context and calculated structural features. Many tRNA
annotations in genomes contain or are based on information generated by
tRNAscan-SE, for example the current SGD reference genome sacCer3 for
Saccharomyces cerevisiae. However, not all available information from
tRNAscan-SE end up in the genome annotation. Among these are for example
structural information, additional scores and the information, whether the
conserved CCA-end is encoded in the genomic DNA. To work with this complete set
of information, the tRNAscan-SE output can be parsed into a more accessible
GRanges object using `tRNAscanImport`.

# Getting started

The default tRNAscan-SE output, either from running tRNAscan-SE [@Lowe.1997]
locally or retrieving the output from the gtRNADb [@Chan.2016], consist of a
formatted text document containing individual text blocks per tRNA delimited by
an empty line.
```{r, echo=FALSE}
suppressPackageStartupMessages({
  library(tRNAscanImport)
})
```
```{r}
library(tRNAscanImport)
yeast_file <- system.file("extdata",
                          file = "yeast.tRNAscan",
                          package = "tRNAscanImport")

# output for sacCer3
# Before
readLines(con = yeast_file, n = 7L)
```

# Importing as `GRanges`

To access the information in a BioC context the import as a GRanges object
comes to mind. `import.tRNAscanAsGRanges()` performs this task by evaluating
each text block using regular expressions.

```{r}
# output for sacCer3
# After
gr <- import.tRNAscanAsGRanges(yeast_file)
head(gr, 2)
# Any GRanges passing this, can be used for subsequent function
istRNAscanGRanges(gr)
```

The result can be used directly in R or saved as gff3/fasta file for further
use, including processing the sequences for HTS read mapping or statistical
analysis on tRNA content of the analyzed genome.
```{r, echo=FALSE}
suppressPackageStartupMessages({
  library(Biostrings)
  library(rtracklayer)
})
```
```{r}
library(Biostrings)
library(rtracklayer)
# suppressMessages(library(rtracklayer, quietly = TRUE))
# Save tRNA sequences
writeXStringSet(gr$tRNA_seq, filepath = tempfile())
# to be GFF3 compliant use tRNAscan2GFF
gff <- tRNAscan2GFF(gr)
export.gff3(gff, con = tempfile())
```

# Visualization

The tRNAscan-SE information can be visualized using the `gettRNAFeaturePlots()`
function of the `tRNA` package, returning a named list of ggplot2 plots, which 
can be plotted or further modified.
Alternatively, `gettRNASummary()` returns the aggregated information for
further use.

```{r}
# tRNAscan-SE output for hg38
human_file <- system.file("extdata",
                          file = "human.tRNAscan",
                          package = "tRNAscanImport")
# tRNAscan-SE output for E. coli MG1655
eco_file <- system.file("extdata",
                        file = "ecoli.tRNAscan",
                        package = "tRNAscanImport")
# import tRNAscan-SE files
gr_human <- import.tRNAscanAsGRanges(human_file)
gr_eco <- import.tRNAscanAsGRanges(eco_file)

# get summary plots
grl <- GRangesList(Sce = gr,
                   Hsa = gr_human,
                   Eco = gr_eco)
plots <- gettRNAFeaturePlots(grl)
```
```{r plot1, fig.cap = "tRNA length."}
plots$length
```
```{r plot2, fig.cap = "tRNAscan-SE scores."}
plots$tRNAscan_score
```
```{r plot3, fig.cap = "tRNA GC content."}
plots$gc
```
```{r plot4, fig.cap = "tRNAs with introns."}
plots$tRNAscan_intron
```
```{r plot5, fig.cap = "Length of the variable loop."}
plots$variableLoop_length
```

# Getting tRNA precursor sequences

Since tRNAscan reports the genomic location for tRNAs found, approximate tRNA
precursor sequences can be retrieved by combining a tRNAscan input object with 
matching genomic sequences for the function `get.tRNAprecursor`.

```{r, echo=FALSE}
suppressPackageStartupMessages({
  library(BSgenome.Scerevisiae.UCSC.sacCer3)
})
```
```{r tRNA_precursor}
library(BSgenome.Scerevisiae.UCSC.sacCer3)
genome <- getSeq(BSgenome.Scerevisiae.UCSC.sacCer3)
# renaming chromosome to match tRNAscan output
names(genome) <- c(names(genome)[-17L],"chrmt")
tRNAprecursor <- get.tRNAprecursor(gr, genome)
head(tRNAprecursor)
```

The length of the overhangs can be defined with the arguments `add.5prime` and 
`add.3prime`, respectively. Both support individual lengths for each tRNA and 
require values to be integer only. In addition, introns can be removed by 
setting `trim.introns = TRUE`. 

# Further reading

Further examples of working with tRNA information can be found in the 
[vignette](tRNA.html) of the `tRNA` package.

# Session info

```{r}
sessionInfo()
```

# References