---
title: "Aligning reads with Rhisat2"
author: "Charlotte Soneson"
date: "`r Sys.Date()`"
package: Rhisat2
output: 
    BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{Rhisat2}
    %\VignetteEncoding{UTF-8}  
    %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: rhisat2.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width=100)
```

The `r Biocpkg("Rhisat2")` package provides an interface to the
[HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) short-read aligner
software [@Kim2015-mu].

# Installation

Use the `BiocManager` package to download and install the package from
Bioconductor as follows:

```{r getPackage, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rhisat2")
```

If required, the latest development version of the package can also be installed
from GitHub.

```{r, eval = FALSE}
BiocManager::install("fmicompbio/Rhisat2")
```

Once the package is installed, load it into your R session:

```{r}
library(Rhisat2)
```

# Building a genome index

To build an index for the alignment, use the `hisat2_build` function. You need
to provide one or more fasta file with reference sequences, as well as an output
directory where the index will be stored, and a "prefix" (that will determine
the name of the index files in the output directory). Any additional arguments
to the `hisat2-build` binary can also be supplied to the function (see
`hisat2_build_usage()` or
[https://ccb.jhu.edu/software/hisat2/manual.shtml#the-hisat2-build-indexer](https://ccb.jhu.edu/software/hisat2/manual.shtml#the-hisat2-build-indexer)
for an overview of the available options).

The package contains three example reference sequences, corresponding to short
pieces of three different chromosomes. We show how to create an index (named
`myindex`) based on these sequences.

```{r}
list.files(system.file("extdata/refs", package="Rhisat2"), pattern="\\.fa$")
refs <- list.files(system.file("extdata/refs", package="Rhisat2"), 
                   full.names=TRUE, pattern="\\.fa$")
td <- tempdir()
hisat2_build(references=refs, outdir=td, prefix="myindex", 
             force=TRUE, strict=TRUE, execute=TRUE)
```

By instead setting `execute=FALSE` in the command above, `hisat2_build()` will
return the index building shell command for inspection, without executing it.

```{r}
print(hisat2_build(references=refs, outdir=td, prefix="myindex",
                   force=TRUE, strict=TRUE, execute=FALSE))
```

# Aligning reads to the genome index

After creating the index, reads can be aligned using the `hisat2` wrapper
function. Most commonly, the reads will be provided in `fastq` files (one file
for single-end reads, two files for paired-end reads). The names of these files
can be provided directly to the `hisat2` function, either as a vector (for
single-end reads) or as a list of length 2 (for paired-end reads, each list
element corresponds to one mate). You also need to provide the path to the index
generated by `hisat2_build` (the output directory combined with the prefix), and
the output file name where the alignments should be written.

```{r}
list.files(system.file("extdata/reads", package="Rhisat2"), 
           pattern="\\.fastq$")
reads <- list.files(system.file("extdata/reads", package="Rhisat2"),
                    pattern="\\.fastq$", full.names=TRUE)
hisat2(sequences=as.list(reads), index=file.path(td, "myindex"), 
       type="paired", outfile=file.path(td, "out.sam"), 
       force=TRUE, strict=TRUE, execute=TRUE)
```

As for `hisat2_build()`, any additional arguments to the `hisat2` binary can be
provided to the `hisat2()` function (see `hisat2_usage()` or
[https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2](https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2)
for an overview of the available options).

With HISAT2, you can provide a file with known splice sites at the alignment
step, which can help in finding the correct alignments across known splice
junctions. A text file in the required format can be generated using the
`extract_splice_sites()` function, starting from an annotation file in gtf or
gff3 format, or from a `GRanges` or `TxDb` object.

```{r}
spsfile <- tempfile()
gtf <- system.file("extdata/refs/genes.gtf", package="Rhisat2")
extract_splice_sites(features=gtf, outfile=spsfile)
hisat2(sequences=as.list(reads), index=file.path(td, "myindex"),
       type="paired", outfile=file.path(td, "out_sps.sam"),
       `known-splicesite-infile`=spsfile, 
       force=TRUE, strict=TRUE, execute=TRUE)
```


# Miscellaneous helper functions

To get the version of `HISAT2`:

```{r}
hisat2_version()
```

To see the usage of `hisat2_build()`:

```{r}
hisat2_build_usage()
```

And similarly to see the usage of `hisat2()`:

```{r}
hisat2_usage()
```

# Session info

```{r}
sessionInfo()
```

# References