--- title: "crisprBowtie: alignment of gRNA spacer sequences using bowtie" author: - name: Jean-Philippe Fortin affiliation: Department of Data Science and Statistical Computing, gRED, Genentech email: fortin946@gmail.com date: "`r Sys.Date()`" output: BiocStyle::html_document: toc_float: true # theme: paper number_sections: true vignette: > %\VignetteIndexEntry{Introduction to crisprBowtie} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} bibliography: references.bib --- # Overview of crisprBowtie `crisprBowtie` provides two main functions to align short DNA sequences to a reference genome using the short read aligner bowtie [@langmead2009bowtie] and return the alignments as R objects: `runBowtie` and `runCrisprBowtie`. It utilizes the Bioconductor package `Rbowtie` to access the Bowtie program in a platform-independent manner. This means that users do not need to install Bowtie prior to using `crisprBowtie`. The latter function (`runCrisprBowtie`) is specifically designed to map and annotate CRISPR guide RNA (gRNA) spacer sequences using CRISPR nuclease objects and CRISPR genomic arithmetics defined in the Bioconductor package [crisprBase](https://github.com/crisprVerse/crisprBase). This enables a fast and accurate on-target and off-target search of gRNA spacer sequences for virtually any type of CRISPR nucleases. It also provides an off-target search engine for our main gRNA design package [crisprDesign](https://github.com/crisprVerse/crisprDesign) of the [crisprVerse](https://github.com/crisprVerse) ecosystem. See the `addSpacerAlignments` function in `crisprDesign` for more details. # Installation and getting started ## Software requirements ### OS Requirements This package is supported for macOS, Linux and Windows machines. Package was developed and tested on R version 4.2. ## Installation from Bioconductor `crisprBowtie` can be installed from from the Bioconductor devel branch using the following commands in a fresh R session: ```{r, eval=FALSE} if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install(version="devel") BiocManager::install("crisprBowtie") ``` # Building a bowtie index To use `runBowtie` or `runCrisprBowtie`, users need to first build a Bowtie genome index. For a given genome, this step has to be done only once. The `Rbowtie` package conveniently provides the function `bowtie_build` to build a Bowtie index from any custom genome from a FASTA file. As an example, we build a Bowtie index for a small portion of the human chromosome 1 (`chr1.fa` file provided in the `crisprBowtie` package) and save the index file as `myIndex` to a temporary directory: ```{r} library(Rbowtie) fasta <- file.path(find.package("crisprBowtie"), "example/chr1.fa") tempDir <- tempdir() Rbowtie::bowtie_build(fasta, outdir=tempDir, force=TRUE, prefix="myIndex") ``` To learn how to create a Bowtie index for a complete genome or transcriptome, please visit our [tutorial page](https://github.com/crisprVerse/Tutorials/tree/master/Building_Genome_Indices). # Alignment using `runCrisprBowtie` As an example, we align 6 spacer sequences (of length 20bp) to the custom genome built above, allowing a maximum of 3 mismatches between the spacer and protospacer sequences. We specify that the search is for the wildtype Cas9 (SpCas9) nuclease by providing the `CrisprNuclease` object `SpCas9` available through the `crisprBase` package. The argument `canonical=FALSE` specifies that non-canonical PAM sequences are also considered (NAG and NGA for SpCas9). The function `getAvailableCrisprNucleases` in `crisprBase` returns a character vector of available `crisprNuclease` objects found in `crisprBase`. ```{r, warning=FALSE, message=FALSE} library(crisprBowtie) data(SpCas9, package="crisprBase") crisprNuclease <- SpCas9 spacers <- c("TCCGCGGGCGACAATGGCAT", "TGATCCCGCGCTCCCCGATG", "CCGGGAGCCGGGGCTGGACG", "CCACCCTCAGGTGTGCGGCC", "CGGAGGGCTGCAGAAAGCCT", "GGTGATGGCGCGGGCCGGGC") runCrisprBowtie(spacers, crisprNuclease=crisprNuclease, n_mismatches=3, canonical=FALSE, bowtie_index=file.path(tempDir, "myIndex")) ``` # Applications beyond CRISPR The function `runBowtie` is similar to `runCrisprBowtie`, but does not impose constraints on PAM sequences. It can be used to search for any short read sequence in a genome. ## Example using RNAi (siRNA design) Seed-related off-targets caused by mismatch tolerance outside of the seed region is a well-studied and characterized problem observed in RNA interference (RNA) experiments. `runBowtie` can be used to map shRNA/siRNA seed sequences to reference genomes to predict putative off-targets: ```{r, eval=TRUE} seeds <- c("GTAAAGGT", "AAGGATTG") runBowtie(seeds, n_mismatches=2, bowtie_index=file.path(tempDir, "myIndex")) ``` # Reproducibility ```{r} sessionInfo() ``` # References