---
title: "SynMut: Tools for Designing Synonymously Mutated Sequences"
author:
- "Haogao Gu, Leo L.M. Poon"
- "School of Public Health, The University of Hong Kong"
date: "`r Sys.Date()`"
output:
prettydoc::html_pretty:
toc: true
theme: tactile
highlight: github
vignette: >
%\VignetteIndexEntry{SynMut: Designing Synonymous Mutants for DNA Sequences}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Introduction
Synonymous mutations refer to the mutations in DNA/RNA sequences that cause
__no__ modification in the translated amino acid sequence. Most of the
synonymous mutations are also silent mutations as they do not have observable
effects on the organism's phenotype. Designing mutant sequences with synonymous
mutations is often applied in many biological studies as a method to control
some unwanted changes in the translated amino acid sequences.
The codon usage bias and dinucleotide usage bias are two genomic signatures in
DNA/RNA sequences, even for synonymous sequences. Characterizing the functions
of synonymous sequences with different codon usage bias or dinucleotide usage
bias help to study their impacts on various biological functions. In fact, this
method has been applied in many researches in virology.
`SynMut` provides tools for generating multiple synonymous DNA sequences of
different genomic features (in particular codon / dinucleotide usage pattern).
Users can also specify mutable regions in the sequences (this is particular
useful as there are some conserved regions in the genome which we do not want to
modify). This tool was originally designed for generating recombinant virus
sequences in influenza A virus to study the effects of different dinucleotide
usage and codon usage, yet these functions provided in this package can be
generic to a variety of other biological researches.
Below is a flowchart illustrating how the components work togaether in this
package.
```{r, echo=FALSE, message = FALSE}
url <- paste0("https://raw.githubusercontent.com/Koohoko/Koohoko.github.io/",
"master/SynMut/images/component.png")
```
## Getting started
### Installation
```{r, eval=FALSE}
if (!requireNamespace("BiocManager"))
install.packages("BiocManager")
if (!requireNamespace("SynMut"))
BiocManager::install("SynMut")
```
### Input data
We use the below data in the package for example.
* `example.fasta`: The fasta file contains the segment 7 and segment 8 DNA
sequences of *Influenza A/Brisbane/59/2007 (H1N1) (BR59)*.
* `target_regions.csv`: The region file in csv format reads as a `data.frame`
specifying the user-defined mutable positions (in amino acid position)
correspond to the DNA sequences.
```{r, message=FALSE}
library("SynMut")
filepath.fasta <- system.file("extdata", "example.fasta", package = "SynMut")
filepath.csv <- system.file("extdata", "target_regions.csv", package = "SynMut")
region <- read.csv(filepath.csv)
```
The `input_seq` function takes either a system fasta file or a
DNAStringSet object as input, to construct a `regioned_dna` object that is used
in the `SynMut` package.
*Important Notes*: if the `region` parameter is specified in the resulted
regioned_dna object, it will be automatically applied to all the downstream
functions for mutations. Mutations will only performed in the specified mutable
regions.
```{r}
rgd.seq <- input_seq(filepath.fasta, region)
rgd.seq
```
### Access the data
Various `get_` functions were used to get some useful information:
* `get_dna`: Access the DNA sequences. This will return a DNAStringSet object
(from `Biostrings` package).
```{r}
get_dna(rgd.seq)
```
* `get_region`: Access the user-defined mutable region. if there is no specified
regions, this function will return a `list` of length 0.
```{r}
str(get_region(rgd.seq))
get_region(input_seq(filepath.fasta))
```
* `get_cu`: Get the codon usage
* `get_du`: Get dinucleotide usage
* `get_nu`: Get nucleotide usage
```{r}
get_cu(rgd.seq)
get_du(rgd.seq)
get_nu(rgd.seq)
```
* We also provide functions to:
- get the codon usage frequency of synonymous codons: `get_freq`
- get Relative Synonymous Codon Usage (rscu) of
synonymous codons: `get_rscu`
## Generating mutations
### Random synonymous mutants
Generating random synonymous mutations (in specific region if provided in the
`input_seq`), with optional keeping or not keeping the original codon usage
bias.
```{r}
# Random synonymous mutations
mut.seq <- codon_random(rgd.seq)
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
# Keeping the original codon usage pattern
mut.seq <- codon_random(rgd.seq, keep = TRUE)
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
```
We can also specify the `n` parameter to control the proportion of the codons to
be mutated.
```{r}
# Fifty percent of the codons were allowed for mutation
mut.seq <- codon_random(rgd.seq, n = 0.5)
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
```
### Synonymous mutants with maximal/minimal usage of specific codon
When studying the role of a particular codon, it would be useful to have
the mutants with maximal/minimal usage of that codon. The `codon_to` function
will do this job for you. Pass a string of codon to either the `max.codon` or
`min.codon` argument to maximize or minimize the usage of certain codon in the
sequence.
```{r}
# Generate AAC-maximized mutations
mut.seq <- codon_to(rgd.seq, max.codon = "AAC")
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
# Generate AAC-minimized mutations
mut.seq <- codon_to(rgd.seq, min.codon = "AAC")
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
```
### Synonymous mutants with maximal/minimal usage of specific dinucleotide
Use `dinu_to` to generate mutations with maximal/minimal usage of specific
dinucleotide. This was done by a two-step heuristic greedy algorithm, details
can be found at this [link](https://koohoko.github.io/SynMut/algorithm.html).
An alternative `keep = TRUE` argument allow retaining the original codon usage
bias. This can be useful when in combination with `codon_mimic` in the [next
section](#synonymous-mutants-mimicking-a-specific-codon-usage-pattern) to design
mutant sequences with similar codon usage bias but distinct specific
dinucleotide usage.
```{r}
# Maximaize the usage of "CG" dinucleotide in the pre-defined region
mut.seq <- dinu_to(rgd.seq, max.dinu = "cg")
# Check the dinucelotide usage difference between the mutant and the original
get_du(mut.seq) - get_du(rgd.seq)
# Minimize the usage of "CA", and compare the dinucleotide usage.
mut.seq <- dinu_to(rgd.seq, min.dinu = "CA")
get_du(mut.seq) - get_du(rgd.seq)
# Maximize the usage of "CG" while keeping the original codon usage
mut.seq <- dinu_to(rgd.seq, max.dinu = "cg", keep = TRUE)
# Compare the dinucleotide usage
get_du(mut.seq) - get_du(rgd.seq)
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
```
### Synonymous mutants mimicking a specific codon usage pattern
The function `codon_mimic` mutates the sequences to mimic a target codon usage
patterns. Detail algorithm was provided at this
[link](https://koohoko.github.io/SynMut/algorithm.html).
The `alt` argument specifies the target codon usage in either a codon usage
vector (result from `get_cu`) or a DNAStringSet of length 1 representing the
desired codon usage.
```{R}
# Use a codon usage vector as a target
target <- get_cu(rgd.seq)[2,]
mut.seq <- codon_mimic(rgd.seq, alt = target)
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
# Use a sequence as a target
target <- Biostrings::DNAStringSet("TTGAAAA-CTC-N--AAG")
mut.seq <- codon_mimic(rgd.seq, alt = target)
# Compare the codon usage
get_cu(mut.seq) - get_cu(rgd.seq)
# Compare the synonymous codon usage frequency
get_freq(mut.seq) - get_freq(rgd.seq)
# Compare the Relative Synonymous Codon Usage (RSCU)
get_rscu(mut.seq) - get_rscu(rgd.seq)
```
## Output the results
Output the DNA mutant sequences
```{r, eval = FALSE}
Biostrings::writeXStringSet(get_dna(rgd.seq), "rgd.fasta")
```
## Session information
```{R}
sessionInfo()
```
***