--- title: "ggmsa-Getting Started" author: "Lang Zhou and GuangChuang Yu" output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 3 code_folding: show date: "`r Sys.Date()`" bibliography: ggmsa.bib vignette: > %\VignetteIndexEntry{ggmsa} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) CRANpkg <- function(pkg) { cran <- "https://cran.r-project.org/package" fmt <- "[%s](%s=%s)" sprintf(fmt, pkg, cran, pkg) } Biocpkg <- function(pkg) { sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg) } # Packages ------------------------------------------------------------------- library(ggmsa) library(ggplot2) ``` # Install package ```{r eval = FALSE} if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("ggmsa") ``` # Introduction ggmsa is a package designed to plot multiple sequence alignments. This package implements functions to visualize publication-quality multiple sequence alignments (protein/DNA/RNA) in R extremely simple and powerful. It uses module design to annotate sequence alignments and allows to accept other data sets for diagrams combination. In this tutorial, we’ll work through the basics of using ggmsa. ```{r results="hide", message=FALSE, warning=FALSE} library(ggmsa) ``` ```{r echo=FALSE, out.width='50%'} knitr::include_graphics("man/figures/workflow.png") ``` # Importing MSA data We’ll start by importing some example data to use throughout this tutorial. Expect FASTA files, some of the objects in R can also as input. `available_msa()` can be used to list MSA objects currently available. ```{r warning=FALSE} available_msa() protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa") miRNA_sequences <- system.file("extdata", "seedSample.fa", package = "ggmsa") nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa") ``` # Basic use: MSA Visualization The most simple code to use ggmsa: ```{r fig.height = 2, fig.width = 10, warning=FALSE} ggmsa(protein_sequences, 300, 350, color = "Clustal", font = "DroidSansMono", char_width = 0.5, seq_name = TRUE ) ``` ## Color Schemes ggmsa predefines several color schemes for rendering MSA are shipped in the package. In the same ways, using `available_msa()` to list color schemes currently available. Note that amino acids (protein) and nucleotides (DNA/RNA) have different names. ```{r warning=FALSE} available_colors() ``` ```{r echo=FALSE, out.width = '50%'} knitr::include_graphics("man/figures/schemes.png") ``` ## Font Several predefined fonts are shipped ggmsa. Users can use `available_fonts()` to list the font currently available. ```{r warning=FALSE} available_fonts() ``` # MSA Annotation ggmsa supports annotations for MSA. Similar to the ggplot2, it implements annotations by `geom` and users can perform annotation with `+` , like this: `ggmsa() + geom_*()`. Automatically generated annotations that containing colored labels and symbols are overlaid on MSAs to indicate potentially conserved or divergent regions. For example, visualizing multiple sequence alignment with **sequence logo** and **bar chart**: ```{r fig.height = 2.5, fig.width = 11, warning = FALSE, message = FALSE} ggmsa(protein_sequences, 221, 280, seq_name = TRUE, char_width = 0.5) + geom_seqlogo(color = "Chemistry_AA") + geom_msaBar() ``` This table shows the annnotation layers supported by ggmsa as following: ```{r echo=FALSE, results='asis', warning=FALSE, message=FALSE} library(kableExtra) x <- "geom_seqlogo()\tgeometric layer\tautomatically generated sequence logos for a MSA\n geom_GC()\tannotation module\tshows GC content with bubble chart\n geom_seed()\tannotation module\thighlights seed region on miRNA sequences\n geom_msaBar()\tannotation module\tshows sequences conservation by a bar chart\n geom_helix()\tannotation module\tdepicts RNA secondary structure as arc diagrams(need extra data)\n " xx <- strsplit(x, "\n\n")[[1]] y <- strsplit(xx, "\t") %>% do.call("rbind", .) y <- as.data.frame(y, stringsAsFactors = FALSE) colnames(y) <- c("Annotation modules", "Type", "Description") knitr::kable(y, align = "l", booktabs = TRUE, escape = TRUE) %>% kable_styling(latex_options = c("striped", "hold_position", "scale_down")) ``` # Learn more Check out the guides for learning everything there is to know about all the different features: - [Getting Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html) - [Annotations](https://yulab-smu.github.io/ggmsa/articles/Annotations.html) - [Color Schemes and Font Families](https://yulab-smu.github.io/ggmsa/articles/Color_schemes_And_Font_Families.html) - [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html) - [Other Modules](https://yulab-smu.github.io/ggmsa/articles/Other_Modules.html) - [View Modes](https://yulab-smu.github.io/ggmsa/articles/View_modes.html) # Session Info ```{r echo = FALSE} sessionInfo() ```