--- title: "Data visualisation methods in scater" author: - name: "Davis McCarthy" affiliation: - EMBL European Bioinformatics Institute - name: "Aaron Lun" affiliation: - Cancer Research UK Cambridge Institute, University of Cambridge package: scater output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Data visualisation methods in scater} %\VignetteEngine{knitr::rmarkdown} %VignetteEncoding{UTF-8} --- ```{r knitr-options, echo=FALSE, message=FALSE, warning=FALSE} ## To render an HTML version that works nicely with github and web pages, do: ## rmarkdown::render("vignettes/vignette.Rmd", "all") library(knitr) opts_chunk$set(fig.align = 'center', fig.width = 6, fig.height = 5, dev = 'png', error=FALSE, message=FALSE, warning=FALSE) library(ggplot2) theme_set(theme_bw(12)) ``` # Introduction This document provides some examples of the data visualisation functions available in `r Biocpkg("scater")`. * `plotExpression`: plot cell expression levels for one or more genes; * `plotReducedDim`: plot (and/or calculate) reduced dimension coordinates; * various QC plots, described in the QC vignette. We will demonstrate on some example data generated below: ```{r} library(scater) data("sc_example_counts") data("sc_example_cell_info") example_sce <- SingleCellExperiment( assays = list(counts = sc_example_counts), colData = sc_example_cell_info ) example_sce <- normalize(example_sce) example_sce ``` # Plots of expression values The `plotExpression` function makes it easy to plot expression values for a subset of genes or features. This can be particularly useful for further examination of features identified from differential expression testing, pseudotime analysis or other analyses. By default, it uses expression values in the `"logcounts"` assay, but this can be changed through the `exprs_values` argument. ```{r plot-expression} plotExpression(example_sce, rownames(example_sce)[1:6], x = "Mutation_Status", exprs_values = "logcounts") ``` Setting `x` will determine the covariate to be shown on the x-axis. This can be a field in the column metadata or the name of a feature (to obtain the expression profile across cells). Categorical covariates will yield grouped violins as shown above, with one panel per feature. By comparison, continuous covariates will generate a scatter plot in each panel, as shown below. ```{r plot-expression-scatter} plotExpression(example_sce, rownames(example_sce)[1:6], x = "Gene_0001") ``` The points can also be coloured, shaped or resized by the column metadata or expression values. ```{r plot-expression-col} plotExpression(example_sce, rownames(example_sce)[1:6], colour_by = "Cell_Cycle", shape_by = "Mutation_Status", size_by = "Gene_0002") ``` For categorical `x`, we can also show the median expression level per group on the plot to summarise the distribution of expression values: ```{r plot-expression-theme-bw} plotExpression(example_sce, rownames(example_sce)[7:12], x = "Mutation_Status", exprs_values = "counts", colour = "Cell_Cycle", show_median = TRUE, xlab = "Mutation Status", log = TRUE) ``` Directly plotting the gene expression without any `x` or other visual parameters will generate a set of grouped violin plots, coloured in an aesthetically pleasing manner. ```{r plot-expression-many} plotExpression(example_sce, rownames(example_sce)[1:6]) ``` # Dimensionality reduction plots ## Using the `reducedDims` slot The `SingleCellExperiment` object has a `reducedDims` slot, where coordinates for reduced dimension representations of the cells can be stored. These can be accessed using the `reducedDim` and `reducedDims` functions, which are described in more detail in the `r Biocpkg("SingleCellExperiment")` documentation. In the code below, we perform a principal components analysis (PCA) and store the results in the `"PCA"` slot. ```{r} example_sce <- runPCA(example_sce) reducedDimNames(example_sce) ``` Any reduced dimension results can be plotted using the `plotReducedDim` function: ```{r plot-reduceddim-4comp-colby-shapeby} plotReducedDim(example_sce, use_dimred = "PCA", colour_by = "Treatment", shape_by = "Mutation_Status") ``` We can also colour and size points by the expression of particular features: ```{r plot-reduceddim-4comp-colby-sizeby-exprs} plotReducedDim(example_sce, use_dimred = "PCA", colour_by = "Gene_1000", size_by = "Gene_0500") ``` ## Generating PCA plots The `plotPCA` function makes it easy to produce a PCA plot directly from a `SingleCellExperiment` object, which is useful for visualising the relationships between cells. The default plot shows the first two principal components, if `"PCA"` is already in the `reducedDims` slot. ```{r plot-pca-default} plotPCA(example_sce) ``` If pre-existing `"PCA"` results are not present, the function will automatically call `runPCA` to generate the results prior to plotting. However, it may be preferable for users to call `runPCA` manually if multiple plots are to be generated from the same results. This avoids re-calculation of the reduced dimension results, which can be time-consuming for very large data sets. By default, `runPCA` performs PCA on the log-counts using the 500 features with the most variable expression across all cells. The number of most-variable features used can be changed with the `ntop` argument. Alternatively, a specific set of features to use for PCA can be defined with the `feature_set` argument. This is demonstrated with the feature controls below, to identify technical factors of variation:. ```{r plot-pca-feature-controls} example_sce2 <- runPCA(example_sce, feature_set = rowData(example_sce)$is_feature_control) plotPCA(example_sce2) ``` Multiple components can be plotted in a series of pairwise plots. When more than two components are plotted, the diagonal boxes in the scatter plot matrix show the density for each component. ```{r plot-pca-4comp-colby-shapeby} example_sce <- runPCA(example_sce, ncomponents=20) plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment", shape_by = "Mutation_Status") ``` As shown above, various metadata variables can be used to define the colour, shape and size of points in the scatter plot. We can also use the colour and size of point in the plot to reflect feature expression values. ```{r plot-pca-4comp-colby-sizeby-exprs} plotPCA(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000") ``` ## Generating _t_-SNE plots _t_-distributed stochastic neighbour embedding (_t_-SNE) is widely used for visualizing complex single-cell data sets. The same procedure described for PCA plots can be applied to generate _t_-SNE plots using `plotTSNE`, with coordinates obtained using `runTSNE` via the `r CRANpkg("Rtsne")` package. We strongly recommend generating plots with different random seeds and perplexity values, to ensure that any conclusions are robust to different visualizations. ```{r plot-tsne-1comp-colby-sizeby-exprs} # Perplexity of 10 just chosen here arbitrarily. example_sce <- runTSNE(example_sce, perplexity=10, rand_seed=1000) plotTSNE(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000") ``` It is also possible to use the pre-existing PCA results as input into the _t_-SNE algorithm. This is useful as it improves speed by using a low-rank approximation of the expression matrix; and reduces random noise, by focusing on the major factors of variation. The code below uses the first 10 principal components to perform the _t_-SNE. ```{r plot-tsne-from-pca} example_sce <- runTSNE(example_sce, perplexity=10, rand_seed=1000, use_dimred="PCA", n_dimred = 10) plotTSNE(example_sce, colour_by="Treatment") ``` Users can force `plotTSNE` to call `runTSNE` by setting `rerun=TRUE`, even when `"TSNE"` already exists in the input `SingleCellExperiment` object. Users can also pass parameters for `runTSNE` directly to `plotTSNE` via the `run.args` argument. The same applies for the other `plot*` and `run*` arguments. ## Generating diffusion maps Again, the same can be done for diffusion maps using `plotDiffusionMap`, with coordinates obtained using `runDiffusionMap` via the `r Biocpkg("destiny")` package. ```{r plot-difmap-1comp-colby-sizeby-exprs} example_sce <- runDiffusionMap(example_sce) plotDiffusionMap(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000") ``` # Session information {.unnumbered} ```{r} sessionInfo() ```