%\VignetteIndexEntry{scDD Quickstart}
%\VignettePackage{scDD}
%\VignetteEngine{knitr::knitr}

\documentclass{article}
\usepackage{underscore}

<<style-knitr, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex()
@

  \bioctitle[scDD]{Identifying differential 
  distributions in single-cell RNA-seq experiments with scDD}
    \author{Keegan Korthauer\thanks{\email{keegan@jimmy.harvard.edu}}}

\begin{document}

\maketitle
\begin{abstract}
The \Rpackage{scDD} package models single-cell gene expression data 
(from single-cell RNA-seq) using flexible nonparamentric Bayesian mixture 
models in order to explicitly handle heterogeneity within cell populations. 
In bulk RNA-seq data, where each measurement is an average over thousands of 
cells, distributions of expression over samples are most often unimodal. 
In single-cell RNA-seq data, however, even when cells represent genetically 
homogeneous populations, multimodal distributions of gene expression values 
over samples are common \cite{korthauer2016}. This type of heterogeneity is 
often treated as a nuisance factor in studies of differential expression in 
single-cell RNA-seq experiments. Here, we explicitly accommodate it in order 
to improve power to detect differences in expression distributions that are 
more complicated than a mean shift.
\end{abstract}

\packageVersion{\Sexpr{BiocStyle::pkg_ver("scDD")}}
Report issues on \url{https://github.com/kdkorthauer/scDD/issues}

\newpage
\tableofcontents
\newpage

\section{Background}

Our aim is two-fold: (1) to detect which genes have different expression 
distributions across two biological conditions and (2) to classify those 
differences into informative patterns. Note that in (1) we explicitly say 
differences in 'distributions' rather than differences in 'average', which 
would correspond to traditional DE (differential expression) analysis in 
bulk RNA-seq. By examining the entire distribution, we are able to detect more
subtle differences as well as describe complex patterns, such as the existence
of subgroups of cells within and across condition that express a given gene
at a different level.

We start by assuming that the log-transformed nonzero expression values arise
out of a Dirichlet Process Mixture of normals model. This allows us to 
characterize expression distributions in terms of the number of modes 
(or clusters). To detect differences in these distributions across conditions,
an approximate Bayes Factor score is used which compares the conditional 
likelihood under the hypothesis of Equivalent distributions (ED) where one 
clustering process governs both conditions jointly, with the hypothesis of 
Differential distributions (DD) where each condition is generated from its 
own clustering process. In the full framework, significance of the scores 
for each gene are evaluated via empirical p-values after permutation. 
Optionally, a fast implementation obtains the p-values from the non-parametric 
Kolmogorov-Smirnov test. Zero values are considered by also implementing a 
$\chi^2$ test of whether the proportion of zero values differs by condition 
(adjusted for overall sample detection rate). More details are provided in 
\cite{korthauer2016}.

After the detection step is carried out, the significantly DD genes are 
classified into four informative patterns based on the number of clusters 
detected and whether they overlap. These patterns, depicted in 
Figure~\ref{figure/patternPlot-1},
include (a) DE (differential expression of unimodal genes), (b) DP 
(differential proportion for multimodal genes), (c) DM (differential modality),
and (d) DB (both differential modality and different component means). 
Genes where a differential proportion of zeroes were identified are classified
as DZ (differential zero). Genes that are identified as significantly 
differentially distributed but do not fall into one of the above categories
are abbreviated NC (for no call). This includes genes with the same number of
components with similar component means, but differential variance. 
For reasons detailed in \cite{korthauer2016}, we do not aim to interpret this
type of pattern.


<<patternPlot, echo=FALSE, fig.show="hide", fig.width=7, fig.height=5>>=
### Note that the following code can be ignored for the purposes of
# analysis with scDD; it is simply used to generate the cartoon of 
# interesting DD patterns (illustration purposes only)
par(mfrow=c(2,2), tcl=-0.5, mai=c(0.4,0.4,0.5,0.3))
x <- seq(0, 6, by=0.05)

## traditional de 
# mu1 is 2 
# mu2 is 4
cord.x <- c(0,x,6) 
cord.y <- c(0,dnorm(x, 2, 0.75),0) 
curve(dnorm(x, 2 , 0.75),xlim=c(0,6),main="Traditional DE",
      xaxt="n", xlab="", ylab="", yaxt="n") 
polygon(cord.x,cord.y,col=rgb(0,0,1,1/4))
cord.x <- c(0,x,6) 
cord.y <- c(0,dnorm(x, 4, 0.75),0) 
lines(x, dnorm(x, 4 , 0.75))
polygon(cord.x,cord.y,col=rgb(1,0,0,1/4))
axis(side=1, at=c(2,4), labels=c(expression(mu[1]), expression(mu[2])),
     pos=0, cex.axis=1.5)
mtext("(A)", side = 3, line=0.5, adj=-0.1, cex=1.2, font=2)

x <- seq(0, 10, by=0.05)
## differential proportion
cord.x <- c(0,x,10) 
cord.y <- c(0,0.3*dnorm(x, 7, 1) + 0.7*dnorm(x, 3, 1),0) 
curve(0.3*dnorm(x, 7, 1) + 0.7*dnorm(x, 3, 1),xlim=c(0,10),main="DP",
      xaxt="n", xlab="", ylab="", yaxt="n") 
polygon(cord.x,cord.y,col=rgb(0,0,1,1/4))
cord.x <- c(0,x,10) 
cord.y <- c(0,0.3*dnorm(x, 3, 1) + 0.7*dnorm(x, 7, 1),0) 
lines(x, 0.3*dnorm(x, 3, 1) + 0.7*dnorm(x, 7, 1))
polygon(cord.x,cord.y,col=rgb(1,0,0,1/4))
axis(side=1, at=c(3,7), labels=c(expression(mu[1]), 
                                 expression(mu[2])), pos=0, cex.axis=1.5)
mtext("(B)", side = 3, line=0.5, adj=-0.1, cex=1.2, font=2)

## differential modes (DM)
cord.x <- c(0,x,6) 
cord.y <- c(0,dnorm(x, 2, 0.75),0) 
curve(dnorm(x, 2 , 0.75),xlim=c(0,6),main="DM", xaxt="n", 
      xlab="", ylab="", yaxt="n") 
polygon(cord.x,cord.y,col=rgb(0,0,1,1/4))
cord.x <- c(0,x,6) 
cord.y <- c(0,0.3*dnorm(x, 2, 0.6) + 0.7*dnorm(x, 4, 0.6),0) 
lines(x, 0.3*dnorm(x, 2, 0.6) + 0.7*dnorm(x, 4, 0.6))
polygon(cord.x,cord.y,col=rgb(1,0,0,1/4))
axis(side=1, at=c(2,4), labels=c(expression(mu[1]), 
                                 expression(mu[2])), pos=0, cex.axis=1.5)
mtext("(C)", side = 3, line=0.5, adj=-0.1, cex=1.2, font=2)

## Both DM and DP
cord.x <- c(0,x,10) 
cord.y <- c(0,0.5*dnorm(x, 2.5, 1) + 0.5*dnorm(x, 7.5, 1),0) 
curve(0.5*dnorm(x, 2.5, 1) + 0.5*dnorm(x, 7.5, 1),
      xlim=c(0,10),main="DB", xaxt="n", xlab="", ylab="", yaxt="n",
      ylim=c(0,max(0.5*dnorm(x, 2.5, 1) + 0.5*dnorm(x, 7.5, 1)))) 
polygon(cord.x,cord.y,col=rgb(0,0,1,1/4))
cord.x <- c(0,x,10) 
cord.y <- c(0,0.8*dnorm(x, 5, 2),0) 
lines(x, 0.8*dnorm(x, 5, 2))
polygon(cord.x,cord.y,col=rgb(1,0,0,1/4))
axis(side=1, at=c(2.5, 5, 7.5), labels=c(expression(mu[1]), 
                                         expression(mu[3]), 
                                         expression(mu[2])), 
     pos=0, cex.axis=1.5)
mtext("(D)", side = 3, line=0.5, adj=-0.1, cex=1.2, font=2)
@

\incfig{figure/patternPlot-1}{0.65\textwidth}
{Illustration of informative DD patterns}{}

The rest of this vignette outlines the main functionality of the 
\Rpackage{scDD} package. This includes:

\begin{itemize}
  \item Identifying genes that are expressed differently between two biological
  conditions and classifying them into informative patterns.
  \item Simulating single-cell RNA-seq data with differential expression that
  exhibits multimodal patterns.
  \item Preprocessing and formatting of single-cell RNA-seq data to facilitate
  analysis
  \item Visualizing the expression patterns using a violin plotting scheme
\end{itemize}

\section{Identify and Classify DD genes}
In this section, we demonstrate how to use the main function \Rfunction{scDD} 
to find genes with differential distributions and classify them into the 
patterns of interest described in the previous section. 

First, we need to load the \Rpackage{scDD} package. For each of the following
sections in this vignette, we assume this step has been carried out.
<<load lib, message=FALSE>>=
library(scDD)
@

Next, we load the toy simulated example \Rclass{SingleCellExperiment} object that we
will use for identifying and classifying DD genes.
<<load exDat>>=
data(scDatExSim)
@

Verify that this object is a member of the \Rclass{SingleCellExperiment} class 
and that it contains 200 samples and 30 genes. The \Rcode{colData} slot 
(which contains a dataframe of metadata for the cells) should have a column 
that contains the biological condition or grouping of interest. In this 
example data, that variable is the 'condition' variable. Note that the 
input gene set needs to be in \Rclass{SingleCellExperiment} format, and should 
contain normalized counts. In practice, it is also advisable to filter 
the input gene set to remove genes that have an extremely high proportion
of zeroes (see Section 6). More specifically, the test for differential
distributions of the expressed measurements will not be carried out on 
genes where only one or fewer cells had a nonzero measurement (these 
genes will still be tested for differential proportion of zeroes (DZ) 
if the \Rcode{testZeroes} parameter is set to \Rcode{TRUE}, however).

<<check class>>=
class(scDatExSim)
dim(scDatExSim)
@

Next, specify the hyperparameter arguments that we'll pass to the 
\Rfunction{scDD} function. These values reflect heavy-tailed distributions
over the paramaters and are robust to many different settings in simulation
(see \cite{korthauer2016} for more details).

<< prior >>=
prior_param=list(alpha=0.01, mu0=0, s0=0.01, a0=0.01, b0=0.01)
@
 
Finally, call the \Rfunction{scDD} function to test for differential 
distributions, classify DD genes, and return the results. If the biological
condition or grouping variable in the \Rcode{colData} slot is named 
something other than 'condition', you'll need to specify the name of the 
variable as an argument to the \Rfunction{scDD} function (set the 
\Rcode{condition} argument equal to the name of the relevant column). 
We won't perform the test for a difference in the proportion of zeroes 
since none exists in this simulated toy example data, but this option 
can be invoked by changing the \Rcode{testZeroes} option to \Rcode{TRUE}. 
Note that the default option is to use a fast test of differential 
distributions that involves the Kolmogorov-Smirnov test instead of the 
full permutation testing framework. This provides a fast implementation
of the method at the cost of potentially slightly decreased power compared
to the full scDD framework described in the manuscript (see Section 4 for 
more details).

Note that if you are only interested in obtaining the results of the test for
significance, and not in the classification of genes to the patterns mentioned
above, you can achieve further computational speedup by setting 
\Rcode{categorize} to \Rcode{FALSE}.

<<main engine>>=
scDatExSim <- scDD(scDatExSim, prior_param=prior_param, testZeroes=FALSE)
@

Four results objects are added to the \Robject{scDatExSim} SingleCellExperiment 
object in the \Rclass{metadata} slot. For convenience, the results objects can
be extracted with the \Rfunction{results} function. 

The main results object is the \Rcode{"Genes"} object which is a 
\Rclass{data.frame} containing the following columns: 
\begin{itemize}
  \item \Rcode{gene}: gene name (matches rownames of SCdat)
  \item \Rcode{DDcategory}: name of the DD pattern (DE, DP, DM, DB, DZ), 
  or NC (no call), or NS (not significant). 
  \item \Rcode{Clusters.combined}: the number of clusters identified 
  when pooling condition 1 and 2 together
  \item \Rcode{Clusters.c1}: the number of clusters identified in 
  condition 1 alone
  \item \Rcode{Clusters.c2}: the number of clusters identified in 
  condition 2 alone
  \item \Rcode{nonzero.pvalue}: p-value for KS test of differential 
  distributions of expressed cells
  \item \Rcode{nonzero.pvalue.adj}: Benjamini-Hochberg adjusted p-value 
  for KS test of differential distributions
  \item \Rcode{zero.pvalue}: p-value for test of difference in dropout rate
  (only if \Rcode{testZeroes==TRUE})
  \item \Rcode{zero.pvalue.adj}: Benjamini-Hochberg adjusted p-value for 
  test of difference in dropout rate (only if \Rcode{testZeroes==TRUE})
  \item \Rcode{combined.pvalue}: Fisher's combined p-value for a difference
  in nonzero or zero values (only if \Rcode{testZeroes==TRUE})
  \item \Rcode{combined.pvalue.adj}: Benjamini-Hochberg adjusted Fisher's 
  combined p-value for a difference
  in nonzero or zero values (only if \Rcode{testZeroes==TRUE})
\end{itemize}

This can be extracted using the following call to \Rfunction{results}:

<<main results>>=
RES <- results(scDatExSim)
head(RES)
@

The remaining three results objects are matrices (first for condition 1 and 
  2 combined, then condition 1 alone, then condition 2 alone) that contain  
  the cluster memberships (partition estimates) 
  for each sample (for clusters 1,2,3,...) in columns 
  and genes in rows. Zeroes, which are not involved in the clustering, 
  are labeled as zero. These can be extracted by specifying an alternative
  \Robject{type} when calling the \Rfunction{results} function. For example,
  we can extract the partition estimates for condition 1 with the following:

<<partition results>>=
PARTITION.C1 <- results(scDatExSim, type="Zhat.c1")
PARTITION.C1[1:5,1:5]
@

\section{Alternate test for Differential Distributions}

The first step in the scDD framework that identifies Differential 
Distributions was designed to have optimal power to detect differences in
expression distributions, but the utilization of a permutation test on the
Bayes Factor can be computationally demanding. While this is not an issue
when machines with multiple cores are available since the code takes advantage
of parallel processing, we also provide the option to use an alternate test to
detect distributional differences that avoides the use of a permutation test. 
This option (default) uses the Kolmogorov-Smirnov test, which examines the null
hypothesis that two samples are generated from the same continuous 
distribution. While the use of this test yielded slighlty lower power in 
simulations than the full permutation testing framework at lower sample sizes
(50-75 cells in each condition) and primarily affected the DB pattern genes, 
it does not require permutations and thus is orders of magnitude faster. The
overall power to detect DD genes in simulation was still comparable or 
favorable to exisiting methods for differential expression analysis of 
scRNA-seq experiments. 

The remaining steps of the scDD framework remain unchanged if the alternate
test is used. That is, the Dirichlet process mixture model is still fit to
the observed expression measurements so that the significant DD genes can be
categorized into patterns that represent the major distributional changes, 
and results can still be visualized with violin plots using the 
\Rfunction{sideViolin} function described in the Plotting section. 

The option to use the full permutation testing procedure instead of the 
Kolmogorov-Smirnov test is invoked by setting the number of permutations to
something other than zero (the \Rcode{permutations} argument in 
\Rfunction{scDD}) when calling the main \Rfunction{scDD} function as follows:

<<main engine perm>>=
scDatExSim <- scDD(scDatExSim, prior_param=prior_param, 
                 testZeroes=FALSE, permutations=100)
@

The line above will run 100 permutations of every gene.  In practice, it is
recommended that at least 1000 permutations are carried out if using the full
permutation testing option. Note that this option will take significantly 
longer than the default option to use the alternate KS test, and computation
time will increase with more genes and/or more permutations, but multiple 
cores will automatically be utilized (if available) via the 
\Rpackage{BiocParallel} package. By default, an OS appropriate back-end 
using the number of cores on the machine minus 2 is chosen automatically. 
Alternatively, you can specificy the number of 
cores to use by passing in a \Rcode{param} argument in the \Rcode{scDD} 
function call (where the \Rcode{param} argument is an object of class 
\Rcode{MulticoreParm} for Linux-like OS or \Rcode{SnowParam} for Windows). 
For example, to use 12 cores on a Linux-like OS, specify 
\Rcode{param=MulticoreParam(workers=12)}.

The results returned by \Rfunction{scDD} remain exactly as described 
in the previous section, with the exception that the \Rcode{nonzero.pvalue} 
and \Rcode{nonzero.pvalue.adj} columns of the \Rfunction{Genes} data frame 
now contain the p-values and Benjamini-Hochberg adjusted p-values of the
perumtation test of the Bayes Factor for independence of condition membership
with clustering. 

\section{Simulation}

Here we show how to generate a simulated single-cell RNA-seq dataset which 
contains multi-modal genes. The \Rfunction{simulateSet} function simulates 
data from a two-condition experiment with a specified number of genes that 
fall into each of the patterns of interest. For DD genes, these include DE
(differential expression of unimodal genes), DP (differential proportion 
for multimodal genes), DM (differential modality), and DB (both differential 
modality and mean expression levels), and for ED genes these include EE 
(equivalent expression for unimodal genes) and EP (equivalent proportion
for multimodal genes). The simulation parameters are based on observed data
from two conditions, so the function requires an \Rclass{SingleCellExperiment} 
formatted dataset as input. The output of the function is also a 
\Rclass{SingleCellExperiment} object with information about the true category
of each gene and its simulated fold change stored in the \Rcode{rowData} slot.

First, we load the toy example \Rclass{SingleCellExperiment} to simulate from
<<load exdat2>>=
data(scDatEx)
@

We'll verify that this object is a member of the \Rclass{SingleCellExperiment} class 
and that it contains 142 samples and 500 genes
<< check class2>>=
class(scDatEx)
dim(scDatEx)
@
Next we need to set the arguments that will be passed to the 
\Rfunction{simulateSet} function. In this example we will simulate 30 genes
total, with 5 genes of each type and 100 samples in each of two conditions. 
We also set a random seed for reproducibility.

<< set num >>=
nDE <- 5
nDP <- 5
nDM <- 5
nDB <- 5
nEE <- 5
nEP <- 5
numSamples <- 100
seed <- 816
@

Finally, we'll create the simulated set with specified numbers of DE, DP, DM,
DM, EE, and EP genes and specified number of samples, where DE gene fold 
changes represent 2 standard deviations of the observed fold change 
distribution, and multimodal genes have cluster mean distance of 4 
standard deviations.
<<simset>>=
SD <- simulateSet(scDatEx, numSamples=numSamples, 
                  nDE=nDE, nDP=nDP, nDM=nDM, nDB=nDB, 
                  nEE=nEE, nEP=nEP, sd.range=c(2,2), modeFC=4, plots=FALSE, 
                  random.seed=seed)

# load the SingleCellExperiment package to use rowData method
library(SingleCellExperiment)
head(rowData(SD))
@               

The \Rcode{normcounts} assay \Robject{SD} object (of class 
\Rclass{SingleCellExperiment}) contains 
simulated expression values. The \Rcode{rowData} slot stores the 
fold change/modal distance values and gene expression categories,
which are useful in assessing performance of a differential
expression method.

\section{Formatting and Preprocessing}

Before beginning an analysis using \Rpackage{scDD}, you will need to carry out
a few preprocessing steps. This includes normalization, filtering of genes 
that are mostly zero, and getting the data into format that is expected by 
the \Rfunction{scDD} function. The following subsections will detail 
these steps.

\subsection{Constructing a SingleCellExperiment object}

In this subsection, we provide a quick example of how to construct an object 
of the \Rclass{SingleCellExperiment} class. 
For more detailed instructions, refer 
to the \Rpackage{SingleCellExperiment} package documentation.

Here we will convert the toy example data object \Robject{scDatExList} into a 
\Rclass{SingleCellExperiment} object. This object is a list of two matrices 
(one for each condition) of  normalized counts. Each matrix has genes in rows
and cells in columns, and is named "C1" or "C2" (for condition 1 or 2).

First, load the \Rpackage{SingleCellExperiment} package:

<<load summarizedexp, message=FALSE>>=
library(SingleCellExperiment)
@

Next, load the toy example data list:
<<load exdatlist>>=
data(scDatExList)
@

Next, create a vector of condition membership labels (these should be 1 or 2).
In our example data list, we have 78 cells in condition 1, and 64 cells in 
condition 2.

<<create condition>>=
condition <- c(rep(1, ncol(scDatExList$C1)), rep(2,  ncol(scDatExList$C2)))
@

The rows and columns of the expression matrix should have unique names. This is 
already the case for our toy example dataset.  
The names of the columns should also correspond to the names of the condition 
membership labels in \Robject{condition}.

<<rownames>>=
# Example of row and column names
head(rownames(scDatExList$C1))
head(colnames(scDatExList$C2))

names(condition) <- c(colnames(scDatExList$C1), colnames(scDatExList$C2))
@

Once our labeling is intact, we can call the \Rfunction{SingleCellExperiment} 
function and specify the two relevant pieces of information. 
The \Rcode{normcounts} assays slot should contain one matrix, so we use 
\Rcode{rbind} here to combine both conditions. Optionally, 
additional experiment information can be stored in additional slots; see the
\Rpackage{SingleCellExperiment} package for more details.
<<create sce>>=
sce <- SingleCellExperiment(assays=list(normcounts=cbind(scDatExList$C1, 
                                                         scDatExList$C2)),
                                colData=data.frame(condition))
show(sce)
@

\subsection{Filtering and Normalization}

In this subsection, we demonstrate the utility of the \Rfunction{preprocess} 
function, which can be helpful if working with raw counts, or data which 
contains genes that are predominantly zero (common in single-cell RNA-seq 
experiments). This function takes as input a list of data matrices, one for
each condition.

First, load the toy example data and verify it is a 
\Rclass{SingleCellExperiment} object:
<<load exdat3e>>=
data(scDatEx)
show(scDatEx)
@

Now, apply the \Rfunction{preprocess} function with the \Robject{zero.thresh} argument set to 0.9 so that genes are filtered out if 
they are 90 (or more) percent zero.

<<preprocess>>=
scDatEx <- preprocess(scDatEx, zero.thresh=0.9)
show(scDatEx)
@

We can see that no genes were removed, since all have fewer than 90 percent 
of zeroes to begin with.

Now, apply the preprocess function again, but this time use a more stringent
threshold on the proportion of zeroes and apply normalization using size 
factors calculated using the \Rpackage{scran}. In this
example, we set the \Robject{zero.thresh} argument to 0.50 so that genes with 
more than 50 percent zeroes are filtered out and we set the 
\Robject{scran_norm} argument to \Rcode{TRUE} to return \Rpackage{scran} 
normalized counts. 

<<threshe>>= 
scDatEx.scran <- preprocess(scDatEx, zero.thresh=0.5, scran_norm=TRUE)
show(scDatEx.scran)
@

We can see that 38 genes were removed due to having more than 50 percent zeroes.
The warning message here is letting us know that our SingleCellExperiment 
object already contained a \Rcode{normcounts} assay, and that by specifying 
\Rcode{scran_norm=TRUE}, we are replacing that with the scran normalized counts 
and moving the original values to a new slot called \Rcode{normcounts-orig}. 

Also included is the option to use median normalization, invoked by setting
\Robject{median_norm} to \Rcode{TRUE}.

\section{Plotting}

Next we demonstrate the plotting routine that is implemented in the 
\Rfunction{sideViolin} function. This function produces side-by-side 
violin plots (where the curves represent a smoothed kernel density estimate)
of the log-transformed data. A count of 1 is added before log-transformation
so that zeroes can be displayed, but they are not included in the density 
estimation. Each condition is represented by one violin plot. Individual 
data points are plotted (with jitter) on top. 

We illustrate this function by displaying the six types of simulated genes 
using the toy example simulated dataset. First, load the toy simulated dataset:

<<load exdat4>>=
data(scDatExSim)
@

Next, load the \Rpackage{SingleCellExperiment} package to facilitate 
subset operations on \Rcode{SingleCellExperiment} class objects:
<<load sumExp, message=FALSE>>=
library(SingleCellExperiment)
@

The following lines will produce the figures in Figure~\ref{figure/plotGrid-1}.

Plot side by side violin plots for Gene 1 (DE):
<<plot DE, eval=TRUE, message=FALSE>>=
de <- sideViolin(normcounts(scDatExSim)[1,], scDatExSim$condition, 
           title.gene=rownames(scDatExSim)[1])
@

Plot side by side violin plots for Gene 6 (DP):
<<plot DP, eval=TRUE, message=FALSE>>=
dp <- sideViolin(normcounts(scDatExSim)[6,], scDatExSim$condition, 
           title.gene=rownames(scDatExSim)[6])
@

Plot side by side violin plots for Gene 11 (DM):
<<plot DM, eval=TRUE, message=FALSE>>=
dm <- sideViolin(normcounts(scDatExSim)[11,], scDatExSim$condition, 
           title.gene=rownames(scDatExSim)[11])
@

Plot side by side violin plots for Gene 16 (DB):
<<plot DB, eval=TRUE, message=FALSE>>=
db <- sideViolin(normcounts(scDatExSim)[16,], scDatExSim$condition, 
           title.gene=rownames(scDatExSim)[16])
@

Plot side by side violin plots for Gene 21 (EP):
<<plot EP, eval=TRUE, message=FALSE>>=
ep <- sideViolin(normcounts(scDatExSim)[21,], scDatExSim$condition, 
           title.gene=rownames(scDatExSim)[21])
@

Plot side by side violin plots for Gene 26 (EE):
<<plot EE, eval=TRUE, message=FALSE>>=
ee <- sideViolin(normcounts(scDatExSim)[26,], scDatExSim$condition, 
           title.gene=rownames(scDatExSim)[26])
@

The plot objects returned by \Rfunction{sideViolin} are standard 
\Rpackage{ggplot2} objects, and thus can be manipulated into multipanel
figures with the help of the \Rpackage{gridExtra} or \Rpackage{cowplot} 
packages. Here we use \Rfunction{grid.arrange} from the \Rpackage{gridExtra}
package to visualize all the plots generated above. The end result is shown 
in Figure~\ref{figure/plotGrid-1}.


<<plotGrid, fig.show='hide', fig.width=8.5, fig.height=11, message=FALSE>>=
library(gridExtra)
grid.arrange(de, dp, dm, db, ep, ee, ncol=2)
@

\incfig{figure/plotGrid-1}{0.92\textwidth}{Example Simulated DD genes}{} 

\section{Session Info}
Here is the output of \Rfunction{sessionInfo} on the system where 
this document was compiled:

<<sessionInfo, eval=TRUE>>=
sessionInfo()
@

\bibliography{vignette}
\end{document}