CellScape is a visualization tool for integrating single cell phylogeny with genomic content to clearly display evolutionary progression and tumour heterogeneity.
To install CellScape, type the following commands in R:
Run the examples by:
One of these, for example, is copy number data of a triple negative breast cancer patient published in Wang et al. (2014).
\(cnv\_data\) (data frame) (Required if not providing mut_data nor mut_data_matrix) Single cell copy number segments data. This data frame includes the following columns:
\(mut\_data\) (data frame) (Required if not providing cnv_data nor mut_data_matrix) Single cell targeted mutation data frame. This data frame includes the following columns:
\(mut\_data\_matrix\) (matrix) (Required if not providing cnv_data nor mut_data) Single cell targeted mutation matrix. Rows are single cell IDs, columns are mutations. Rows and columns must be named, column names in the format “<chromosome>:<coordinate>”. Note that the order of these rows and columns will not be preserved, unless mutation order is the same as that specified in the mut_order parameter. Also note that every single cell id must be present in the tree_edges data frame.
\(tree\_edges\) (data frame) Edges for the single cell phylogenetic tree. This data frame includes the following columns:
These parameters may be included if the data is time-series, and the user would like to view a TimeScape of the data alongside the CellScape:
\(gtype\_tree\_edges\) (data frame) Genotype tree edges of a rooted tree. This data frame includes the following columns:
\(sc\_annot\) (data frame) (Required for TimeScape) Annotations (genotype and sample id) for each single cell. This data frame includes the following columns:
If these two additional parameters are included, a TimeScape will be appended to the bottom of the view, like so:
The colours of each clone may be changed by the \(clone\_colours\) parameter. This data frame includes the following columns:
The order of targeted mutations in the heatmap may be specified by the \(mut\_order\) vector. Each element in the vector describes a mutation by its chromosome and coordinate, and is formatted as such: “<chromosome>:<coordinate>”.
Many titles throughout the view can be changed by the following parameters:
Interactive components:
To obtain single-cell targeted mutation data, extract reference and variant allele counts for target positions from your single-cell BAM files. Then compute variant allele frequencies (variant_count / (variant_count + reference_count)). Finally, import the data into R, and wrangle it into a data frame with single cell ID, chromosome, coordinate, and VAF.
To obtain single-cell copy number data, extract binned read counts from single-cell BAM files, apply GC and/or mappability correction to the raw counts, and infer copy number segments. Segments can be inferred using tools based on Hidden Markov Models (e.g. HMMcopy, http://bioconductor.org/packages/release/bioc/html/HMMcopy.html), or Circular Binary Segmentation (e.g. DNAcopy, https://bioconductor.org/packages/release/bioc/html/DNAcopy.html). For HMMcopy, the function HMMsegment provides the chromosome, start position, end position, copy number state, and median copy number value for all inferred segments. This can be wangled into a data frame with single cell ID, chromosome start, chromosome end, and copy number (either the integer copy number state, or the segment median). Recommended parameter settings for single-cell copy number inference with HMMcopy can be found here: http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4140.html.
E-scape takes as input a clonal phylogeny and clonal prevalences per
clone per sample. At the time of submission many methods have been
proposed for obtaining these values, and accurate estimation of these
quantities is the focus of ongoing research. We describe a method for
estimating clonal phylogenies and clonal prevalence using PyClone (Roth
et al., 2014; source code available at https://bitbucket.org/aroth85/pyclone/wiki/Home) and
citup (Malikic et al., 2016; source code available at https://github.com/sfu-compbio/citup). In brief, PyClone
inputs are prepared by processing fastq files resulting from a targeted
deep sequencing experiment. Using samtools mpileup (http://samtools.sourceforge.net/mpileup.shtml), the
number of nucleotides matching the reference and non-reference are
counted for each targeted SNV. Copy number is also required for each
SNV. We recommend inferring copy number from whole genome or whole exome
sequencing of samples taken from the same anatomic location / timepoint
as the samples to which targeted deep sequencing was applied. Copy
number can be inferred using Titan (Ha et al., 2014; source code
available at https://github.com/gavinha/TitanCNA). Sample specific
SNV information is compiled into a set of TSV files, one per sample. The
tables includes mutation id, reference and variant read counts, normal
copy number, and major and minor tumour copy number (see PyClone
readme). PyClone is run on these files using the
PyClone run_analysis_pipeline
subcommand, and produces the
tables/cluster.tsv
in the working directory. Citup can be
used to infer a clonal phylogeny and clone prevalences from the cellular
prevalences produced by PyClone. The tables/cluster.tsv
file contains per sample, per SNV cluster estimates of cellular
prevalence. The table is reshaped into a TSV file of cellular
prevalences with rows as clusters and columns as samples, and the
mean
of each cluster taken from
tables/cluster.tsv
for the values of the table. The
iterative version of citup is run on the table of cellular frequencies,
producing an hdf5 output results file. Within the hdf5 results, the
/results/optimal
can be used to identify the id of the
optimal tree solution. The clonal phylogeny as an adjacency list is then
the /trees/{tree_solution}/adjacency_list
entry and the
clone frequencies are the /trees/{tree_solution}/clone_freq
entry in the hdf5 file. The adjacency list can be written as a TSV with
the column names source
, target
to be input
into E-scape, and the clone frequencies should be reshaped such that
each row represents a clonal frequency in a specific sample for a
specific clone, with the columns representing the time or space ID, the
clone ID, and the clonal prevalence.
To view the documentation for CellScape, type the following command in R:
or:
CellScape was developed at the Shah Lab for Computational Cancer Biology at the BC Cancer Research Centre.
References:
Eirew, Peter, et al. “Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution.” Nature 518.7539 (2015): 422-426.
Ha, Gavin, et al. “TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data.” Genome research 24.11 (2014): 1881-1893.
Malikic, Salem, et al. “Clonality inference in multiple tumor samples using phylogeny.” Bioinformatics 31.9 (2015): 1349-1356.
Roth, Andrew, et al. “PyClone: statistical inference of clonal population structure in cancer.” Nature methods 11.4 (2014): 396-398.
Wang, Yong, et al. “Clonal evolution in breast cancer revealed by single nucleus genome sequencing.” Nature 512.7513 (2014): 155-160.