--- title: "ChIPSeqSpike: ChIP-seq data scaling with spike-in control" author: - name: Nicolas Descostes affiliation: Howard Hughes Medical Institute - New York University email: nicolas.descostes@gmail.com package: ChIPSeqSpike abstract: Vignette update - 2018-01-26 output: BiocStyle::pdf_document vignette: > %\VignetteIndexEntry{ChIPSeqSpike} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction Chromatin Immuno-Precipitation followed by Sequencing (ChIP-Seq) is used to determine the binding sites of any protein of interest, such as transcription factors or histones with or without a specific modification, at a genome scale [@barski2007; @park2009]. ChIP-Seq entails treating cells with a cross-linking reagent such as formaldehyde; isolating the chromatin and fragmenting it by sonication; immuno-precipitating with antibodies directed against the protein of interest; reversing crosslink; DNA purification and amplification before submission to sequencing. These many steps can introduce biases that make ChIP-Seq more qualitative than quantitative. Different efficiencies in nuclear extraction, DNA sonication, DNA amplification or antibody recognition make it challenging to distinguish between true differential binding events and technical variability. This problem was addressed by using an external spike-in control to keep track of technical biases between conditions [@orlando2014; @bonhoure2014]. Exogenous DNA from a different non-closely related species is inserted during the protocol to infer scaling factors. This modification was shown to be especially important for revealing global histone modification differences, that are not caught by traditional downstream data normalization techniques, such as Histone H3 lysine-27 trimethyl (H3K27me3) upon Ezh2 inhibitor treatment [@trojer2016]. ChIPSeqSpike provides tools for ChIP-Seq spike-in normalization, assessment and analysis. Conversely to a majority of ChIP-Seq related tools, ChIPSeqSpike does not rely on peak detection. However, if one wants to focus on peaks, ChIPSeqSpike is flexible enough to do so. ChIPSeqSpike provides ready to use scaled bigwig files as output and scaling factors values. We believe that ChIPSeqSpike will be of great value to the genomics community. # Standard workflow ## Note for Windows users On Windows operating system, reading of BigWig files fail due to the Bioconductor package [rtracklayer >= 1.37.6](https://bioconductor.org/packages/release/bioc/html/rtracklayer.html) that does not support this file format. Therefore, the boost mode, the input subtraction method, and scaling method do not work on this operating system. ## Quick start A case study reported Chromatin Immuno-Precipitation followed by Sequencing (ChIP-Seq) experiments that did not show differences upon inhibitor treatment with traditional normalization procedures [@orlando2014]. These experiments looked at the effect of DOT1L inhibitor EPZ5676 [@daigle2013] treatment on the Histone H3 lysine-79 dimethyl (H3K79me2) modification in Jurkat cells. DOT1L is involved in the RNA Polymerase II pause release and licensing of transcriptional elongation. H3K79me2 ChIP-Seq were performed on cells treated with 0%, 50% and 100% EPZ5676 inhibitor (see next section for details on data). The following code performs spike-in normalization with a wrapper function on data sub-samples. A 'test_chipseq' temporary results folder is also created. \newline ```{r message = FALSE, warning = FALSE} library("ChIPSeqSpike") ## Preparing testing data info_file_csv <- system.file("extdata/info.csv", package="ChIPSeqSpike") bam_path <- system.file("extdata/bam_files", package="ChIPSeqSpike") bigwig_path <- system.file("extdata/bigwig_files", package="ChIPSeqSpike") gff_vec <- system.file("extdata/test_coord.gff", package="ChIPSeqSpike") genome_name <- "hg19" output_folder <- "test_chipseqspike" bigwig_files <- system.file("extdata/bigwig_files", c("H3K79me2_0-filtered.bw", "H3K79me2_100-filtered.bw", "H3K79me2_50-filtered.bw", "input_0-filtered.bw", "input_100-filtered.bw", "input_50-filtered.bw"), package="ChIPSeqSpike") ## Copying example files dir.create("./test_chipseqspike") mock <- file.copy(bigwig_files, "test_chipseqspike") ## Performing spike-in normalization if (.Platform$OS.type != "windows") { csds_test <- spikePipe(info_file_csv, bam_path, bigwig_path, gff_vec, genome_name, outputFolder = output_folder) } ``` ## Data The data used in this documentation represent a gold-standard example of the importance of using spike-in controls with ChIP-Seq experiments. It uses chromatin from Drosophila Melanogaster as exogenous spike-in control to correct experimental biases. Without a spike-in control and using only RPM normalization, proper differences in the H3K79me2 histone modification in human Jurkat cells upon EPZ5676 inhibitor treatment were not observed [@orlando2014]. This dataset is made of bigwig and bam files of H3K79me2 ChIP-Seq data and corresponding input DNA controls (see [input subtraction section](#inputSubtraction)). Bam files contain data aligned to the Human reference genome Hg19 or to the Drosophila reference genome dm3. The latest is used to compute external spike-in scaling factors. All above mentioned data are available at 0%, 50% and 100% EPZ5676 inhibitor treatment. ### Testing data For the sake of memory and computation time efficiency, bigwig files used in this vignette are limited to chromosome 1. Reads falling in the top 10% mostly bound genes (at 0% treatment) with length between 700-800 bp were kept in bam files. For efficient plotting functions testing, only binding values of the top 100 mostly bound genes are used and can be accessed with `data(result_extractBinding)`. Scores for factors and read counts were computed on the whole dataset and are available through `data(result_estimateScalingFactors)` (see below). ### Complete data The whole dataset is accessible at [GSE60104](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60104). Specifically, the data used are H3K79me2 0% (GSM1465004), H3K79me2 50% (GSM1465006), H3K79me2 100% (GSM1465008), input DNA 0% (GSM1511465), input DNA 50% (GSM1511467) and input DNA 100% (GSM1511469). The data were treated as follows: Quality of sequencing was assessed with [FastQC v0.11.4](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads having less than 80% of quality scores above 25 were removed with NGSQCToolkit v2.3.3 [@ngsqctoolkit]. Homo Sapiens Hg19 and Drosophila Melanogaster dm3 from Illumina igenomes UCSC Collection were used. ChIP-Seq data were aligned with Bowtie2 v2.1.0 [@bowtie2] with default parameters. Sam outputs were converted to Bam with Samtools v1.0.6 [@samtools] and sorted with [Picard tools v1.88](http://broadinstitute.github.io/picard). Data were further processed with Pasha v0.99.21 [@pasha]. Fixed steps wiggle files were converted to bigwigs with [wigToBigWig](http://hgdownload.soe.ucsc.edu/admin/exe/). Results with the complete dataset are also provided in this documentation. ## Detailed operations The spike-in normalization procedure consists of 4 steps: RPM scaling, input DNA subtraction, RPM scaling reversal and exogenous spike-in DNA scaling. Below is detailed the different steps included in the above mentioned wrapper function 'spikePipe'. ### Dataset generation The different data necessary for proper spike-in scaling are provided in a csv or a tab separated txt file. The columns must contain proper names and are organized as follows: Experiment name (expName); bam file name of data aligned to the endogenous reference genome (endogenousBam); bam file name of data aligned to the exogenous reference genome (exogenousBam); the corresponding input DNA bam file aligned to the endogenous reference genome (inputBam); the fixed steps bigwig file name of data aligned to the endogenous reference genome (bigWigEndogenous) and the fixed steps bigwig file names of the corresponding input DNA experiment aligned to the endogenous reference genome (bigWigInput). \newline ```{r message = FALSE, warning = FALSE} info_file <- read.csv(info_file_csv) head(info_file) ```` From the info file, two kinds of objects can be generated: either a ChIPSeqSpikeDataset or a ChIPSeqSpikeDatasetList depending upon the number of input DNA experiments. A ChIPSeqSpikeDatasetList object is a list of ChIPSeqSpikeDataset object that is created if several input DNA experiments are used. In this latter case, ChIP-Seq experiments are grouped by their corresponding input DNA. The function spikeDataset creates automatically the suitable object. The folder path to the bam and fixed steps bigwig files must be provided. \newline ```{r message = FALSE, warning = FALSE} csds_test <- spikeDataset(info_file_csv, bam_path, bigwig_path) is(csds_test) ```` #### Boost mode Reading and processing bigwig and bam files can be memory-greedy. By default, files are read, processed and written at each step of the normalization procedure to enable data treatment on a regular desktop computer. One could wish to reduce the time of computation especially when processing a lot of data. ChIPSeqSpike reduces such time by providing a boost mode. \newline ```{r message = FALSE, warning = FALSE} if (.Platform$OS.type != "windows") { csds_testBoost <- spikeDataset(info_file_csv, bam_path, bigwig_path, boost = TRUE) is(csds_testBoost) } ```` Binding scores for each experiment are stored in a GRanges object and are directly accessible by functions. \newline ```{r message = FALSE, warning = FALSE} if (.Platform$OS.type != "windows") { getLoadedData(csds_testBoost[[1]]) } ```` Even if optimizing greatly the time of computing, one should know that loading binding scores of all experiments is greedy in memory and should be used with caution. The boost mode is ignored in the rest of the vignette, but all code provided in the following sections, with the exception of section 3.1 ( plotTransform), can be run with csds_testBoost. ### Summary and control A ChIPSeqSpikeDataset object, at this point, is made of slots storing paths to files. In order to compute scaling factors, bam counts are first computed. A scaling factor is defined as 1000000/bam_count. The method estimateScalingFactors returns bam counts and endogenous/exogenous scaling factors for all experiments. In the following example, scores are computed using chromosome 1 only. Scores on the whole dataset are also indicated below. \newline ```{r message = FALSE, warning = FALSE} csds_test <- estimateScalingFactors(csds_test, verbose = FALSE) ```` The different scores can be visualized: \newline ```{r message = FALSE, warning = FALSE} ## Scores on testing sub-samples spikeSummary(csds_test) ##Scores on whole dataset data(result_estimateScalingFactors) spikeSummary(csds) ```` An important parameter to keep in mind when performing spike-in with ChIP-seq is the percentage of exogenous DNA relative to that of endogenous DNA. The amount of exogenous DNA should be between 2-25% of endogenous DNA. The method getRatio returns the percentage of exogenous DNA and throws a warning if this percentage is not within the 2-25% range. In theory, having more than 25% exogenous DNA should not affect the normalization, whereas having less than 2% is usually not sufficient to perform a reliable normalization. \newline ```{r message = FALSE} getRatio(csds_test) ## Result on the whole dataset data(ratio) ratio ```` ### RPM scaling The first normalization applied to the data is the 'Reads Per Million' (RPM) mapped reads. The method 'scaling' is used to achieve such normalization using default parameters. It is also used to reverse the RPM normalization and apply exogenous scaling factors (see sections [2.3.5](#RPMreversal) and [2.3.6](#exoscaling)). \newline \newline ```{r message = FALSE, warning = FALSE} if (.Platform$OS.type != "windows") { csds_test <- scaling(csds_test, outputFolder = output_folder) } ```` In the context of this vignette, output_folder is precised since the testing files were copied to a temporary 'test_chipseqspike' folder. The slots containing paths to files will be updated to this folder. If not precised, the RPM scaled bigwig files are written to the same folder containing bigwigs. This statement is also applicable for the next operations below. ### Input Subtraction {#inputSubtraction} When Immuno-Precipitating (IP) DNA bound by a given protein, a control is needed to distinguish background noise from true signal. This is typically achieved by performing a mock IP, omitting the use of antibody. After mock IP sequencing, one can notice peaks of signal above background. These peaks have to be removed from the experiment since they represent false positives. The inputSubtraction method simply subtracts scores of the input DNA experiment from the corresponding ones. If in boost mode, the input subtracted values are stored in the dataset object and no files are written. For this latter case, the method exportBigWigs can be used to output the transformed files. \newline ```{r message = FALSE, warning = FALSE} if (.Platform$OS.type != "windows") { csds_test <- inputSubtraction(csds_test) } ```` ### RPM scaling reversal {#RPMreversal} After RPM and input subtraction normalization, the RPM normalization is reversed in order for the data to be normalized by the exogenous scaling factors. \newline ```{r message = FALSE, warning = FALSE} if (.Platform$OS.type != "windows") { csds_test <- scaling(csds_test, reverse = TRUE) } ```` \newpage ### Exogenous scaling {#exoscaling} Finally, exogenous scaling factors are applied to the data. \newline ```{r message = FALSE, warning = FALSE} if (.Platform$OS.type != "windows") { csds_test <- scaling(csds_test, type = "exo") } ```` ### Extract binding values The last step of data processing is to extract and format binding scores in order to use plotting methods. The 'extractBinding' method extracts binding scores at different locations and stores these values in the form of PlotSetArray objects and matrices (see ?extractBinding for more details). The scores are retrieved on annotations provided in a gff file. If one wishes to focus on peaks, their coordinates should be submitted at this step. The genome name must also be provided. For details about installing the required BSgenome package corresponding to the endogenous organism, see the [BSgenome](https://bioconductor.org/packages/release/bioc/html/BSgenome.html) package documentation. \newline ```{r message = FALSE, warning = FALSE} if (.Platform$OS.type != "windows") { csds_test <- extractBinding(csds_test, gff_vec, genome_name) } ```` # Plotting data ChIPSeqSpike offers several graphical methods for normalization diagnosis and data exploration. These choices enable one to visualize each step of the normalization through exploring inter-samples differences using profiles, heatmaps, boxplots and correlation plots. In the following sections, the testing data are restricted to the 100 mostly bound genes. Results on the complete set of hg19 genes are also indicated. ## Meta-profiles and transformations The first step of spike-in normalized ChIP-Seq data analysis is an inter-sample comparison by meta-gene or meta-annotation profiling. The method 'plotProfile' automatically plots all experiments at the start, midpoint, end and composite locations of the annotations provided to the method extractBinding in gff format. Here is the result of profiling H3K79me2 on the 100 mostly bound genes at 0% inhibitor treatment (figure \ref{figure1}). \newline ```{r message = FALSE, warning = FALSE, fig.cap="Spiked experiment upon different percentages concentrations of inhibitor treatment \\label{figure1}", fig.height = 6} data(result_extractBinding) plotProfile(csds, legend = TRUE) ```` The unspiked data (however RPM scaled and input subtracted) can be added to the plot (figure \ref{figure2}). \newline ```{r message = FALSE, warning = FALSE, fig.cap="Same as figure 1 including unspiked data \\label{figure2}", fig.height = 7} plotProfile(csds, legend = TRUE, notScaled = TRUE) ```` \newpage The effect of the individual processing steps for each experiment can also be plotted. \newline ```{r message = FALSE, warning = FALSE} plotTransform(csds, legend = TRUE, separateWindows = TRUE) ```` ## Heatmaps plotHeatmaps is a versatile method based on the plotHeatmap method of the seqplots package [@seqplots]. This method enables one to represent data at different locations (start, end, midpoint, composite) and at different stages of the normalization process. Different scaling (log, zscore, etc) and different clustering approaches (k-means, hierarchical, etc) can be used (see documentation for more details). Figure \ref{figure3} shows a k-means clustering of spiked data, each group being sub-sorted by decreasing values. \newline \newline ```{r message = FALSE, warning = FALSE, fig.cap="kmeans clustering of spiked data \\label{figure3}", fig.height = 5} plotHeatmaps(csds, nb_of_groups = 2, clustering_method = "kmeans") ```` \newpage Figure \ref{figure4} illustrates a clustering by decreasing values on the whole dataset. ![Spiked data organized by decreasing values at start position of all refseq Hg19 genes \label{figure4}](heatmaps_start-2.png) ## Boxplots boxplotSpike plots boxplots of the mean values of ChIP-seq experiments on the annotations given to the extractBinding method. It offers a wide range of graphical representations that includes violin plots (see documentation for details). Figure \ref{figure5} illustrates all transformations of all dataset indicating confidence intervals. \newpage ```{r message = FALSE, warning = FALSE, fig.cap="Complete representation of the whole procedure using boxplots (without outliers)\\label{figure5}", fig.height = 6} par(cex.axis=0.5) boxplotSpike(csds, rawFile = TRUE, rpmFile = TRUE, bgsubFile = TRUE, revFile = TRUE, spiked = TRUE, outline = FALSE) ```` \newpage Figure \ref{figure6} shows spiked experiments indicating each mean value, mean and standard deviation with a violin plot representation. \newline ```{r message = FALSE, warning = FALSE, fig.cap="Spiked data with mean and standard deviation - Each point represents a mean binding value on a given gene \\label{figure6}", fig.height = 6} boxplotSpike(csds, outline = FALSE, violin=TRUE, mean_with_sd = TRUE, jitter = TRUE) ```` ## Correlation plots The plotCor method plots the correlation between ChIP-seq experiments using heatscatter plot or, if heatscatterplot = FALSE, correlation tables. For heatscatter plots, ChIPSeqSpike makes use of the heatscatter function of the package [LSD](https://CRAN.R-project.org/package=LSD) and the corrplot function of the package [corrplot](https://CRAN.R-project.org/package=corrplot) is used to generate correlation tables. This offers a wide range of graphical possibilities for assessing the correlation between experiments and transformation steps (see documentation for more details). Figure \ref{figure7} shows two correlation table representations between spiked experiments. \newline ```{r message = FALSE, warning = FALSE, fig.cap="Correlation table of spiked data with circle (left) or numbers (right)\\label{figure7}", fig.height = 6, fig.width=10} par(mfrow=c(1,2)) plotCor(csds, heatscatterplot = FALSE) plotCor(csds, heatscatterplot = FALSE, method_corrplot = "number") ```` Figure \ref{figure8} illustrates a heatscatter plot of spiked data after log transformation (only positive mean binding values are kept) and figure \ref{figure9} is the result of running the same code on the whole refseq Hg19 gene set. \newline ```{r message = FALSE, warning = FALSE, fig.cap="Heatscatter of spiked data after log transformation \\label{figure8}", fig.height=6} plotCor(csds, method_scale = "log") ```` ![Heatscatter of spiked data after log transformation on all Hg19 refseq genes \label{figure9}](cor_log.pdf) ```{r message = FALSE, warning = FALSE, include = FALSE} unlink("test_chipseqspike/", recursive = TRUE) ```` # Session info ```{r} sessionInfo(package="ChIPSeqSpike") ```` # References --- references: - id: barski2007 title: 'High-Resolution Profiling of Histone Methylations in the Human Genome' author: - family: Barski given: Artem - family: Cuddapah given: Suresh - family: Cui given: Kairong - family: Roh given: Tae Young - family: Schones given: Dustin E - family: Wang given: Zhibin - family: Wei given: Gang - family: Chepelev given: Iouri - family: Zhao given: Keji container-title: Cell volume: 129 URL: 'http://dx.doi.org/10.1016/10.1016/j.cell.2007.05.009' DOI: 10.1016/j.cell.2007.05.009 issue: 4 publisher: Cell press page: 823-837 type: article-journal issued: year: 2007 month: 5 - id: park2009 title: 'ChIP–seq: advantages and challenges of a maturing technology' author: - family: Park given: PJ container-title: Nature Reviews Genetics volume: 10 URL: 'http://dx.doi.org/10.1038/nrg2641' DOI: 10.1038/nrg2641 issue: publisher: Nature Publishing Group page: 669-680 type: article-journal issued: year: 2009 month: 10 - id: bonhoure2014 title: 'Quantifying ChIP-seq data: a spiking method providing an internal reference for sample-to-sample normalization' author: - family: Bonhoure given: Nicolas - family: Bounova given: Gergana - family: Bernasconi given: David - family: Praz given: Viviane - family: Lammers given: Fabienne - family: Canella given: Donatella - family: Willis given: Ian M - family: Herr given: Winship - family: Hernandez given: Nouria - family: Delorenzi given: Mauro - family: Consortium given: CycliX container-title: Genome Research volume: 24 URL: 'http://dx.doi.org/10.1101/gr.168260.113' DOI: 10.1101/gr.168260.113 issue: publisher: Cold Spring Harbor Press page: 1157-1168 type: article-journal issued: year: 2014 month: 04 - id: orlando2014 title: 'Quantitative ChIP-seq Normalization reveals global modulation of the Epigenome' author: - family: Orlando given: David A - family: Chen given: Mei Wei - family: Brown given: Victoria E - family: Solanki given: Snehakumari - family: Choi given: Yoon J - family: Olson given: Eric R - family: Fritz given: Christian C - family: Bradner given: James E - family: Guenther given: Matthew G container-title: Cell Reports volume: 9 URL: 'http://dx.doi.org/10.1016/j.celrep.2014.10.018' DOI: 10.1016/j.celrep.2014.10.018 issue: 3 publisher: Cell press page: 1163–1170 type: article-journal issued: year: 2014 month: 11 - id: trojer2016 title: 'An Alternative Approach to ChIP-Seq Normalization Enables Detection of Genome-Wide Changes in Histone H3 Lysine 27 Trimethylation upon EZH2 Inhibition' author: - family: Egan given: Brian - family: Yuan given: Chih Chi - family: Craske given: Madeleine Lisa - family: Labhart given: Paul - family: Guler given: Gulfem D - family: Arnott given: David - family: Maile given: Tobias M - family: Busby given: Jennifer - family: Henry given: Chisato - family: Kelly given: Theresa K - family: Tindell given: Charles A - family: Jhunjhunwala given: Suchit - family: Zhao given: Feng - family: Hatton given: Charlie - family: Bryant given: Barbara M - family: Classon given: Marie - family: Trojer given: Patrick container-title: PLoS ONE volume: 11 URL: 'http://dx.doi.org/10.1371/journal.pone.0166438' DOI: 10.1371/journal.pone.0166438 issue: 11 publisher: PLoS type: article-journal issued: year: 2016 month: 11 - id: daigle2013 title: 'Potent inhibition of DOT1L as treatment of MLL-fusion leukemia' author: - family: Daigle given: SR - family: Olhava given: EJ - family: Therkelsen given: CA - family: Basavapathruni given: A - family: Jin given: L - family: Boriack-Sjodin given: PA - family: Allain given: CJ - family: Klaus given: CR - family: Raimondi given: A - family: Scott given: MP - family: Waters given: NJ - family: Cheswort given: R - family: Moyer given: MP - family: Copeland given: RA - family: Richon given: VM - family: Pollock given: RM container-title: Blood volume: 122 URL: 'http://dx.doi.org/10.1182/blood-2013-04-497644' DOI: 10.1182/blood-2013-04-497644 issue: 6 publisher: American Society of Hematology page: 1017-1025 type: article-journal issued: year: 2013 month: 8 - id: ngsqctoolkit title: 'NGS QC toolkit: A toolkit for quality control of next generation sequencing data' author: - family: Patel given: Ravi K - family: Jain given: Mukesh container-title: PLoS ONE volume: 7 URL: 'http://dx.doi.org/10.1371/journal.pone.0030619' DOI: 10.1371/journal.pone.0030619 issue: 2 publisher: PLoS type: article-journal issued: year: 2012 month: 2 - id: bowtie2 title: 'Fast gapped-read alignment with Bowtie 2' author: - family: Langmead given: Ben - family: Salzberg given: Steven L container-title: Nature Method volume: 9 URL: 'http://dx.doi.org/10.1038/nmeth.1923' DOI: 10.1038/nmeth.1923. issue: 4 publisher: Nature Publishing Group type: article-journal issued: year: 2012 month: 3 - id: samtools title: 'The Sequence Alignment/Map format and SAMtools' author: - family: Li given: Heng - family: Handsaker given: Bob - family: Wysoker given: Alec - family: Fennell given: Tim - family: Ruan given: Jue - family: Homer given: Nils - family: Marth given: Gabor - family: Abecasis given: Goncalo - family: Durbin given: Richard container-title: Bioinformatics volume: 25 URL: 'http://dx.doi.org/10.1093/bioinformatics/btp352' DOI: 10.1093/bioinformatics/btp352 issue: 16 publisher: Oxford Academic type: article-journal issued: year: 2009 month: 8 - id: pasha title: 'Pasha a versatile R package for piling chromatin HTS data' author: - family: Fenouil given: Romain - family: Descostes given: Nicolas - family: Spinelli given: Lionel - family: Koch given: Frederic - family: Maqbool given: Muhammad A - family: Benoukraf given: Touati - family: Cauchy given: Pierre - family: Innocenti given: Charlene - family: Ferrier given: Pierre - family: Andrau given: Jean-Christophe container-title: Bioinformatics volume: 25 URL: 'http://dx.doi.org/10.1093/bioinformatics/btp352' DOI: 10.1093/bioinformatics/btp352 issue: 16 publisher: Oxford Academic type: article-journal issued: year: 2009 month: 8 - id: seqplots title: 'SeqPlots - Interactive software for exploratory data analyses, pattern discovery and visualization in genomics' author: - family: Stempor given: Przemyslaw - family: Ahringer given: Julie container-title: Wellcome Open Research volume: 1 URL: 'http://dx.doi.org/10.12688/wellcomeopenres.10004.1' DOI: 10.12688/wellcomeopenres.10004.1 issue: publisher: type: article-journal issued: year: 2016 month: 11 ---