--- title: "Vignette illustrating the usage of gscreend on simulated data" author: "Katharina Imkeller" date: "`r Sys.Date()`" output: BiocStyle::html_document: toc: true vignette: > %\VignetteIndexEntry{Example_simulated} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction Pooled CRISPR perturbations screens employ a library of guide RNAs (gRNAs) that is transduced into a pool of cells with the aim to induce a single genetic perturbation in each cell. The perturbation effect is assessed by measuring the abundance of each gRNA after the screen selection phase and comparing it to its abundance in the plasmid library. The main goal of the following analysis is the detection of essential genes, i.e. genes whose knockout reduces the cell fitness. The package gscreend provides a method to rank genes based on count tables. # gscreend workflow In order to identify essential genes starting from raw gRNA count data, gscreend performs the following analysis steps: 1. Input of raw gRNA counts at T0 (sequencing of library) and T1 (at the end of the screen). Normalization and calculation of log fold changes. 2. Split log fold changes into intervals dependent on the initial count at T0. 3. For every interval fit a skew-normal distribution to the data to model the null hypothesis (via least quantile regression). 4. Based on the null model calculate p-values for every gRNA. 5. Rank gRNAs according to p-value and perform robust ranking aggregation to calculate p-values on gene level. 6. Perform quality control of data and statistical model. # Installation ```{r, eval = FALSE} if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("gscreend") ``` ```{r, message=FALSE} library(gscreend) library(SummarizedExperiment) ``` # Analysis of simulated data with gscreend ## Input data: gRNA counts The simulated data used in this example has been generated using the simulation method available at https://github.com/imkeller/simulate_pooled_screen Raw count data consists of gRNA counts in the library sequencing and different replicates after the screen proliferation phase. In order to estimate the effect of a specific gRNA on cell fitness, the relative abundances of the gRNA before and after the proliferation phase will be compares ```{r} raw_counts <- read.table( system.file("extdata", "simulated_counts.txt", package = "gscreend"), header=TRUE) ``` Generate a summarized experiment from the count data. gscreend currently uses SummarizedExperiment objects as an input format. The count matrix contains raw gRNA counts. Each row represents one gRNA, each column represents one sample (T0, T1, replicates, ...). ```{r} counts_matrix <- cbind(raw_counts$library0, raw_counts$R0_0, raw_counts$R1_0) rowData <- data.frame(sgRNA_id = raw_counts$sgrna_id, gene = raw_counts$Gene) colData <- data.frame(samplename = c("library", "R1", "R2"), # timepoint naming convention: # T0 -> reference, # T1 -> after proliferation timepoint = c("T0", "T1", "T1")) se <- SummarizedExperiment(assays=list(counts=counts_matrix), rowData=rowData, colData=colData) ``` ## Run gscreend In this step a gscreend experiment object is generated that will after the analysis contain all data related to gRNAs, genes and model parameters. ```{r} pse <- createPoolScreenExp(se) ``` Run gscreend with default parameters. ```{r} pse_an <- RunGscreend(pse) ``` ## Quality control gscreend provides basic quality control functions for inspection of replicate correlation for example. ```{r, fig.width=4, fig.height=4.5} plotReplicateCorrelation(pse_an) ``` The ``plotModelParameters()`` function can be used to inspect the values of the parameters estimated for the skew normal distribution of the logarithmic fold change data. ```{r} plotModelParameters(pse_an) ``` # Results The ResultsTable function can be used to extract a table listing for each gene the p-value and fdr. These values correspond to the results from the statistical test indicting whether upon perturbation a specific gene reduces (direction = "negative"), or increasing (direction = "positive") cell viability. ```{r} res <- ResultsTable(pse_an, direction = "negative") head(res) ``` # Session Info ```{r} sessionInfo() ```