alevinQC 1.0.0
The purpose of the alevinQC package is to generate a summary QC report based on the output of an alevin (Srivastava et al. 2018) run. The QC report can be generated as an html or pdf file, or launched as a shiny application.
alevinQC can be installed using the BiocManager CRAN package.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("alevinQC")
After installation, load the package into the R session.
library(alevinQC)
For more information about running alevin, we refer to the alevin documentation. When invoked, alevin generates several output files in the specified output directory. alevinQC assumes that this structure is retained and will return an error if it isn't; it is therefore not recommended to move or rename the output files from alevin. alevinQC assumes that the following files (in the indicated structure) are available in the provided baseDir. Note that currently, alevin must be invoked with the --dumpFeatures flag in order to generate the full set of files.
baseDir
|--alevin
| |--featureDump.txt
| |--filtered_cb_frequency.txt
| |--MappedUmi.txt
| |--quants_mat_cols.txt
| |--quants_mat_rows.txt
| |--quants_mat.gz
| |--raw_cb_frequency.txt
| |--whitelist.txt
|--aux_info
| |--meta_info.json
|--cmd_info.json
The report generation functions (see below) will check that all the required files are available in the provided base directory. However, you can also call the function checkAlevinInputFiles() to run the check manually. If one or more files are missing, the function will raise an error indicating the missing file(s).
baseDir <- system.file("extdata/alevin_example", package = "alevinQC")
checkAlevinInputFiles(baseDir = baseDir)
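For illustration, the kind of check described above can be approximated in base R. The sketch below is a hypothetical re-implementation, not the code used by checkAlevinInputFiles(): the names requiredAlevinFiles and findMissingAlevinFiles are made up here, the file list is copied from the directory tree above, and the helper returns the missing paths instead of raising an error.

```r
# Hypothetical helper, not part of alevinQC: the required paths are taken
# from the directory tree shown above, relative to baseDir.
requiredAlevinFiles <- c(
  file.path("alevin", c("featureDump.txt", "filtered_cb_frequency.txt",
                        "MappedUmi.txt", "quants_mat_cols.txt",
                        "quants_mat_rows.txt", "quants_mat.gz",
                        "raw_cb_frequency.txt", "whitelist.txt")),
  file.path("aux_info", "meta_info.json"),
  "cmd_info.json"
)

# Return the required paths that do not exist under baseDir
# (file.exists() is vectorised, so all paths are checked in one call).
findMissingAlevinFiles <- function(baseDir, required = requiredAlevinFiles) {
  required[!file.exists(file.path(baseDir, required))]
}

# Usage: a throwaway directory that contains every file except whitelist.txt
mockDir <- file.path(tempdir(), "alevin_mock")
unlink(mockDir, recursive = TRUE)
dir.create(file.path(mockDir, "alevin"), recursive = TRUE)
dir.create(file.path(mockDir, "aux_info"))
present <- setdiff(requiredAlevinFiles, file.path("alevin", "whitelist.txt"))
invisible(file.create(file.path(mockDir, present)))

missingFiles <- findMissingAlevinFiles(mockDir)
missingFiles  # "alevin/whitelist.txt"
```

To mimic the error-raising behaviour of the package function, the result could be wrapped in a call such as `if (length(missingFiles) > 0) stop(...)`.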
The alevinQCReport() function generates the QC report from the alevin output. Depending on the file extension of the outputFile argument and the value of outputFormat, the function can generate either an html report or a pdf report.
outputDir <- tempdir()
alevinQCReport(baseDir = baseDir, sampleId = "testSample",
outputFile = "alevinReport.html",
outputFormat = "html_document",
outputDir = outputDir, forceOverwrite = TRUE)
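The correspondence between the file extension and the output format can be made explicit with a small helper, which helps keep outputFile and outputFormat consistent. formatFromExtension() below is hypothetical and not part of alevinQC; it simply maps .html to the rmarkdown format "html_document" (the value used above) and .pdf to "pdf_document".

```r
# Hypothetical helper, not part of alevinQC: map the extension of a report
# file name to the corresponding rmarkdown output format name.
formatFromExtension <- function(outputFile) {
  ext <- tolower(tools::file_ext(outputFile))
  switch(ext,
         html = "html_document",
         pdf  = "pdf_document",
         stop("Unsupported report extension: '", ext, "'"))
}

formatFromExtension("alevinReport.html")  # "html_document"
formatFromExtension("alevinReport.pdf")   # "pdf_document"
```

Assuming alevinQCReport() accepts "pdf_document" for outputFormat, a pdf report would then be requested with outputFile = "alevinReport.pdf" and the matching format; rendering pdf output through rmarkdown generally also requires a LaTeX installation.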
In addition to static reports, alevinQC can also generate a shiny application, containing the same summary figures as the pdf and html reports.
app <- alevinQCShiny(baseDir = baseDir, sampleId = "testSample")
Once created, the app can be launched using the runApp() function from the shiny package.
shiny::runApp(app)
The individual plots included in the QC reports can also be independently generated. To do so, we must first read the alevin output into an R object.
alevin <- readAlevinQC(baseDir = baseDir)
#> reading in alevin gene-level counts across cells
#> Joining, by = "CB"
The resulting list contains three entries:
- cbTable: a data.frame with various inferred characteristics of the individual cell barcodes
- summaryTables: a list of data.frames with summary information about the full data set, the initial set of whitelisted cells and the final set of whitelisted cells, respectively
- versionTable: a matrix with information about the invocation of alevin
head(alevin$cbTable)
#> CB originalFreq ranking collapsedFreq mappingRate
#> 1 GACTGCGAGGGCATGT 121577 1 123419 0.853256
#> 2 GGTGCGTAGGCTACGA 110467 2 111987 0.844339
#> 3 ATGAGGGAGTAGTGCG 106446 3 108173 0.826177
#> 4 ACTGTCCTCATGCTCC 104794 4 106085 0.778442
#> 5 CGAACATTCTGATACG 104616 5 106072 0.802634
#> 6 ACTGTCCCATATGGTC 99208 6 100776 0.811999
#> duplicationRate dedupRate nbrGenesAboveMean nbrMappedUMI totalUMICount
#> 1 0.000510955 0.293416 7345 105308 74409
#> 2 0.000541694 0.292190 7306 94555 66927
#> 3 0.000541090 0.294305 6876 89370 63068
#> 4 0.000393819 0.299899 6733 82581 57815
#> 5 0.000501289 0.303393 7142 85137 59307
#> 6 0.000597173 0.300086 6637 81830 57274
#> nbrGenesAboveZero inFinalWhiteList inFirstWhiteList
#> 1 7532 TRUE TRUE
#> 2 7520 TRUE TRUE
#> 3 7078 TRUE TRUE
#> 4 6925 TRUE TRUE
#> 5 7344 TRUE TRUE
#> 6 6831 TRUE TRUE
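Some columns of cbTable appear to be derived from others. For the six barcodes printed above, mappingRate matches nbrMappedUMI / collapsedFreq, and dedupRate matches 1 - totalUMICount / nbrMappedUMI, to the printed precision. These relationships are inferred from the printed values rather than from the alevin or alevinQC documentation, so treat them as an assumption; the check below recomputes them from the hand-copied rows.

```r
# The six barcodes printed by head(alevin$cbTable), copied by hand
cb <- data.frame(
  collapsedFreq = c(123419, 111987, 108173, 106085, 106072, 100776),
  mappingRate   = c(0.853256, 0.844339, 0.826177, 0.778442, 0.802634, 0.811999),
  dedupRate     = c(0.293416, 0.292190, 0.294305, 0.299899, 0.303393, 0.300086),
  nbrMappedUMI  = c(105308, 94555, 89370, 82581, 85137, 81830),
  totalUMICount = c(74409, 66927, 63068, 57815, 59307, 57274)
)

# Assumed relationships, inferred from the printed values only
mappingRateCalc <- cb$nbrMappedUMI / cb$collapsedFreq
dedupRateCalc   <- 1 - cb$totalUMICount / cb$nbrMappedUMI

# The printed values are rounded to 6 decimals, hence the tolerance
all.equal(mappingRateCalc, cb$mappingRate, tolerance = 1e-5)  # TRUE
all.equal(dedupRateCalc, cb$dedupRate, tolerance = 1e-5)      # TRUE
```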
knitr::kable(alevin$summaryTables$fullDataset)
Total number of processed reads | 7197662 |
Number of reads with valid cell barcode (no Ns) | 7162300 |
Total number of observed cell barcodes | 188613 |
knitr::kable(alevin$summaryTables$initialWhitelist)
Number of barcodes in initial whitelist | 299 |
Fraction reads in initial whitelist barcodes | 87.41% |
Mean number of reads per cell (initial whitelist) | 20939 |
Median number of reads per cell (initial whitelist) | 342 |
Median number of detected genes per cell (initial whitelist) | 205 |
Total number of detected genes (initial whitelist) | 31396 |
Median UMI count per cell (initial whitelist) | 212 |
knitr::kable(alevin$summaryTables$finalWhitelist)
Number of barcodes in final whitelist | 98 |
Fraction reads in final whitelist barcodes | 83.8% |
Mean number of reads per cell (final whitelist) | 61242 |
Median number of reads per cell (final whitelist) | 58349 |
Median number of detected genes per cell (final whitelist) | 5269 |
Total number of detected genes (final whitelist) | 31050 |
Median UMI count per cell (final whitelist) | 31939 |
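The read fractions in the two whitelist tables can be reproduced, up to rounding, from the other printed values. Multiplying the number of barcodes by the mean number of reads per cell and dividing by the number of reads with a valid cell barcode gives 87.41% and 83.8%. Dividing by the total number of processed reads instead would give about 86.98% for the initial whitelist. The arithmetic below is a consistency check inferred from the printed numbers, not taken from the alevinQC source, so the exact definition used by the package is an assumption.

```r
# Values copied by hand from the summary tables above
validBarcodeReads <- 7162300  # reads with valid cell barcode (no Ns)
totalReads        <- 7197662  # total number of processed reads

whitelists <- data.frame(
  whitelist        = c("initial", "final"),
  nbrBarcodes      = c(299, 98),
  meanReadsPerCell = c(20939, 61242)
)

# Assumed relationship: reads in whitelisted barcodes are approximated by
# nbrBarcodes * meanReadsPerCell (the printed mean is rounded), and the
# percentage is taken relative to reads with a valid cell barcode.
whitelists$pctReads <- with(whitelists,
  100 * nbrBarcodes * meanReadsPerCell / validBarcodeReads)

round(whitelists$pctReads, 2)                   # 87.41 83.80
round(100 * validBarcodeReads / totalReads, 2)  # 99.51
```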
knitr::kable(alevin$versionTable)
Start time | Tue Nov 20 15:43:04 2018 |
Salmon version | 0.11.4 |
Index | /mnt/scratch5/avi/alevin/data/mohu/salmon_index/ |
R1file | /mnt/scratch5/avi/alevin/data/10x/mohu/100/all_bcs.fq |
R2file | /mnt/scratch5/avi/alevin/data/10x/mohu/100/all_reads.fq |
tgMap | /mnt/scratch5/avi/alevin/data/mohu/gtf/txp2gene.tsv |
The plots can now be generated using the dedicated plotting functions provided with alevinQC (see the help file for the respective function for more information).
plotAlevinKneeRaw(alevin$cbTable)
plotAlevinBarcodeCollapse(alevin$cbTable)
plotAlevinQuant(alevin$cbTable)
plotAlevinKneeNbrGenes(alevin$cbTable)
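The idea behind a knee plot of raw barcode frequencies can be sketched in base R: barcodes are sorted by decreasing frequency, and frequency is plotted against rank on log-log axes. This is a simplified, hypothetical sketch (kneeData() is not an alevinQC function) and does not reproduce the exact appearance of plotAlevinKneeRaw(). The six values are the originalFreq column printed by head(alevin$cbTable) above; with the full table one would presumably pass alevin$cbTable$originalFreq instead.

```r
# Hypothetical helper, not part of alevinQC: rank barcodes by decreasing
# frequency, as needed for a knee plot.
kneeData <- function(freq) {
  freq <- sort(freq, decreasing = TRUE)
  data.frame(ranking = seq_along(freq), frequency = freq)
}

# originalFreq of the six barcodes shown by head(alevin$cbTable)
kd <- kneeData(c(121577, 110467, 106446, 104794, 104616, 99208))

# Draw the curve only in an interactive session
if (interactive()) {
  plot(kd$ranking, kd$frequency, log = "xy", type = "l",
       xlab = "Barcode rank", ylab = "Barcode frequency")
}
```

In a real data set, the "knee" where the curve drops steeply is what separates likely cells from background barcodes; six barcodes are far too few to show it.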
sessionInfo()
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] alevinQC_1.0.0 BiocStyle_2.12.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.1 highr_0.8 later_0.8.0
#> [4] pillar_1.3.1 compiler_3.6.0 BiocManager_1.30.4
#> [7] RColorBrewer_1.1-2 plyr_1.8.4 tools_3.6.0
#> [10] digest_0.6.18 evaluate_0.13 tibble_2.1.1
#> [13] gtable_0.3.0 pkgconfig_2.0.2 rlang_0.3.4
#> [16] shiny_1.3.2 GGally_1.4.0 crosstalk_1.0.0
#> [19] yaml_2.2.0 xfun_0.6 dplyr_0.8.0.1
#> [22] stringr_1.4.0 knitr_1.22 htmlwidgets_1.3
#> [25] shinydashboard_0.7.1 cowplot_0.9.4 DT_0.5
#> [28] grid_3.6.0 tidyselect_0.2.5 reshape_0.8.8
#> [31] glue_1.3.1 R6_2.4.0 rmarkdown_1.12
#> [34] bookdown_0.9 purrr_0.3.2 ggplot2_3.1.1
#> [37] magrittr_1.5 promises_1.0.1 scales_1.0.0
#> [40] htmltools_0.3.6 tximport_1.12.0 assertthat_0.2.1
#> [43] xtable_1.8-4 mime_0.6 colorspace_1.4-1
#> [46] httpuv_1.5.1 labeling_0.3 stringi_1.4.3
#> [49] lazyeval_0.2.2 munsell_0.5.0 rjson_0.2.20
#> [52] crayon_1.3.4
Srivastava, Avi, Laraib Malik, Tom Sean Smith, Ian Sudbery, and Rob Patro. 2018. “Alevin Efficiently Estimates Accurate Gene Abundances from dscRNA-seq Data.” bioRxiv. doi:10.1101/335000.