%\VignetteKeywords{runAbsoluteCN}
%\VignetteEngine{knitr::knitr}
%\VignetteDepends{PureCN}
%\VignettePackage{PureCN}
%\VignetteIndexEntry{Quick start and command line usage}

\documentclass{article}

<<echo=FALSE, results='asis'>>=
BiocStyle::latex2()
@

\begin{document}

<<>>=
library(PureCN)
set.seed(1234)
@

\section*{PureCN - Quick Start}

This tutorial provides a quick overview of the command line tools shipped with
\Biocpkg{PureCN}. For the R package and more detailed information, see the
main vignette.

\subsection*{Prepare environment and files}

\begin{itemize}
\item Get the path to the command line scripts in R:

<<>>=
system.file("extdata", package="PureCN")
@

\item Store this path in an environment variable, for example in BASH:

\begin{verbatim}
$ export PURECN="/path/to/PureCN/extdata"
$ Rscript $PURECN/PureCN.R --help
Usage: /path/to/PureCN/inst/extdata/PureCN.R [-[-help|h]] ...
\end{verbatim}

\item Generate a basic interval file from a BED file containing the target
coordinates:

\begin{verbatim}
Rscript $PURECN/IntervalFile.R --infile baits_hg19.bed \
    --fasta hg19.fa --outfile baits_hg19_gcgene.txt
\end{verbatim}

Internally, this script uses \Biocpkg{rtracklayer} to parse the
\Rcode{infile}. Make sure that the file format matches the file extension.
See the main vignette for how to add gene symbols to the interval file.
Symbols are necessary to obtain gene-level copy number and LOH calls, but are
not needed for a test run.
\end{itemize}

\subsection*{Run PureCN with third-party segmentation}

If you already have a segmentation from a third-party tool (for example
CNVkit or EXCAVATOR2), you can provide it directly. For a test run:

\begin{verbatim}
Rscript $PURECN/PureCN.R --outdir $OUT/$SAMPLEID \
    --sampleid $SAMPLEID \
    --segfile $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg \
    --vcf ${SAMPLEID}_mutect.vcf \
    --genome hg19 --gcgene baits_hg19_gcgene.txt
\end{verbatim}

The main VCF (\Rcode{--vcf}) is ideally created by \software{MuTect} 1.1.7.
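
To run the test command above for several samples, it can be wrapped in a
small BASH function. The following is a minimal sketch, not part of
\Biocpkg{PureCN}: the function \Rcode{run\_purecn\_seg} and the sample list
file \Rcode{samples.txt} (one sample ID per line) are hypothetical, and the
input files are assumed to follow the naming used in the example.

\begin{verbatim}
# Hypothetical helper (not part of PureCN): run the third-party segmentation
# test command once per sample ID listed in a text file.
# Assumes PURECN and OUT are set as above and that the input files follow the
# naming of the example (${SAMPLEID}_cnvkit.seg, ${SAMPLEID}_mutect.vcf).
run_purecn_seg() {
    local samples_file=$1 sampleid
    # Read IDs from file descriptor 3 so that Rscript cannot consume the list
    # via standard input; "|| [ -n ... ]" keeps a last line without newline
    while IFS= read -r sampleid <&3 || [ -n "$sampleid" ]; do
        [ -n "$sampleid" ] || continue   # skip empty lines
        Rscript "$PURECN/PureCN.R" --outdir "$OUT/$sampleid" \
            --sampleid "$sampleid" \
            --segfile "$OUT/$sampleid/${sampleid}_cnvkit.seg" \
            --vcf "${sampleid}_mutect.vcf" \
            --genome hg19 --gcgene baits_hg19_gcgene.txt
    done 3< "$samples_file"
}

# Example usage, taking PURECN from the installed package:
# export PURECN=$(Rscript -e 'cat(system.file("extdata", package="PureCN"))')
# export OUT=/path/to/output
# run_purecn_seg samples.txt
\end{verbatim}
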

Support for \software{MuTect 2} and \software{FreeBayes} is available, but
poorly tested, and only very limited artifact filtering will be performed for
these callers.

For a production pipeline run, we provide more information about the assay
and genome:

\begin{verbatim}
Rscript $PURECN/PureCN.R --outdir $OUT/$SAMPLEID \
    --sampleid $SAMPLEID \
    --segfile $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg \
    --normal_panel $NORMAL_PANEL \
    --vcf ${SAMPLEID}_mutect.vcf \
    --statsfile ${SAMPLEID}_mutect_stats.txt \
    --snpblacklist hg19_simpleRepeats.bed \
    --genome hg19 --gcgene baits_hg19_gcgene.txt \
    --force --postoptimize
# Append "--funsegmentation none" to the command above to use the provided
# segmentation as is (see below)
\end{verbatim}

The normal panel VCF file is useful for mapping bias correction and is
especially recommended when no matched normals are available. See the FAQ of
the main vignette for how to generate this file. It is not essential for test
runs. The \software{MuTect} 1.1.7 stats file (the main output file besides
the VCF) should be provided for better artifact filtering. The
\Rcode{--funsegmentation} argument controls whether the data are re-segmented
using germline BAFs (the default). Set this value to \Rcode{none} if the
provided segmentation should be used as is. The \Rcode{--postoptimize} flag
specifies that purity should be optimized using both variant allelic
fractions and copy number, instead of copy number only. This results in a
significant runtime increase for whole-exome data.

\subsection*{Run PureCN with internal segmentation}

The following describes \Biocpkg{PureCN} runs with internal copy number
normalization and segmentation. We again provide minimal examples for test
runs. See the main vignette for how to get optimal results in production
pipelines.
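
The steps below (coverage, normal database and PureCN without a matched
normal) can be chained in one BASH function. The following is a minimal
sketch, not part of \Biocpkg{PureCN}, that only uses the options shown in this
vignette. The function \Rcode{run\_pipeline}, the list files
\Rcode{tumors.txt} and \Rcode{normals.txt} (one sample ID per line) and the
file \Rcode{normal\_coverage.list} are hypothetical. It further assumes, as in
the examples below, that \Rcode{Coverage.R} writes
\Rcode{\$\{SAMPLEID\}\_coverage\_loess.txt} to its output directory and that
\Rcode{NormalDB.R} writes \Rcode{normalDB\_hg19.rds}.

\begin{verbatim}
# Hypothetical driver (not shipped with PureCN) chaining the coverage,
# NormalDB and PureCN steps for tumors without matched normals.
# Assumptions, not guaranteed by PureCN:
#   - PURECN and OUT are set as described above
#   - inputs are named ${SAMPLEID}.bam and ${SAMPLEID}_mutect.vcf
#   - Coverage.R writes $OUT/$SAMPLEID/${SAMPLEID}_coverage_loess.txt
#   - NormalDB.R writes $OUT/normalDB_hg19.rds
run_pipeline() {
    local tumors=$1 normals=$2 list id
    local gcgene=baits_hg19_gcgene.txt

    # 1. Coverage for every tumor and normal sample; IDs are read from
    #    file descriptor 3 so that Rscript cannot consume them from stdin
    for list in "$tumors" "$normals"; do
        while IFS= read -r id <&3 || [ -n "$id" ]; do
            [ -n "$id" ] || continue
            Rscript "$PURECN/Coverage.R" --outdir "$OUT/$id" \
                --bam "$id.bam" --gcgene "$gcgene"
        done 3< "$list"
    done

    # 2. List the normal coverage files, one per line, and build the database
    mkdir -p "$OUT"
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue
        printf '%s\n' "$OUT/$id/${id}_coverage_loess.txt"
    done < "$normals" > "$OUT/normal_coverage.list"
    Rscript "$PURECN/NormalDB.R" --outdir "$OUT" \
        --coveragefiles "$OUT/normal_coverage.list" --genome hg19

    # 3. PureCN for every tumor, using the normal database instead of a
    #    matched normal
    while IFS= read -r id <&3 || [ -n "$id" ]; do
        [ -n "$id" ] || continue
        Rscript "$PURECN/PureCN.R" --outdir "$OUT/$id" \
            --tumor "$OUT/$id/${id}_coverage_loess.txt" \
            --normaldb "$OUT/normalDB_hg19.rds" \
            --sampleid "$id" --vcf "${id}_mutect.vcf" \
            --pool 5 --genome hg19 --gcgene "$gcgene"
    done 3< "$tumors"
}

# Example usage:
# run_pipeline tumors.txt normals.txt
\end{verbatim}
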
\subsubsection*{Coverage}

For each sample, tumor and normal, calculate the coverage:

\begin{verbatim}
# From a BAM file
Rscript $PURECN/Coverage.R --outdir $OUT/$SAMPLEID \
    --bam ${SAMPLEID}.bam \
    --gcgene baits_hg19_gcgene.txt

# From a GATK DepthOfCoverage file
Rscript $PURECN/Coverage.R --outdir $OUT/$SAMPLEID \
    --gatkcoverage ${SAMPLEID}.coverage.sample_interval_summary \
    --gcgene baits_hg19_gcgene.txt
\end{verbatim}

\subsubsection*{NormalDB}

To build a normal database, list all GC-normalized normal coverage files in a
single text file, one file per line:

\begin{verbatim}
ls normal*loess.txt > example_normal.list

# From already GC-normalized files
Rscript $PURECN/NormalDB.R --outdir $OUT \
    --coveragefiles example_normal.list \
    --genome hg19
\end{verbatim}

\subsubsection*{PureCN}

\begin{verbatim}
cd $OUT/$SAMPLEID

# From GC-normalized coverage data
Rscript $PURECN/PureCN.R --outdir . \
    --tumor ${SAMPLEID}_coverage_loess.txt \
    --normal ${SAMPLEID_NORMAL}_coverage_loess.txt \
    --sampleid $SAMPLEID \
    --vcf ${SAMPLEID}_mutect.vcf \
    --genome hg19 \
    --gcgene baits_hg19_gcgene.txt

# Without a matched normal
Rscript $PURECN/PureCN.R --outdir . \
    --tumor ${SAMPLEID}_coverage_loess.txt \
    --normaldb ../normalDB_hg19.rds \
    --sampleid $SAMPLEID \
    --vcf ${SAMPLEID}_mutect.vcf \
    --pool 5 \
    --genome hg19 --gcgene baits_hg19_gcgene.txt

# Recreate output after manual curation of ${SAMPLEID}_purecn.csv
Rscript $PURECN/PureCN.R --rds ${SAMPLEID}_purecn.rds
\end{verbatim}

\subsection*{Session Info}

<<echo=FALSE, results='asis'>>=
toLatex(sessionInfo())
@

\end{document}