%\VignetteEngine{knitr::knitr}

\documentclass[a4paper,9pt]{article}

<<style, echo=FALSE, results="asis">>=
BiocStyle::latex()
@

%\VignetteIndexEntry{VERSO}

\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{placeins}
\usepackage{url}
\usepackage{tcolorbox}
\usepackage{authblk}

\begin{document}

\title{Analysis of SARS-CoV-2 viral phylogenies with VERSO}

\author[1]{Daniele Ramazzotti}
\author[2]{Fabrizio Angaroni}
\author[2,3]{Davide Maspero}
\author[1]{Carlo Gambacorti-Passerini}
\author[2]{Marco Antoniotti}
\author[4]{Alex Graudenzi}
\author[1]{Rocco Piazza}

\affil[1]{Dept. of Medicine and Surgery, Univ. of Milan-Bicocca, Monza, Italy.}
\affil[2]{Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy.}
\affil[3]{Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy.}
\affil[4]{Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy.}

\date{\today}

\maketitle

\begin{tcolorbox}{\bf Overview.}
VERSO (Viral Evolution ReconStructiOn) is an algorithmic framework that processes variant profiles from viral samples. It first produces phylogenetic models of viral evolution from clonal variants and then quantifies the intra-host genomic diversity of samples. VERSO consists of two separate, consecutive steps; in this package we provide an R implementation of VERSO STEP 1.

\vspace{1.0cm}

{\em In this vignette, we give an overview of the package by presenting its main functions.}

\vspace{1.0cm}

\renewcommand{\arraystretch}{1.5}
\end{tcolorbox}

<<include=FALSE>>=
library(knitr)
opts_chunk$set(
concordance = TRUE,
background = "#f3f3ff"
)
@

\newpage

\tableofcontents

\section{Using the VERSO R package}

We now present an example of phylogenetic analysis by VERSO using mutation data from a set of SARS-CoV-2 samples. The dataset includes variants for a selected set of 15 SARS-CoV-2 samples, obtained by variant calling from raw data available from NCBI BioProject PRJNA610428. We first load the data.
Notice that the input to VERSO is a matrix reporting each variant as observed (1), not observed (0) or missing (NA, e.g., due to low coverage).

<<>>=
library("VERSO")
data(variants)
head(variants)
@

We now set up the main parameters of the inference. The first parameters to be specified are the false positive and false negative error rates, i.e., \Rcode{alpha} and \Rcode{beta}. When multiple sets of rates are provided, VERSO performs a grid search to estimate the best set of error rates.

<<>>=
alpha = c(0.01,0.05)
beta = c(0.01,0.05)
head(alpha)
head(beta)
@

We can now perform the inference as follows. Make sure to set the random seed to ensure reproducibility.

<<>>=
set.seed(12345)
inference = VERSO(D = variants,
                  alpha = alpha,
                  beta = beta,
                  check_indistinguishable = TRUE,
                  num_rs = 5,
                  num_iter = 100,
                  n_try_bs = 50,
                  num_processes = 1,
                  verbose = TRUE)
@

Note that the inference resulting from the command above should be considered only as an example. The parameters \Rcode{num\_rs}, \Rcode{num\_iter} and \Rcode{n\_try\_bs}, which set the number of steps performed during the inference, are downscaled to reduce execution time. We refer to the manual for a discussion of the default values. The package also provides, as RData, the results of an inference performed with the same parameters.

<<>>=
data(inference)
print(names(inference))
@

VERSO returns a list of 8 elements: \Rcode{B}, \Rcode{C}, \Rcode{phylogenetic\_tree}, \Rcode{corrected\_genotypes}, \Rcode{genotypes\_prevalence}, \Rcode{genotypes\_summary}, \Rcode{log\_likelihood} and \Rcode{error\_rates}.
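Two of these elements, the log-likelihood and the selected error rates, depend on how the false positive and false negative rates enter the likelihood of the observed data. The chunk below is a minimal, self-contained sketch in base R of a generic false positive/false negative error model of the kind commonly used for noisy binary variant data. It is not necessarily the exact likelihood implemented in VERSO.

The toy matrices \Rcode{D} and \Rcode{G} and the helper \Rcode{error\_loglik} are hypothetical. \Rcode{G} stands in for the corrected genotypes implied by a candidate tree, and missing entries of \Rcode{D} are skipped. Scoring every combination of the supplied values with \Rcode{expand.grid} is an assumption of this sketch, not a statement of how VERSO enumerates them.

<<error-model-sketch>>=
# Hypothetical toy input (not VERSO data): 3 samples (rows) x 3 variants
# (columns), encoded as 1 = observed, 0 = not observed, NA = missing.
D = matrix(c(1, 0, NA,
             1, 1, 0,
             1, 1, 1),
           nrow = 3, byrow = TRUE,
           dimnames = list(paste0("sample", 1:3), paste0("var", 1:3)))

# Hypothetical "true" genotypes, standing in for the corrected genotypes
# implied by a candidate tree; D differs from G only at (sample3, var1).
G = matrix(c(1, 0, 0,
             1, 1, 0,
             0, 1, 1),
           nrow = 3, byrow = TRUE, dimnames = dimnames(D))

# Generic error model (an illustrative assumption, not VERSO's internal code):
# P(D = 1 | G = 0) = alpha, P(D = 0 | G = 1) = beta; NA entries are skipped.
error_loglik = function(D, G, alpha, beta) {
  observed = !is.na(D)
  d = D[observed]
  g = G[observed]
  p = ifelse(g == 1,
             ifelse(d == 1, 1 - beta, beta),
             ifelse(d == 1, alpha, 1 - alpha))
  sum(log(p))
}

# Score every (alpha, beta) combination and keep the most likely one.
rate_grid = expand.grid(alpha = c(0.01, 0.05), beta = c(0.01, 0.05))
rate_grid$log_likelihood = mapply(function(a, b) error_loglik(D, G, a, b),
                                  rate_grid$alpha, rate_grid$beta)
best_rates = rate_grid[which.max(rate_grid$log_likelihood), ]
best_rates
@

In this toy example \Rcode{D} contains a single false positive and no false negatives. The highest log-likelihood is therefore reached at \Rcode{alpha = 0.05} and \Rcode{beta = 0.01}: the larger false positive rate absorbs the discrepancy, while the smaller false negative rate is favoured.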
Here, \Rcode{B} is the maximum likelihood variants tree (the inner nodes of the phylogenetic tree) and \Rcode{C} is the attachment of samples to genotypes. \Rcode{phylogenetic\_tree} is the VERSO phylogenetic tree, including both the variants tree and the attachment of samples to variants. \Rcode{corrected\_genotypes} is the matrix of corrected genotypes, obtained by correcting \Rcode{D} given the VERSO phylogenetic tree. \Rcode{genotypes\_prevalence} reports the number of samples and the observed prevalence of each genotype, and \Rcode{genotypes\_summary} summarizes the association of mutations to genotypes. Finally, \Rcode{log\_likelihood} and \Rcode{error\_rates} return the likelihood of the inferred phylogenetic model and the best values of \Rcode{alpha} and \Rcode{beta} as estimated by VERSO.

We can plot the inferred phylogenetic tree using the \Rcode{plot} function from the \Rpackage{ape} package.

<<>>=
plot(inference$phylogenetic_tree)
@

\section{\Rcode{sessionInfo()}}

<<sessioninfo, results='asis', echo=FALSE>>=
toLatex(sessionInfo())
@

\end{document}