--- title: "PathoStat Advanced Usage" author: "Sandro L. Valenzuela, Eduardo Castro-Nallar, Solaiappan Manimaran" date: "`r Sys.Date()`" package: "`r pkg_ver('PathoStat')`" vignette: > %\VignetteIndexEntry{PathoStatAdvanced} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document --- # Introduction Welcome to the PathoStat Advanced Usage vignette. PathoStat is a software package implemented in Shiny that lets you explore metagenomic data and perform statistical microbiome analysis with the confort of a graphical user interface (GUI). While most of the PathoStat functions are interactive, i.e., through the GUI, you still can use some of the functions for very practical things, notwithstanding, reading data from PathoScope2 results, BIOM-formatted files, and phyloseq objects. # Create Pathostat Object (pstat) from PathoScope2 output First, we choose the path where Pathoscope output files (.tsv) and sample information, i.e., metadata, are. In this case, we will be using the example data within the PathoStat package. ```{r example_data, eval=TRUE} library(PathoStat) example_data_dir <- system.file("example/data", package = "PathoStat") ``` Next, simply execute the `createPathoStat()` function in order to combine all .tsv files within `example_data_dir`. Remember we need a metadata file that is also tab-delimited (in this case `sample_data.tsv`) ```{r create_pathostat, eval=TRUE} pstat <- createPathoStat(input_dir=example_data_dir, sample_data_file="sample_data.tsv") ``` # Create Pathostat Object (pstat) from BIOM file PathoStat objects (pstat) are an extension of the phyloseq-class and can be created from biom files using the `import_biom` function from the `phyloseq` package. For example: ```{r create_pathostat_from_biom, eval=TRUE} library(phyloseq) rich_dense_biom = system.file("extdata", "rich_dense_otu_table.biom", package="phyloseq") phyob <- import_biom(rich_dense_biom) #and finally, we convert the phyloseq object into a pstat object pstat_biom <- pathostat(phyob) ``` # Saving and loading data, and running PathoStat pstat objects can be saved or loaded from disk, e.g., to be shared with collaborators. For this, we have created the `savePstat()` and `loadPstat()` functions. ```{r save_load_pathostat, eval=FALSE} # Saving data savePstat(pstat, outdir=".", outfileName="pstat_data.rda") # Loading data pstat <- loadPstat(indir=".", infileName="pstat_data.rda") # Calling the runPathoStat() function to execute Pathostat interactively runPathoStat(pstat) ``` # Example functions for Shiny developers: coreOTUModule and coreOTUModuleUI If you are creating a Shiny app, you could easily implement the coreOTU tab using the example code below ```{r coreOTU, eval=FALSE} #create a UI calling coreOTUModuleUI() function shinyUI(mainPanel( coreOTUModuleUI("coreOTUModule") )) #and a server shinyServer(function(input, output) { callModule( coreOTUModule, "coreOTUModule", pstat ) }) ``` # Estimating taxon abundances: findRAfromCount Once your data are in, you can easily estimate taxon abundances by using the `findRAfromCount()` function ```{r taxon_abundance, eval=TRUE} #first, get the otu_table from pstat calling a phyloseq function library(phyloseq) otut<-otu_table(pstat) ffc<-findRAfromCount(otut) #lets see, for example, the abundances for sample 01 on the ffc object head(ffc[,1], n = 15) ``` # Assigning lineages to taxonomy IDs: findTaxonomy and findTaxonMat We can fetch the lineages (and their Ids) by invoking the functions `findTaxonomy()` and `findTaxonMat()` ```{r taxon_matrix, eval=TRUE} dat <- ffc ids <- rownames(dat) tids <- unlist(lapply(ids, FUN = grepTid)) taxonLevels <- findTaxonomy(tids[1:4]) taxmat <- findTaxonMat(ids[1:4], taxonLevels) taxmat ``` # Obtaining a 95% confidence interval: plotConfRegion Many times researchers are interested in the accuracy of taxon abundance estimates. In this subtab, we provide a way to compare within-sample taxa in terms of their abundance estimate and 95% confidence interval. You can estimate the 95% confidence interval for a two-taxon within sample comparison by using the `plotConfRegion` ```{r plot_confidence_region, eval=TRUE} #select taxon 1 and 2, from your samples (randomly in this case) n<-nrow(as.matrix(rownames(otut))) m<-nrow(as.matrix(colnames(otut))) p1 <- otut[rownames(otut)[ sample(1:n, 1)], colnames(otut)[sample(1:m, 1)]] if (p1 <= 0) p1 <- 1 #random taxon for p2 in this case again n<-nrow(as.matrix(rownames(otut))) m<-nrow(as.matrix(colnames(otut))) p2 <- otut[rownames(otut)[ sample(1:n, 1)], colnames(otut)[sample(1:m, 1)]] if (p2 <= 0) p2 <- 1 size <- sum(otut[,colnames(otut)]) plotConfRegion(p1, p2, size, uselogit=FALSE) ``` # Obtaining counts per million: log2CPM log2cpm can compute log2(counts per million reads) and library size for each sample ```{r log2cpm, eval=TRUE} # Only to sample one: lcpm <- log2CPM(otut[,1]) lcpm ``` # Heatmap using pstat data Lets create a heatmap from our `pstat` object. We siply use the `phyloseq` function `plot_heatmap`. ```{r plot_heatmap, eval=TRUE,warning=FALSE} #select a tax level for the heatmap plot taxonLevel<-"class" physeq <- tax_glom(pstat,taxonLevel) plot_heatmap(physeq, taxa.label=taxonLevel) ```