\docType{methods}
\name{UniFrac}
\alias{UniFrac}
\alias{UniFrac,phyloseq-method}
\title{Calculate weighted or unweighted (Fast) UniFrac distance for all sample pairs.}
\usage{
UniFrac(physeq, weighted=FALSE, normalized=TRUE, parallel=FALSE, fast=TRUE)
}
\arguments{
\item{physeq}{(Required). \code{\link{phyloseq-class}}, containing at minimum
a phylogenetic tree (\code{\link{phylo-class}}) and
contingency table (\code{\link{otuTable-class}}). See
examples below for coercions that might be necessary.}

\item{weighted}{(Optional). Logical. Should the weighted-UniFrac calculation
be used? Weighted-UniFrac takes into account the relative abundance of
species/taxa shared between samples, whereas unweighted-UniFrac only
considers presence/absence. Default is \code{FALSE}, meaning the
unweighted-UniFrac distance is calculated for all pairs of samples.}

\item{normalized}{(Optional). Logical. Should the output be normalized such
that values range from 0 to 1 independent of branch length values? Default
is \code{TRUE}. Note that (unweighted) \code{UniFrac} is always normalized
by total branch-length, and so this value is ignored when
\code{weighted == FALSE}.}

\item{parallel}{(Optional). Logical. Should the calculation be executed in
parallel, using multiple CPU cores simultaneously? This can dramatically
reduce the computation time for this function. However, it also requires
that the user has registered a parallel ``backend'' prior to calling this
function. Default is \code{FALSE}. If \code{FALSE}, UniFrac will register a
serial backend so that \code{foreach::\%dopar\%} does not throw a warning.}

\item{fast}{(Optional). Logical. Should the ``Fast UniFrac'' algorithm be
used? It is implemented natively in the \code{phyloseq-package}. This is the
default and the recommended option. There should be no difference in the
output between the two algorithms.
Moreover, the original UniFrac algorithm only outperforms this
implementation of fast-UniFrac if the datasets are so small (approximated by
the value of \code{nspecies(physeq) * nsamples(physeq)}) that the difference
in time is inconsequential (less than 1 second). In practice it does not
appear that this parameter should ever be set to \code{FALSE}, but the
option is nevertheless included in the package for comparisons and
instructional purposes.}
}
\value{
A sample-by-sample distance matrix, suitable for NMDS, etc.
}
\description{
This function calculates the (Fast) UniFrac distance for all sample-pairs
in a \code{\link{phyloseq-class}} object.
}
\details{
\code{UniFrac()} accesses the abundance (\code{\link{otuTable-class}}) and
phylogenetic tree (\code{\link{phylo-class}}) data within an
experiment-level (\code{\link{phyloseq-class}}) object. If the tree and
contingency table are separate objects, the suggested solution is to combine
them into an experiment-level class using the \code{\link{phyloseq}}
function. For example, \code{phyloseq(myOTUtable, myTree)} returns a
\code{phyloseq}-class object that has been pruned and comprises the minimum
data components necessary for \code{UniFrac()}.

Parallelization is possible for UniFrac calculated with the
\code{\link{phyloseq-package}}, and is encouraged in the instances of large
trees, many samples, or both. Parallelization has been implemented via the
\code{\link{foreach-package}}. This means that parallel calls need to be
preceded by 2 or more commands that register the parallel ``backend''. This
is achieved via your choice of helper packages. One of the simplest seems to
be the \emph{doParallel} package.

For more information, see the following links on registering the
``backend'':

\emph{foreach} package manual:
\url{http://cran.r-project.org/web/packages/foreach/index.html}

Notes on parallel computing in \code{R}. Skip to the section describing the
\emph{foreach Framework}.
It gives off-the-shelf examples for registering a parallel backend using
the \emph{doMC}, \emph{doSNOW}, or \emph{doMPI} packages:
\url{http://trg.apbionet.org/euasiagrid/docs/parallelR.notes.pdf}

Furthermore, as of \code{R} version \code{2.14.0}, a parallel package is
included as part of the core installation, \code{\link{parallel-package}},
and this can be used as the parallel backend with the
\code{\link{foreach-package}} using the adaptor package ``doParallel''.
\url{http://cran.r-project.org/web/packages/doParallel/index.html}

See the vignette for some simple examples of using doParallel.
\url{http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf}

UniFrac-specific examples for doParallel are provided in the example code
below.
}
\examples{
# ################################################################################
# # Perform UniFrac on esophagus data
# ################################################################################
# data("esophagus")
# (y <- UniFrac(esophagus, TRUE))
# UniFrac(esophagus, TRUE, FALSE)
# UniFrac(esophagus, FALSE)
# picante::unifrac(as(t(otuTable(esophagus)), "matrix"), tre(esophagus))
# ################################################################################
# # Try phylocom example data from picante package
# # It comes as a list, so you must construct the phyloseq object first.
# ################################################################################
# data("phylocom")
# (x1 <- phyloseq(otuTable(phylocom$sample, FALSE), phylocom$phylo))
# UniFrac(x1, TRUE)
# UniFrac(x1, TRUE, FALSE)
# UniFrac(x1, FALSE)
# picante::unifrac(phylocom$sample, phylocom$phylo)
# ################################################################################
# # Now try a parallel implementation using doParallel, which leverages the
# # 'parallel' core package included in R 2.14.0+
# # Note that simply loading the 'doParallel' package is not enough; you must
# # call a function that registers the backend. In general, this is pretty easy
# # with the 'doParallel' package (or one of the alternative 'do*' packages)
# #
# # Also note that the esophagus example has only 3 samples, and a relatively small
# # tree. This is fast to calculate even sequentially and does not warrant
# # parallelized computation, but provides a good quick example for using UniFrac()
# # in a parallel fashion. The number of cores you should specify during the
# # backend registration, using registerDoParallel(), depends on your system and
# # needs. 3 is chosen here for convenience. If your system has only 2 cores, this
# # will probably fail or run slower than necessary.
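# #
# # A hedged sketch, not part of the original example: rather than hard-coding
# # a core count, it could be capped by what the machine reports. The helper
# # name `pick_cores` is hypothetical (it is not a phyloseq or doParallel
# # function). parallel::detectCores() may return NA on some platforms, so the
# # sketch assumes a fallback of a single core in that case.
pick_cores <- function(requested = 3L) {
  available <- parallel::detectCores()
  # Fall back to one core when the count cannot be determined
  if (is.na(available)) available <- 1L
  # Never exceed the reported core count, and never go below 1
  max(1L, min(as.integer(requested), available))
}
ncores <- pick_cores(3L)
# # The result could then be passed on, e.g. registerDoParallel(cores=ncores)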
# ################################################################################
# library(doParallel)
# data(esophagus)
# # For SNOW-like functionality (works on Windows):
# cl <- makeCluster(3)
# registerDoParallel(cl)
# UniFrac(esophagus, TRUE)
# # Force the sequential backend:
# registerDoSEQ()
# # For multicore-like functionality (will probably not work on Windows),
# # register the backend like this:
# registerDoParallel(cores=3)
# UniFrac(esophagus, TRUE)
################################################################################
}
\references{
\url{http://bmf.colorado.edu/unifrac/}

The main implementation (Fast UniFrac) is adapted from the algorithm's
description in:

Hamady, Lozupone, and Knight, ``Fast UniFrac: facilitating high-throughput
phylogenetic analyses of microbial communities including analysis of
pyrosequencing and PhyloChip data.'' The ISME Journal (2010) 4, 17--27.

\url{http://www.nature.com/ismej/journal/v4/n1/full/ismej200997a.html}

See also additional descriptions of UniFrac in the following articles:

Lozupone, Hamady and Knight, ``UniFrac - An Online Tool for Comparing
Microbial Community Diversity in a Phylogenetic Context.'' BMC
Bioinformatics 2006, 7:371.

Lozupone, Hamady, Kelley and Knight, ``Quantitative and qualitative (beta)
diversity measures lead to different insights into factors that structure
microbial communities.'' Appl Environ Microbiol. 2007.

Lozupone C, Knight R. ``UniFrac: a new phylogenetic method for comparing
microbial communities.'' Appl Environ Microbiol. 2005 71 (12):8228-35.
}
\seealso{
\code{\link{distance}}, \code{\link[picante]{unifrac}}
}