--- title: "Converting Common Data Formats to Phyloseq and TreeSummarizedExperiment" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Data Import} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include = FALSE} knitr::opts_chunk$set( message = FALSE, digits = 3, collapse = TRUE, comment = "#>" ) options(digits = 3) ``` The "dar" package is a versatile and user-friendly tool designed to accept inputs in a variety of formats. It primarily utilizes the `phyloseq` format but also supports the `TreeSummarizedExperiment` format. This flexibility allows users to conduct differential abundance analysis smoothly, irrespective of their initial data format. To facilitate this, a detailed guide is available to aid users in converting other prevalent data formats, such as `biome`, `mothur`, `metaphlan`, and more, into the necessary `phyloseq` or `TreeSummarizedExperiment` formats. ```{r} suppressPackageStartupMessages(library(mia)) suppressPackageStartupMessages(library(phyloseq)) ``` ## Importing Data from `biome` Format The `biome` format is a commonly used format in bioinformatics to represent microbiome sequencing data. Here's how you can import data in `biome` format to both `phyloseq` and `TreeSummarizedExperiment.` ### To Phyloseq To convert data from the `biome` format to the `phyloseq` format, you can use the `phyloseq::import_biom()` function. Here's a step-by-step example of how to perform this conversion: ```{r} # Example of a rich dense biom file rich_dense_biom <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq") # Import biom as a phyloseq-class object phy <- phyloseq::import_biom( rich_dense_biom, parseFunction = parse_taxonomy_greengenes ) phy # Print sample_data phyloseq::sample_data(phy) # Print tax_table phyloseq::tax_table(phy) # Recipe init rec <- dar::recipe(phy, var_info = "BODY_SITE", tax_info = "Genus") rec ``` ### To TreeSummarizedExperiment To convert data from the `biome` format to the `TreeSummarizedExperiment` format, you can use the `mia::loadFromBiom()` function. Here's a step-by-step example of how to perform this conversion: ```{r} # Example of a rich dense biom file rich_dense_biom <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq") # Import biom as a phyloseq-class object tse <- mia::loadFromBiom(rich_dense_biom) tse # Print sample_data colData(tse) # Print tax_table rowData(tse) # Change the column names of the tax_table colnames(rowData(tse)) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species") rowData(tse) # Recipe init rec <- dar::recipe(tse, var_info = "BODY_SITE", tax_info = "Genus") rec ``` ## Importing Data from `qiime` Format The `qiime` format is another commonly used format in bioinformatics for microbiome sequencing data. Here's how you can import data in `qiime` format to both `Phyloseq` and `TreeSummarizedExperiment.` ### To Phyloseq To convert data from the `qiime` format to the `Phyloseq` format, you can use the `phyloseq::import_qiime()` function. Here's a step-by-step example of how to perform this conversion: ```{r} # Import QIIME data phy_qiime <- phyloseq::import_qiime( otufilename = system.file("extdata", "GP_otu_table_rand_short.txt.gz", package = "phyloseq"), mapfilename = system.file("extdata", "master_map.txt", package = "phyloseq"), treefilename = system.file("extdata", "GP_tree_rand_short.newick.gz", package = "phyloseq") ) phy_qiime # Recipe init rec <- dar::recipe(phy_qiime, var_info = "SampleType", tax_info = "Genus") rec ``` ### To TreeSummarizedExperiment To convert data from the `qiime` format to the `TreeSummarizedExperiment` format, you can use the `mia::loadFromQIIME2()` function. Here's a step-by-step example of how to perform this conversion: ```{r} # Import QIIME data to tse tse_qiime <- mia::loadFromQIIME2( featureTableFile = system.file("extdata", "table.qza", package = "mia"), taxonomyTableFile = system.file("extdata", "taxonomy.qza", package = "mia"), sampleMetaFile = system.file("extdata", "sample-metadata.tsv", package = "mia"), refSeqFile = system.file("extdata", "refseq.qza", package = "mia"), phyTreeFile = system.file("extdata", "tree.qza", package = "mia") ) tse_qiime # Recipe init rec <- dar::recipe(tse_qiime, var_info = "body.site", tax_info = "Genus") rec ``` ## Importing Data from `mothur` Format The `mothur` format is another commonly used format in bioinformatics for microbiome sequencing data. Here's how you can import data in `mothur` format to both `Phyloseq` and `TreeSummarizedExperiment.` ### To Phyloseq To convert data from the `mothur` format to the `Phyloseq` format, you can use the `phyloseq::import_mothur()` function. Here's a step-by-step example of how to perform this conversion: ```{r} # Import Mothur data phy_mothur <- phyloseq::import_mothur( mothur_list_file = system.file("extdata", "esophagus.fn.list.gz", package = "phyloseq"), mothur_group_file = system.file("extdata", "esophagus.good.groups.gz", package = "phyloseq"), mothur_tree_file = system.file("extdata", "esophagus.tree.gz", package = "phyloseq") ) phy_mothur # Recipe init rec <- dar::recipe(phy_mothur) rec ``` ### To TreeSummarizedExperiment To convert data from the `mothur` format to the `TreeSummarizedExperiment` format, you can use the `mia::loadFromMothur()` function. Here's a step-by-step example of how to perform this conversion: ```{r} # Import Mothur data to TreeSummarizedExperiment tse_mothur <- mia::loadFromMothur( sharedFile = system.file("extdata", "mothur_example.shared", package = "mia"), taxonomyFile = system.file("extdata", "mothur_example.cons.taxonomy", package = "mia"), designFile = system.file("extdata", "mothur_example.design", package = "mia") ) |> methods::as("TreeSummarizedExperiment") tse_mothur # Recipe init rec <- dar::recipe(tse_mothur, var_info = "drug", tax_info = "Genus") rec ``` ## Importing Data from `metaphlan` Format The `metaphlan` format is another commonly used format in bioinformatics for microbiome sequencing data. Here's how you can import data in `metaphlan` format to `TreeSummarizedExperiment.` ### To TreeSummarizedExperiment To convert data from the `metaphlan` format to the `TreeSummarizedExperiment` format, you can use the `mia::loadFromMetaphlan()` function. Here's a step-by-step example of how to perform this conversion: ```{r} # Importing data from Metaphlan tse_metaphlan <- mia::loadFromMetaphlan( file = system.file("extdata", "merged_abundance_table.txt", package = "mia") ) # Recipe init rec <- dar::recipe(tse_metaphlan) rec ``` ## Conclusion In this guide, we have explored various methods for importing microbiome sequencing data from different formats into `Phyloseq` and `TreeSummarizedExperiment.` We've covered the `biome`, `qiime`, `mothur`, `metaphlan`, and `humann` formats, providing step-by-step examples for each. The flexibility of these tools allows for a smooth transition between different data formats, making it easier to conduct your analysis irrespective of the initial data format. By following the steps outlined in this guide, you should be able to successfully convert your data and carry out further differential abundance analysis. Remember, the specific details of your data may require you to adjust the parameters in the import functions. Always inspect your data after conversion to ensure it has been imported correctly. ## Session info ```{r} devtools::session_info() ```