---
title: "Example using structToolbox"
author:
- name: Gavin R Lloyd
  affiliation: Phenome Centre Birmingham, University of Birmingham, UK
  email: g.r.lloyd@bham.ac.uk
- name: Ralf J Weber
  affiliation: Phenome Centre Birmingham, University of Birmingham, UK
  email: r.j.weber@bham.ac.uk
output:
  BiocStyle::html_document:
    toc: yes
    toc_depth: 2
    number_sections: yes
    toc_float: yes
package: metabolomicsWorkbenchR
vignette: >
  %\VignetteIndexEntry{Example using structToolbox} 
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r echo = FALSE,include=FALSE}
suppressPackageStartupMessages(library(structToolbox))
suppressPackageStartupMessages(library(httptest))
suppressPackageStartupMessages(library(metabolomicsWorkbenchR))
httptest::start_vignette('structToolbox_example')
```


# Introduction
Metabolomics Workbench [(link)](https://www.metabolomicsworkbench.org) hosts a metabolomics 
data repository. It contains over 1000 publicly available studies including raw data, 
processed data and metabolite/compound information.

The repository is searchable using a REST service API. The metabolomicsWorkbenchR
package makes the endpoints of this service available in R and provides functionality
to search the database and import datasets and metabolite information into commonly used 
formats such as data frames and SummarizedExperiment objects.

In this vignette we will use `metabolomicsWorkbenchR` to retrieve the uploaded peak matrix
for a study. We will then use `structToolbox` to apply a basic workflow to analyse the data.

# Installation
To install this package, enter:

```{r,eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("metabolomicsWorkbenchR")
```
For older versions, please refer to the appropriate Bioconductor release.

# Querying the database
The API endpoints for Metabolomics Workbench are accessible using the `do_query`
function in `metabolomicsWorkbenchR`.

The `do_query` function takes four inputs:

- `context`       A valid context name (character)
- `input_item`    A valid input_item name (character)
- `input_value`   A valid input_value name (character)
- `output_item`   A valid output_item (character)
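
As a rough illustration of how the four inputs fit together, the REST service 
addresses each query as a URL whose path segments are the context, input item, 
input value and output item, in that order. The sketch below builds such a URL in 
base R. `build_mw_url` is a hypothetical helper written for this example and is not 
part of `metabolomicsWorkbenchR`; that `do_query` issues requests of exactly this 
form internally is an assumption.

```{r}
# hypothetical helper for illustration only; not part of metabolomicsWorkbenchR
build_mw_url = function(context, input_item, input_value, output_item) {
    # join the base address and the four inputs with '/' separators
    paste(
        'https://www.metabolomicsworkbench.org/rest',
        context, input_item, input_value, output_item,
        sep = '/'
    )
}

# URL corresponding to a study summary request
build_mw_url('study', 'study_id', 'ST000010', 'summary')
```

In practice `do_query` should be preferred, as it also parses the response into 
an R object.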

Contexts refer to the different database searches available in the API. The reader 
is referred to the API manual for details of each context 
[(link)](https://www.metabolomicsworkbench.org/tools/mw_rest.php). 
In `metabolomicsWorkbenchR` contexts are stored as a list, and the names of the valid 
contexts can be obtained using the `names` function:

```{r}
names(metabolomicsWorkbenchR::context)
```

Valid values for `input_item` and `output_item` are specific to a context. They can
be listed using the `context_inputs` and `context_outputs` functions:

```{r}
cat('Valid inputs:\n')
context_inputs('study')
cat('\nValid outputs:\n')
context_outputs('study')
```

# Choosing a study
First we query the database to return a list of untargeted studies. We use the 
"study" context in combination with a special case input item called "ignored" 
that is required for the "untarg_studies" output item.

```{r}
US = do_query(
  context = 'study',
  input_item = 'ignored',
  input_value = 'ignored',
  output_item = 'untarg_studies'
)

head(US[,1:3])
```

We will pull data for study "ST000010". We can obtain summary information using
the "summary" output item.

```{r}
S = do_query('study','study_id','ST000010','summary')
t(S)
```

As a study can contain multiple datasets, untargeted data must be requested 
by Analysis ID. We will request DatasetExperiment format so that we can use the 
data directly with `structToolbox`.

```{r,eval=FALSE}
DE = do_query(
  context = 'study',
  input_item = 'analysis_id',
  input_value = 'AN000025',
  output_item = 'untarg_DatasetExperiment'
)
DE
```

```{r,eval=TRUE,include=FALSE}
DE=metabolomicsWorkbenchR:::AN000025
DE=as.DatasetExperiment(DE)
DE
```

# Workflow
Now we construct a minimal metabolomics workflow consisting of missing value filtering,
normalisation, imputation, log transformation and mean centring before applying PCA.

```{r,warning=FALSE}
# model sequence: filter, normalise, impute, transform and centre, then PCA
M = 
    mv_feature_filter(
      threshold = 40,
      method = 'across',
      factor_name = 'FCS') +
    mv_sample_filter(mv_threshold = 40) +
    vec_norm() +
    knn_impute() +
    log_transform() + 
    mean_centre() + 
    PCA()
# apply the model sequence to the data
M = model_apply(M,DE)

# PCA scores plot using the 'FCS' factor, for the last model in the sequence
C = pca_scores_plot(factor_name=c('FCS'))
chart_plot(C,M[length(M)])
```
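
To give a feel for what each step in the sequence does, the following is a minimal 
base R sketch of a comparable pipeline applied to a small simulated matrix. It is 
not the `structToolbox` implementation: the toy data are randomly generated, both 
thresholds are assumed to mean the maximum percentage of missing values allowed, 
a column median stands in for k-nearest-neighbour imputation, and the base 10 
logarithm is an assumption.

```{r}
# simulated peak matrix: 6 samples (rows) x 5 features (columns)
set.seed(1)
X = matrix(
    rexp(30, rate = 0.01),
    nrow = 6,
    dimnames = list(paste0('S', 1:6), paste0('F', 1:5))
)
X[1, 2] = NA          # feature F2: 1 of 6 values missing
X[c(2, 3, 4), 5] = NA # feature F5: 3 of 6 values missing (50%)

# 1. feature filter: keep features with at most 40% missing values
X = X[, colMeans(is.na(X)) * 100 <= 40, drop = FALSE]

# 2. sample filter: keep samples with at most 40% missing values
X = X[rowMeans(is.na(X)) * 100 <= 40, , drop = FALSE]

# 3. vector normalisation: scale each sample (row) to unit length
X = X / sqrt(rowSums(X^2, na.rm = TRUE))

# 4. imputation: column median, a simple stand-in for kNN imputation
for (j in seq_len(ncol(X))) {
    X[is.na(X[, j]), j] = median(X[, j], na.rm = TRUE)
}

# 5. log transform (base 10 assumed)
X = log10(X)

# 6. mean centre each feature and apply PCA
pca_toy = prcomp(X, center = TRUE, scale. = FALSE)

# samples x principal components
dim(pca_toy$x)
```

The scores in `pca_toy$x` play the same role as the scores plotted by 
`pca_scores_plot`, although the values will differ from those of the 
`structToolbox` workflow.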

# Session Info
```{r,echo=FALSE}
sessionInfo()
```

```{r, include=FALSE}
end_vignette()
```