---
title: "TSAR Package Structure"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{TSAR Package Structure}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, comment = "#>")
```

# 1. Introduction
TSAR Package provides simple solution to qPCR data processing, computing 
thermal shift analysis given either raw fluorescent data or smoothed curves. 
The functions provide users with the protocol to conduct preliminary data checks
and also expansive analysis on large scale of data. Furthermore, it showcases 
simple graphic presentation of analysis, generating clear box plot and line 
graphs given input of desired designs. Overall, TSAR Package offers a workflow
easy to manage and visualize.

# 2. Installation
Use commands below to install TSAR package:
library(BiocManager)
BiocManager::install("TSAR")

```{r setup, message=FALSE, warning=FALSE}
library(TSAR)
library(dplyr)
library(ggplot2)
library(shiny)
library(utils)
```

```{r, echo=FALSE, fig.width=4, out.width="400px"}
knitr::include_graphics("images/TSAR_logo.png")
```

# 3. Data Structure

TSAR segregates data structure into three tiers:

-   `raw_data`, raw-readings of qPCR data
-   `norm_data`, pre-processed and normalized data
-   `tsar_data`, analyzed with ready for graphs

Users may initiate the TSAR workflow from either `raw_data` or `norm_data` as
long as the data achieves approriate qualities. Functions corresponding each 
tier are wrapped in shiny application. All analysis and visualization can be 
achieved within user-interactive window, also open-able in local browser.

- `weed_raw()`; input: raw; output: data without blank and corrupted curves
- `analyze_norm()`; input: raw or output of weed_raw; output: analyzed data 
with conditions specified
- `graph_tsar()`; input: tsar_data or output of analyze_norm; output: graphs

# 4. From `raw_data` to `norm_data`

To access the workflow of `TSAR`, user can input raw fluorescent readings 
through built in functions from utils: `read.delim()` or `read.csv()`. Built 
in data imports from RStudio UI is also appropriate, given `raw_data` only needs
to be saved as `dataframe` structure. Wrapping input in `data.frame()` will 
also ensure correct data type. Use `head()` and `tail()` to ensure excessive 
information is removed, i.e. blank lines, duplicate titles, etc.

```{r, results = 'hide'}
data("qPCR_data1")
raw_data <- qPCR_data1
head(raw_data)
tail(raw_data)
```

## 4.1 Data Preprocessing

From raw data to ready-for-analysis data, there are few functions to assist the 
selection and normalization of data.

Functions `screen()` and `remove_raw()` helps screen and remove data of 
selection. This save time and space by remove unwanted data such as blank wells 
and corrupted curves. Aside from `data` param input, both functions share 
similar parameters of `checkrange`, `checklist` or `removerange`, `removelist`.

-   `checkrange`, `removerange` : a range of wells, e.g. wells from A01 to 
B08 is `c("A","B","1","8")`

-   `checklist`, `removelist`: a list of wells, e.g. `c("A01","C03")`

If no wells are specified, `screen()` will default to screening every well 
and `reomve_raw()` will default to not removing an well.

```{r, out.width = "400px"}
screen(raw_data) + theme(
    aspect.ratio = 0.7,
    legend.position = "bottom",
    legend.text = element_text(size = 4),
    legend.key.size = unit(0.1, "cm"),
    legend.title = element_text(size = 6)
) +
    guides(color = guide_legend(nrow = 4, byrow = TRUE))
raw_data <- remove_raw(raw_data, removerange = c("C", "H", "1", "12"))
```

Both functions above are wrapped inside an interactive window through function
`weed_raw()`. It is implemented through R Shiny application where users can 
select curves using cursor and remove selected curves easily. Refer to separate 
vignette, `"TSAR Workflow by Shiny"`, and README.md file for more documentation.

```{r eval = FALSE}
runApp(weed_raw(raw_data))
```

Running the window, we can spot that the curve at A12 has an abnormally high 
initial fluorescence and should be removed for data accuracy. We can make sure 
of correct selection through the `View Selected` button and remove it using 
the `Remove Data` button. Note that all data edits are made inside the 
interactive window. To translate the change globally for downstream analysis, 
simply click `Save to R`and close window to store data into the global 
environment. Alternatively, click `Copy Selected` and paste information to 
`remove_raw()`. To avoid error, close window proper through `Close Window` 
instead of the cross mark on top left corner.

```{r}
raw_data <- remove_raw(raw_data,
    removelist =
        c(
            "B04", "B11", "B09", "B05", "B10", "B03",
            "B02", "B01", "B08", "B12", "B07", "B06"
        )
)
```

## 4.2 Data Analysis

Normalizing data is prompted through `normalize()`. Although individual calls 
are not necessary as they are wrapped together in `gam_analysis()`, if viewing 
the validity of model is desired, one can prompt analysis of one well.

TSAR package performs derivative analysis using a generalized additive model 
through package `mgcv` or boltzmann analysis using nlsLM from package 
`minpack.lm`.

### 4.2.1 Individual Well Application

For analysis of an idividual well, refer to these following functions:

-   `normalize()`

-   `model_gam()`

-   `model_fit()`

-   `view_model()`

-   `Tm_est()`

```{r, out.width = "400px"}
test <- filter(raw_data, raw_data$Well.Position == "A01")
test <- normalize(test)
gammodel <- model_gam(test, x = test$Temperature, y = test$Normalized)
test <- model_fit(test, model = gammodel)
view <- view_model(test)
view[[1]] + theme(aspect.ratio = 0.7, legend.position = "bottom")
view[[2]] + theme(aspect.ratio = 0.7, legend.position = "bottom")
Tm_est(test)
```
View model generates a list of two graphs, showing fit of modeling on 
fluorescence data and the derivative calculation of such data.


### 4.2.2 96-Well Plate Application

All analysis necessary are formatted in function `gam_analysis()`. Parameters 
are inherited from functions noted in section "Individual Well Application". 
Hence, if any errors are prompted, check through individual well application 
for correct parameter input and other potential errors.

-   `smoothed` inherited from model_fit()
-   `fluo` and `selections` inherited from normalize()

```{r}
x <- gam_analysis(raw_data,
    smoothed = TRUE,
    fluo_col = 5,
    selections = c(
        "Well.Position", "Temperature",
        "Fluorescence", "Normalized"
    )
)
```

## 4.3 Data Summary

Data summary offers an exit point from the workflow if no further graphic 
outputs are required. Output is allowed in three formats: 
-   `output_content = 0`, only Tm values by well 
-   `output_content = 1`, all data analysis by each temperature reading. 
If previously called `smoothed = T`, analysis will not run gam modeling, thus 
will not have `fitted` data. 
-   `output_content = 2`, combination of the above two data set.

To associate ligand and protein conditions with each individual well, call 
the function `join_well_info()`. One may specify using the template excel or 
separate csv file containing a table of three variables, "Well", "Protein", 
and "Ligand".

```{r message=FALSE}
data("well_information")
output <- join_well_info(
    file_path = NULL, file = well_information,
    read_tsar(x, output_content = 0), type = "by_template"
)
```

Write output using command write_tsar
`write_tsar(output, name = "vitamin_analysis", file = "csv")`

To streamline to the following graphic analysis, make sure `output_content = 2` 
to maintain all necessary data.

```{r message=FALSE}
norm_data <- join_well_info(
    file_path = NULL, file = well_information,
    read_tsar(x, output_content = 2), type = "by_template"
)
```

All of the function above in section 2 are wrapped together in a shiny 
application named `analyze_norm()`. Refer to separate vignette, 
`"TSAR Workflow by Shiny"`, and README.md file for more documentation.

```{r eval = FALSE}
runApp(analyze_norm(raw_data))
```

# 5. From `norm_data` to `tsar_data`

`norm_data` contains normalized fluorescent data on a scale of 0 to 1 based 
on the maximum and minimum fluorescence reading. `norm_data` also contains a 
first derivative column. `tsar_data` is the final format of project data 
encapsulating all replication. Therefore, it contains all condition data 
including experiment date and analysis file source.

## 5.1 Merge Replicates

Use `merge_norm()` to merge all norm_data and specify original data file name 
and experiment date for latter tracking purposes.

```{r message=FALSE}
# analyze replicate data
data("qPCR_data2")
raw_data_rep <- qPCR_data2
raw_data_rep <- remove_raw(raw_data_rep,
    removerange = c("B", "H", "1", "12"),
    removelist = c("A12")
)
analysis_rep <- gam_analysis(raw_data_rep, smoothed = TRUE)
norm_data_rep <- join_well_info(
    file_path = NULL, file = well_information,
    read_tsar(analysis_rep, output_content = 2),
    type = "by_template"
)

# merge data
tsar_data <- merge_norm(
    data = list(norm_data, norm_data_rep),
    name = c(
        "Vitamin_RawData_Thermal Shift_02_162.eds.csv",
        "Vitamin_RawData_Thermal Shift_02_168.eds.csv"
    ),
    date = c("20230203", "20230209")
)
```

### 5.1.1 Jumpstart to Graph

If outputted data from qPCR already contains analysis and data necessary, 
enter TSAR workflow from here, using functions `merge_TSA()`, `read_raw_data()`,
`read_analysis()`.

```{r}
#analysis_file <- read_analysis(analysis_file_path)
#raw_data <- read_raw_data(raw_data_path)
#merge_TSA(analysis_file, raw_data)
```

After merging, use assisting functions to check and trace data. Use these two 
functions to guide graphics analysis for error identification, selective 
graphing and graph comparisons.

-   `condition_IDs()` list all conditions in data
-   `well_IDs()` list all IDs of individual well
-   `TSA_proteins()` list all distinct proteins
-   `TSA_ligands()` list all distinct ligands
-   `TSA_Tms()` list all Tm estimations by condition
-   `Tm_difference()` list all delta Tm estimations by control condition

```{r}
condition_IDs(tsar_data)
well_IDs(tsar_data)
TSA_proteins(tsar_data)
TSA_ligands(tsar_data)

conclusion <- tsar_data %>%
    filter(condition_ID != "NA_NA") %>%
    filter(condition_ID != "CA FL_Riboflavin")
```

## 5.2 Graphic Analysis

### 5.2.1 Tm Boxplot

Use `TSA_boxplot()` to generate comparison boxplot graphs. Stylistics choices 
include coloring by protein or ligand, and legend separation. Function returns 
ggplot object, thus further stylistic changes are allowed.

```{r, out.width = "400px"}
TSA_boxplot(conclusion,
    color_by = "Protein",
    label_by = "Ligand", separate_legend = TRUE
)
```

### 5.2.2 TSA Curve Visualization

`TSA_compare_plot()` generates multiple line graphs for comparison. Specify 
Control condition by assigning condition_ID to control. Functions allows 
graphing by both: - raw fluorescent readings `y = 'Fluorescence'` - normalized 
readings `y = 'RFU'`.

```{r}
control_ID <- "CA FL_DMSO"

TSA_compare_plot(conclusion,
    y = "RFU",
    control_condition = control_ID
)
```

### 5.2.3 Curves by Condition

Users may also graph by condition IDs or well IDs using function 
`TSA_wells_plot()`.

```{r}
ABA_Cond <- conclusion %>% filter(condition_ID == "CA FL_4-ABA")
TSA_wells_plot(ABA_Cond, separate_legend = TRUE)
```

### 5.2.4 First Derivative Comparison
To further visualization comparison, graph first derivatives grouped by needs. 
Note if modeling was set to boltzman fit, frist derivatives will be excessively 
smooth and contains no information beyond specified minimum and maximum.
Below is an example command. Due to size limit of vignette, graph will not be
displayed.
`view_deriv(conclusion, frame_by = "condition_ID")`

All of the above functions are also wrapped in an interactive window call
through `graph_tsar()`. Simply call function on merged tsar_data and access all 
graphing features in one window. Refer to separate vignette, 
`"TSAR Workflow by Shiny"`, and readme.md file for more documentation.

```{r eval = FALSE}
runApp(graph_tsar(tsar_data))
```

# 6. Session Info

## 6.1 Citation
```{r}
citation("TSAR")
citation()
citation("dplyr")
citation("ggplot2")
citation("shiny")
citation("utils")
```

## 6.2 Session Info
```{r}
sessionInfo()
```