---
title: "TSAR Package Structure"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{TSAR Package Structure}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, comment = "#>")
```

# 1. Introduction

The TSAR package provides a simple solution for processing qPCR data, performing thermal shift analysis from either raw fluorescence data or smoothed curves. Its functions give users a protocol for preliminary data checks as well as more extensive analysis of large data sets. The package also offers simple graphical presentation of the analysis, generating clear box plots and line graphs from the specified experimental design. Overall, TSAR offers a workflow that is easy to manage and visualize.

# 2. Installation

Use the commands below to install the TSAR package:

`library(BiocManager)`

`BiocManager::install("CGAO123/TSAR", build_vignettes = TRUE)`

```{r setup, message=FALSE, warning=FALSE}
library(TSAR)
library(dplyr)
library(ggplot2)
library(shiny)
library(utils)
```

```{r, echo=FALSE, fig.width=4, out.width="400px"}
knitr::include_graphics("images/TSAR_logo.png")
```

# 3. Data Structure

TSAR separates data into three tiers:

- `raw_data`, raw readings of qPCR data
- `norm_data`, pre-processed and normalized data
- `tsar_data`, analyzed data ready for graphing

Users may start the TSAR workflow from either `raw_data` or `norm_data`, as long as the data are of appropriate quality. The functions corresponding to each tier are wrapped in Shiny applications, so all analysis and visualization can be done in an interactive window, which can also be opened in a local browser.

- `weed_raw()`; input: raw data; output: data without blank and corrupted curves
- `analyze_norm()`; input: raw data or output of `weed_raw()`; output: analyzed data with conditions specified
- `graph_tsar()`; input: `tsar_data` or output of `analyze_norm()`; output: graphs

# 4. From `raw_data` to `norm_data`

To enter the `TSAR` workflow, read the raw fluorescence readings with the built-in functions from utils, `read.delim()` or `read.csv()`. Importing data through the RStudio UI also works, since `raw_data` only needs to be stored as a data frame. Wrapping the input in `data.frame()` also ensures the correct data type. Use `head()` and `tail()` to check that extraneous information has been removed, e.g. blank lines, duplicate titles, etc.

```{r, results = 'hide'}
# load the example data set shipped with the package
data("qPCR_data1")
raw_data <- qPCR_data1
# inspect the first and last rows for stray headers or blank lines
head(raw_data)
tail(raw_data)
```

## 4.1 Data Preprocessing

A few functions assist with selecting and normalizing data on the way from raw data to analysis-ready data.

Functions `screen()` and `remove_raw()` help screen and remove selected data. This saves time and space by removing unwanted data such as blank wells and corrupted curves. Aside from the `data` parameter, the two functions take analogous parameters: `checkrange` and `checklist`, or `removerange` and `removelist`.

- `checkrange`, `removerange`: a range of wells, e.g. wells A01 to B08 are given as `c("A", "B", "1", "8")`
- `checklist`, `removelist`: a list of wells, e.g. `c("A01", "C03")`

If no wells are specified, `screen()` defaults to screening every well and `remove_raw()` defaults to not removing any wells.

```{r, out.width = "400px"}
# plot all raw curves, with a compact legend below the graph
screen(raw_data) + theme(
    aspect.ratio = 0.7,
    legend.position = "bottom",
    legend.text = element_text(size = 4),
    legend.key.size = unit(0.1, "cm"),
    legend.title = element_text(size = 6)
) +
    guides(color = guide_legend(nrow = 4, byrow = TRUE))
# remove rows C through H, columns 1 through 12
raw_data <- remove_raw(raw_data, removerange = c("C", "H", "1", "12"))
```

Both functions above are wrapped in an interactive window through the function `weed_raw()`. It is implemented as an R Shiny application in which users can select curves with the cursor and remove them easily. Refer to the separate vignette, `"TSAR Workflow by Shiny"`, and the README.md file for more documentation.
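As a reading aid for the range notation used by `checkrange` and `removerange`, the sketch below expands a range vector into individual well IDs. It assumes the range denotes a rectangular block (first row, last row, first column, last column), which matches the examples in this vignette but is not taken from the TSAR source; `expand_well_range()` is a hypothetical helper, not a TSAR function.

```r
# Hypothetical helper, not part of TSAR: expand a range selector into
# well IDs, assuming the range describes a rectangular block of wells.
expand_well_range <- function(range) {
    # rows between the first and last row letter
    rows <- LETTERS[match(range[1], LETTERS):match(range[2], LETTERS)]
    # columns between the first and last column number
    cols <- as.integer(range[3]):as.integer(range[4])
    # combine rows and zero-padded columns, returned in row-major order
    as.vector(t(outer(rows, sprintf("%02d", cols), paste0)))
}

wells <- expand_well_range(c("A", "B", "1", "8"))
wells
```

Under this assumption, the resulting vector could equally be passed as a `checklist` or `removelist` argument.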

```{r eval = FALSE}
runApp(weed_raw(raw_data))
```

Running the window, we can see that the curve at A12 has an abnormally high initial fluorescence and should be removed for data accuracy. We can confirm the selection with the `View Selected` button and remove it with the `Remove Data` button. Note that all data edits are made inside the interactive window. To carry the change over to downstream analysis, click `Save to R` and close the window to store the data in the global environment. Alternatively, click `Copy Selected` and paste the information into `remove_raw()`. To avoid errors, close the window properly through `Close Window` instead of the cross mark at the top left corner.

```{r}
# remove the wells copied from the interactive window
raw_data <- remove_raw(raw_data,
    removelist = c(
        "B04", "B11", "B09", "B05", "B10", "B03",
        "B02", "B01", "B08", "B12", "B07", "B06"
    )
)
```

## 4.2 Data Analysis

Data are normalized through `normalize()`. Individual calls are not necessary, as they are wrapped together in `gam_analysis()`, but to check the validity of the model one can run the analysis on a single well. The TSAR package performs derivative analysis using a generalized additive model from package `mgcv`, or Boltzmann analysis using `nlsLM()` from package `minpack.lm`.

### 4.2.1 Individual Well Application

For analysis of an individual well, refer to the following functions:

- `normalize()`
- `model_gam()`
- `model_fit()`
- `view_model()`
- `Tm_est()`

```{r, out.width = "400px"}
# select a single well and normalize its fluorescence
test <- filter(raw_data, raw_data$Well.Position == "A01")
test <- normalize(test)
# fit the GAM and add the fitted values to the data
gammodel <- model_gam(test, x = test$Temperature, y = test$Normalized)
test <- model_fit(test, model = gammodel)
# view the model fit and the derivative, then estimate Tm
view <- view_model(test)
view[[1]] + theme(aspect.ratio = 0.7, legend.position = "bottom")
view[[2]] + theme(aspect.ratio = 0.7, legend.position = "bottom")
Tm_est(test)
```

`view_model()` generates a list of two graphs, showing the model fit on the fluorescence data and the derivative calculated from it.
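To make the idea behind the derivative analysis concrete, here is a minimal base R sketch on simulated data. It is not TSAR's implementation: `smooth.spline()` from the stats package stands in for the `mgcv` GAM, the melt curve is a made-up sigmoid, and the sketch assumes that Tm is taken as the temperature at which the first derivative of the normalized curve peaks.

```r
# Simulated melt curve: a noisy sigmoid centred at 55 degrees
# (hypothetical data, for illustration only).
set.seed(1)
temperature <- seq(25, 95, by = 0.5)
fluorescence <- 1000 + 4000 / (1 + exp(-(temperature - 55) / 2)) +
    rnorm(length(temperature), sd = 20)

# Min-max normalization to a 0 to 1 scale.
normalized <- (fluorescence - min(fluorescence)) /
    (max(fluorescence) - min(fluorescence))

# Smooth the curve; smooth.spline() is a stand-in for the GAM fit.
fit <- smooth.spline(temperature, normalized)

# First derivative of the smoothed curve at each temperature.
deriv1 <- predict(fit, temperature, deriv = 1)$y

# Assumed Tm estimate: temperature where the first derivative peaks.
tm_sketch <- temperature[which.max(deriv1)]
tm_sketch
```

Because the simulated curve is centred at 55 degrees, the estimate should fall close to that value; with real data, inspect the fit and derivative plots before trusting the estimate.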

### 4.2.2 96-Well Plate Application

All necessary analysis steps are combined in the function `gam_analysis()`. Its parameters are inherited from the functions listed in section 4.2.1, "Individual Well Application". Hence, if any errors occur, work through the individual well application to check for correct parameter input and other potential problems.

- `smoothed` is inherited from `model_fit()`
- `fluo_col` and `selections` are inherited from `normalize()`

```{r}
# analyze every remaining well on the plate
x <- gam_analysis(raw_data,
    smoothed = TRUE,
    fluo_col = 5,
    selections = c(
        "Well.Position", "Temperature", "Fluorescence", "Normalized"
    )
)
```

## 4.3 Data Summary

The data summary offers an exit point from the workflow if no further graphic output is required. Three output formats are available:

- `output_content = 0`, only Tm values by well
- `output_content = 1`, all data analysis by each temperature reading. If `smoothed = TRUE` was set previously, the analysis does not run GAM modeling and therefore has no `fitted` data.
- `output_content = 2`, a combination of the two data sets above.

To associate ligand and protein conditions with each well, call the function `join_well_info()`. Conditions may be specified using the template Excel file or a separate csv file containing a table of three variables, "Well", "Protein", and "Ligand".

```{r message=FALSE}
data("well_information")
# attach protein and ligand conditions to the Tm values by well
output <- join_well_info(
    file_path = NULL,
    file = well_information,
    read_tsar(x, output_content = 0),
    type = "by_template"
)
```

Write the output using `write_tsar()`:

`write_tsar(output, name = "vitamin_analysis", file = "csv")`

To continue to the graphic analysis that follows, set `output_content = 2` to retain all necessary data.

```{r message=FALSE}
norm_data <- join_well_info(
    file_path = NULL,
    file = well_information,
    read_tsar(x, output_content = 2),
    type = "by_template"
)
```

All of the functions above in section 4 are wrapped together in a Shiny application named `analyze_norm()`. Refer to the separate vignette, `"TSAR Workflow by Shiny"`, and the README.md file for more documentation.
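Conceptually, associating conditions with wells is a table join on the well ID. The base R sketch below illustrates that idea only and is not how `join_well_info()` is implemented. The Tm values are made up, and the `"<Protein>_<Ligand>"` naming of `condition_ID` is an assumption inferred from the condition IDs that appear later in this vignette.

```r
# Hypothetical Tm estimates by well (made-up values for illustration).
tm_by_well <- data.frame(
    Well = c("A01", "A02", "A03"),
    Tm = c(52.1, 52.4, 55.0)
)

# Hypothetical condition table with the three variables named in the text.
well_info <- data.frame(
    Well = c("A01", "A02", "A03"),
    Protein = c("CA FL", "CA FL", "CA FL"),
    Ligand = c("DMSO", "DMSO", "4-ABA")
)

# Left join on the well ID, keeping every analyzed well.
joined <- merge(tm_by_well, well_info, by = "Well", all.x = TRUE)

# Assumed naming scheme: condition ID built as "<Protein>_<Ligand>".
joined$condition_ID <- paste(joined$Protein, joined$Ligand, sep = "_")
joined
```

Keeping all rows of the analysis table (`all.x = TRUE`) means a well without condition information is retained with missing values rather than silently dropped.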

```{r eval = FALSE}
runApp(analyze_norm(raw_data))
```

# 5. From `norm_data` to `tsar_data`

`norm_data` contains fluorescence data normalized to a scale of 0 to 1 based on the maximum and minimum fluorescence readings. `norm_data` also contains a first derivative column. `tsar_data` is the final format of the project data, encapsulating all replicates. It therefore contains all condition data, including experiment date and analysis file source.

## 5.1 Merge Replicates

Use `merge_norm()` to merge all `norm_data` sets, specifying the original data file name and experiment date for later tracking.

```{r message=FALSE}
# analyze replicate data
data("qPCR_data2")
raw_data_rep <- qPCR_data2
raw_data_rep <- remove_raw(raw_data_rep,
    removerange = c("B", "H", "1", "12"),
    removelist = c("A12")
)
analysis_rep <- gam_analysis(raw_data_rep, smoothed = TRUE)
norm_data_rep <- join_well_info(
    file_path = NULL,
    file = well_information,
    read_tsar(analysis_rep, output_content = 2),
    type = "by_template"
)

# merge data
tsar_data <- merge_norm(
    data = list(norm_data, norm_data_rep),
    name = c(
        "Vitamin_RawData_Thermal Shift_02_162.eds.csv",
        "Vitamin_RawData_Thermal Shift_02_168.eds.csv"
    ),
    date = c("20230203", "20230209")
)
```

### 5.1.1 Jumpstart to Graph

If the data exported from the qPCR instrument already contain the necessary analysis, enter the TSAR workflow here, using the functions `merge_TSA()`, `read_raw_data()`, and `read_analysis()`.

```{r}
# analysis_file <- read_analysis(analysis_file_path)
# raw_data <- read_raw_data(raw_data_path)
# merge_TSA(analysis_file, raw_data)
```

After merging, use the helper functions below to check and trace the data. They can guide graphic analysis for error identification, selective graphing, and graph comparisons.

- `condition_IDs()` lists all conditions in the data
- `well_IDs()` lists the IDs of all individual wells
- `TSA_proteins()` lists all distinct proteins
- `TSA_ligands()` lists all distinct ligands
- `TSA_Tms()` lists all Tm estimates by condition
- `Tm_difference()` lists all delta Tm estimates relative to the control condition

```{r}
condition_IDs(tsar_data)
well_IDs(tsar_data)
TSA_proteins(tsar_data)
TSA_ligands(tsar_data)

# drop wells without condition data and one excluded condition
conclusion <- tsar_data %>%
    filter(condition_ID != "NA_NA") %>%
    filter(condition_ID != "CA FL_Riboflavin")
```

## 5.2 Graphic Analysis

### 5.2.1 Tm Boxplot

Use `TSA_boxplot()` to generate comparison boxplots. Stylistic choices include coloring by protein or ligand and legend separation. The function returns a ggplot object, so further stylistic changes are possible.

```{r, out.width = "400px"}
TSA_boxplot(conclusion,
    color_by = "Protein", label_by = "Ligand",
    separate_legend = TRUE
)
```

### 5.2.2 TSA Curve Visualization

`TSA_compare_plot()` generates multiple line graphs for comparison. Specify the control condition by passing its condition ID to `control_condition`. The function can graph either of the following:

- raw fluorescence readings, `y = 'Fluorescence'`
- normalized readings, `y = 'RFU'`

```{r}
control_ID <- "CA FL_DMSO"

TSA_compare_plot(conclusion,
    y = "RFU",
    control_condition = control_ID
)
```

### 5.2.3 Curves by Condition

Users may also graph by condition IDs or well IDs using the function `TSA_wells_plot()`.

```{r}
ABA_Cond <- conclusion %>% filter(condition_ID == "CA FL_4-ABA")
TSA_wells_plot(ABA_Cond, separate_legend = TRUE)
```

### 5.2.4 First Derivative Comparison

For further visual comparison, graph first derivatives grouped as needed. Note that if modeling was set to a Boltzmann fit, the first derivatives will be excessively smooth and contain no information beyond the specified minimum and maximum. Below is an example command. Due to the size limit of the vignette, the graph is not displayed.

`view_deriv(conclusion, frame_by = "condition_ID")`

All of the above functions are also wrapped in an interactive window called through `graph_tsar()`. Simply call the function on the merged `tsar_data` to access all graphing features in one window. Refer to the separate vignette, `"TSAR Workflow by Shiny"`, and the README.md file for more documentation.

```{r eval = FALSE}
runApp(graph_tsar(tsar_data))
```

# 6. Session Info

## 6.1 Citation

```{r}
citation("TSAR")
citation()
citation("dplyr")
citation("ggplot2")
citation("shiny")
citation("utils")
```

## 6.2 Session Info

```{r}
sessionInfo()
```