---
title: 'Data metrics object'
author:
- name: Lindsay Rutter
date: '`r Sys.Date()`'
package: bigPint
bibliography: bigPint.bib
output:
  BiocStyle::html_document:
    toc_float: true
    tidy: TRUE
vignette: >
  \usepackage[utf8]{inputenc}
  %\VignetteIndexEntry{"Data metrics object"}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignettePackage{bigPint}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE)
```

## About data metrics object

Researchers may wish to superimpose a subset of the full dataset onto the full dataset. If a researcher is using the package to visualize RNA-seq data, then this subset of data is often differentially expressed genes (DEGs) returned from a model. In this case, the user may wish to use the `dataMetrics` input parameter, which contains at least one quantitative variable returned from a model, such as FDR, p-value, and log fold change.

____________________________________________________________________________________

## Example: two treatments

As was shown in the article [Data object](https://lindsayrutter.github.io/bigPint/articles/data.html), the `data` object called `soybean_ir_sub` contained 5,604 genes and two treatment groups, N and P [@soybeanIR]. We can examine the structure of its corresponding `dataMetrics` object called `soybean_ir_sub_metrics` as follows:

```{r, eval=TRUE, include=TRUE, message=FALSE}
library(bigPint)
data("soybean_ir_sub_metrics")
str(soybean_ir_sub_metrics, strict.width = "wrap")
```

____________________________________________________________________________________

## Example: three treatments

Similarly, as was shown in the data page, the `data` object called `soybean_cn_sub` contained 7,332 genes and three treatment groups, S1, S2, and S3 [@brown2015developmental].
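
With three treatment groups there are three pairwise comparisons, so the matching `dataMetrics` object would be expected to hold three list elements. As a minimal sketch in base R, assuming the group labels are typed in by hand (the vectors `groups` and `expected_names` are illustrative and not part of `bigPint`), the expected element names can be enumerated like this:

```{r, eval=TRUE, include=TRUE}
# Treatment group labels, entered manually for illustration
groups <- c("S1", "S2", "S3")

# Build every unordered pair and join the two labels with an underscore
expected_names <- combn(groups, 2, FUN = paste, collapse = "_")
expected_names

# The number of names equals the number of pairwise combinations
length(expected_names) == choose(length(groups), 2)
```

These are the element names one would look for in the `dataMetrics` object paired with `soybean_cn_sub`.
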
We can examine the structure of the corresponding `dataMetrics` object, called `soybean_cn_sub_metrics`, as follows:

```{r, eval=TRUE, include=TRUE}
data("soybean_cn_sub_metrics")
str(soybean_cn_sub_metrics, strict.width = "wrap")
```

____________________________________________________________________________________

## Data metrics object rules

As demonstrated in the two examples above, the `dataMetrics` object must meet the following conditions:

* Be of type `list`
* Contain a number of elements equal to the number of pairwise treatment combinations in the `data` object. For example, the `soybean_ir_sub_metrics` object contains one list element ("N_P") and the `soybean_cn_sub_metrics` object contains three list elements ("S1_S2", "S1_S3", "S2_S3").
* Have each list element
    + Be of type `data.frame`
    + Be named in a three-part format (such as "N_P" or "S2_S3") that matches the Perl-style regular expression `^[a-zA-Z0-9]+_[a-zA-Z0-9]+`, where
        - The first part is the alphanumeric name of the first treatment group
        - The second part is an underscore "_" that serves as a delimiter
        - The third part is the alphanumeric name of the second treatment group
    + Contain a first column called "ID" of class `character` consisting of the unique names of the genes
    + Contain at least one column of class `numeric` or `integer` consisting of a quantitative variable. This column can have any name. In the examples above, there are five such columns, called "logFC", "logCPM", "LR", "PValue", and "FDR".

You can quickly double-check the names of the list elements in your `dataMetrics` object as follows:

```{r, eval=TRUE, include=TRUE}
names(soybean_ir_sub_metrics)
names(soybean_cn_sub_metrics)
```

If your `dataMetrics` object does not fit this format, `bigPint` will likely throw an informative error about why your format was not recognized.

____________________________________________________________________________________

## References