--- title: "Advanced usage of `scp`" author: - name: Laurent Gatto - name: Christophe Vanderaa output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 2 code_folding: show bibliography: scp.bib date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('scp')`" vignette: > %\VignetteIndexEntry{Advanced usage of `scp`} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html ) ``` # About this vignette This vignette is dedicated to advanced users and to method developers. It assumes that you are already familiar with `QFeatures` and `scp` and that you are looking for more flexibility in the analysis of your single-cell proteomics (SCP) data. In fact, `scp` provides wrapper functions around generic functions and metrics. However, advanced users may want to apply or develop their own features. The `QFeatures` class offers a flexible data container while guaranteeing data consistency. In this vignette, you will learn how to: - Modify the quantitative data - Modify the sample metadata - Modify the feature metadata - Create a new function for `scp` As a general guideline, you can add/remove/update data in a `QFeatures` in 4 main steps: 1. Gather the data to change and other required data involved in the processing. 2. Apply the transformation/computation. 3. Insert the changes in the `QFeatures` object. 4. Make sure the updated `QFeatures` object is still valid. To illustrate the different topics, we will load the `scp1` example data. ```{r, message = FALSE} library(scp) data("scp1") scp1 ``` # Modify the quantitative data To illustrate how to modify quantitative data, we will implement a `normByType` function that will normalize the feature (row) for each cell type separately. This function is probably not relevant for a real case analysis, but it provides a good example of a custom data processing. The process presented in this section is applicable to any custom function that takes at least a **matrix-like object** as input and returns a matrix-like object as output. ```{r} normByType <- function(x, type) { ## Check argument stopifnot(length(type) == ncol(x)) ## Normalize for each type separately for (i in type) { ## Get normalization factor nf <- rowMedians(x[, type == i], na.rm = TRUE) ## Perform normalization x[, type == i] <- x[, type == i] / nf } ## Return normalized data x } ``` Suppose we want to apply the function to the `proteins` assay, we need to first extract that assay. We here need to transfer the sample metadata from the `QFeatures` object to the extracted `SingleCellExperiment` in order to get the sample types required by the `normByType` function. We therefore use `getWithColData`. ```{r} sce <- getWithColData(scp1, "proteins") sce ``` Next, we can apply the data transformation to the quantitative data. As mentioned above, our function expects a matrix-like object as an input, so we use the `assay` function. We then update the `SingleCellExperiment` object. ```{r} mnorm <- normByType(assay(sce), type = sce$SampleType) assay(sce) <- mnorm ``` We are now faced with 2 possibilities: either we want to create a new assay or we want to overwrite an existing assay. In both cases we need to make sure your data is still valid after data transformation. ## Create a new assay Creating a new assay has the advantage that you don't modify an existing assay and hence limit the risk of introducing inconsistency in the data and avoid losing intermediate steps of the data processing. We add the transformed assay using the `addAssay` function, then link the parent assay to the transformed assay using `addAssayLinkOneToOne`. Note that if each row name in the parent assay does not match exactly one row in the child assay, you can also use `addAssayLink` that will require a linking variable in the `rowData`. ```{r} scp1 <- addAssay(scp1, sce, name = "proteinsNorm") scp1 <- addAssayLinkOneToOne(scp1, from = "proteins", to = "proteinsNorm") scp1 ``` ## Overwrite an existing assay Overwriting an existing assay has the advantage to limit the memory consumption as compared to adding a new assay. You can overwrite an assay simply by replacing the target assay in its corresponding slot. ```{r, eval = FALSE} scp1[["proteins"]] <- sce ``` ## Check for validity Applying custom changes to the data increases the risk for data inconsistencies. You can however verify that a `QFeatures` object is still valid after some processing thanks to the `validObject` function. ```{r} validObject(scp1) ``` If the function detects no issues in the data, it will return `TRUE`. Otherwise the function will throw an informative error that should guide the user to identifying the issue. # Modify the sample metadata To illustrate how to modify the sample metadata, we will compute the average expression in each sample and include to the `colData` of the `QFeatures` object. This is typically performed when computing QC metrics for sample filtering. So, we first extract the required data, in this case the quantitative values, and compute the sample-wise average protein expression. ```{r} m <- assay(scp1, "proteins") meanExprs <- colMeans(m, na.rm = TRUE) meanExprs ``` Next, we insert the computed averages into the `colData`. You need to make sure to match sample names because an extracted assay may not contain all samples and may be in a different order compared to the `colData`. ```{r} colData(scp1)[names(meanExprs), "meanProtExprs"] <- meanExprs ``` The new sample variable `meanProtExprs` is now accessible for filtering or plotting. The `$` operator makes it straightforward to access the new data. ```{r} hist(log2(scp1$meanProtExprs)) ``` To make sure that the process did not corrupt the `colData`, we advise to verify the data is still valid. ```{r} validObject(scp1) ``` # Modify the feature metadata We will again illustrate how to modify the feature metadata with an example. We here demonstrate how to add the number of samples in which each feature is detected for the three first assays (PSM assays). The challenge here is that the metric needs to be computed for each assay separately and inserted in the corresponding assay. We will take advantage of the replacement function for `rowData` as implemented in `QFeatures`. It expects a list-like object where names indicate in which assays we want to modify the `rowData` and each element contains a table with the replacement values. We therefore compute the metrics for each assay separately and store the results in a named `List`. ```{r} ## Initialize the List object that will store the computed values res <- List() ## We compute the metric for the first 3 assays for (i in 1:3) { ## We get the quantitative values for the current assay m <- assay(scp1[[i]]) ## We compute the number of samples in which each features is detected n <- rowSums(!is.na(m) & m != 0) ## We store the result as a DataFrame in the List res[[i]] <- DataFrame(nbSamples = n) } names(res) <- names(scp1)[1:3] res res[[1]] ``` Now that we have a `List` of `DataFrame`s, we can update the object. ```{r} rowData(scp1) <- res ``` The new feature variable `nbSamples` is now accessible for filtering or plotting. The `rbindRowData` function facilitates the access the new data. ```{r} library("ggplot2") rd <- rbindRowData(scp1, i = 1:3) ggplot(data.frame(rd)) + aes(x = nbSamples) + geom_histogram(bins = 16) + facet_wrap(~ assay) ``` To make sure that the process did not corrupt the `rowData` in any assay, we advise to verify the data is still valid. ```{r} validObject(scp1) ``` # Create a new function for `scp` The modifying data in a `QFeatures` involves a multiple-step process. Creating a wrapper function that would take care of those different steps in a single line of code is a good habit to reduce the length of analysis scripts and hence making it easier to understand and less error-prone. We will wrap the last example in a new function that we call `computeNbDetectedSamples`. ```{r} computeNbDetectedSamples <- function(object, i) { res <- List() for (ii in i) { m <- assay(object[[ii]]) n <- rowSums(!is.na(m) & m != 0) res[[ii]] <- DataFrame(nbSamples = n) } names(res) <- names(object)[i] rowData(object) <- res stopifnot(validObject(object)) object } ``` Thanks to this new function, the previous section now simply boils down to running: ```{r} scp1 <- computeNbDetectedSamples(scp1, i = 1:3) ``` Keep in mind a few recommendations when creating a new function for `scp`: - The function should take a `QFeatures` object as input (and more arguments if needed) and return a `QFeatures` object as output. This will make workflows much easier to understand. - Allow user to select assays (if required) either as numeric, character, or logical. - Use conventional argument names: when naming an argument, try to match the names that already exist. For instance, selecting assays is passed through the `i` argument, selecting `rowData` variables is passed through `rowvars` and selecting `colData` variables is passed through `colvars`. - Follow the `rformassspectrometry` [coding style](https://rformassspectrometry.github.io/RforMassSpectrometry/articles/RforMassSpectrometry.html#coding-style) # What's next? So you developed a new metric or method and believe it might benefit the community? We would love to hear about your improvements and eventually include your new functionality into `scp` or associate your new package to our documentation. Please, raise an [issue](https://github.com/UCLouvain-CBIO/scp/issues/new/choose) in our Github repository to suggest your improvements or, better, submit your code as a [pull request](https://github.com/UCLouvain-CBIO/scp/compare). # Session information {-} ```{r setup2, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "", crop = NULL ) ``` ```{r sessioninfo, echo=FALSE} sessionInfo() ``` # Reference