--- title: easier User Manual ![](easier_logo.png){#id .class width=60 height=60} author: - name: Oscar Lapuente-Santana affiliation: - &id Computational Biology group, Department of Biomedical Engineering, Eindhoven University of Technology (BME, TU/e) email: o.lapuente.santana@tue.nl - name: Federico Marini affiliation: - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI, Mainz) email: marinif@uni-mainz.de - name: Arsenij Ustjanzew affiliation: - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI, Mainz) email: arsenij.ustjanzew@uni-mainz.de - name: Francesca Finotello affiliation: - Institute of Bioinformatics, Biocenter Medical University of Innsbruck email: francesca.finotello@i-med.ac.at - name: Federica Eduati affiliation: - *id - Institute for Complex Molecular Systems, Eindhoven University of Technology (ICMS, TU/e) email: f.eduati@tue.nl date: "`r Sys.Date()`" output: html_document: toc: yes toc_float: yes number_sections: yes code_folding: show theme: lumen pdf_document: toc: yes number_sections: true bibliography: references_easier.bib vignette: > %\VignetteIndexEntry{easier User Manual} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, include = FALSE} library("easier") ``` # Introduction {#introduction} Identification of biomarkers of immune response in the tumor microenvironment (TME) for prediction of patients’ response to immune checkpoint inhibitors is a major challenge in immuno-oncology. Tumors are complex systems, and understanding immune response in the TME requires holistic strategies. We introduce Estimate Systems Immune Response (EaSIeR) tool, an approach to derive a high-level representation of anti-tumor immune responses by leveraging widely accessible patients' tumor RNA-sequencing (RNA-seq) data. ## EaSIeR approach RNA-seq data is integrated with different types of biological prior knowledge to extract quantitative descriptors of the TME, including composition of the immune repertoire, and activity of intra- and extra-cellular communications (see table below). By performing this knowledge-guided dimensionality reduction, there is an improvement in the interpretability of the derived features. ```{r table1, echo=FALSE, message=FALSE, warning=FALSE, results='asis'} library(knitr) table1 <- " | Quantitative descriptor | Descriptor conception | Prior knowledge | | ----------------------- | ----------------------------------- | ----------------------------------- | | Pathway activity | @HOLLAND2020194431; @Schubert2018 | @HOLLAND2020194431; @Schubert2018 | | Immune cell quantification | @Finotello2019 | @Finotello2019 | | TF activity | @Garcia-Alonso01082019 | @Garcia-Alonso01082019 | | Ligand-Receptor pairs | @LAPUENTESANTANA2021100293 | @Ramilowski2015; @Barretina2012 | | Cell-cell interaction | @LAPUENTESANTANA2021100293 | @Ramilowski2015; @Barretina2012 | Table: Quantitative descriptors of the TME " cat(table1) ``` Using the data from The Cancer Genome Atlas (TCGA) [@Chang2013], regularized multi-task linear regression was used to identify how the quantitative descriptors can simultaneously predict multiple hallmarks (i.e. published transcriptome-based predictors) of anti-cancer immune response (see table below). Here, the regularization is applied to select features that are relevant for all tasks. ```{r table2, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'} table2 <- " | Hallmark of the immune response | Original study | |-------------------------------- | -------------- | | Cytolytic activity (CYT) | @ROONEY201548 | | Roh immune score (Roh_IS) | @Roheaah3560 | | Chemokine signature (chemokines) | @Messina2012 | | Davoli immune signature (Davoli_IS) | @Davolieaaf8399 | | IFNy signature (IFNy) | @Ayers2017 | | Expanded immune signature (Ayers_expIS) | @Ayers2017 | | T-cell inflamed signature (Tcell_inflamed) | @Ayers2017 | | Repressed immune resistance (RIR) | @JERBYARNON2018984 | | Tertiary lymphoid structures signature (TLS) | @Cabrita2020 | | Immuno-predictive score (IMPRES) | @Auslander2018 | | Microsatellite instability status | @Fu2019 | Table: Hallmarks of anti-cancer immune responses " cat(table2) ``` Cancer-specific models were learned and used to identify interpretable cancer-specific systems biomarkers of immune response. These models are available through `easierData` package and can be accessed using the `get_opt_models()` function. Model biomarkers have been experimentally validated in the literature and the performance of EaSIeR predictions has been validated using independent datasets from four different cancer types with patients treated with anti-PD1 or anti-PD-L1 therapy. For more detailed information, please refer to our original work: Lapuente-Santana et al. "Interpretable systems biomarkers predict response to immune-checkpoint inhibitors". Patterns, 2021 [doi:10.1016/j.patter.2021.100293](https://doi.org/10.1016/j.patter.2021.100293). ## EaSIeR tool This vignette describes how to use the easier package for a pleasant onboarding experience around the EaSIeR approach. By providing just the patients' bulk RNA-seq data as input, the EaSIeR tool allows you to: - Predict biomarker-based immune response - Identify system biomarkers of immune response
![](easier_image.png)
# Getting started {#gettingstarted} Starting R, this package can be installed as follows: ```{r, eval=FALSE} BiocManager::install("easier") # to install also the dependencies used in the vignettes and in the examples: BiocManager::install("easier", dependencies = TRUE) # to download and install the development version from GitHub, you can use BiocManager::install("olapuentesantana/easier") ``` Once installed, the package can be loaded and attached to your current workspace as follows: ```{r, eval=TRUE} library("easier") ``` In order to use `easier` in your workflow, bulk-tumor RNA-seq data is required as input (when available, patients' response to immunotherapy can be additionally provided): - `RNA_counts`, a `data.frame` containing raw counts values (with HGNC gene symbols as row names and samples identifiers as column names). - `RNA_tpm`, a `data.frame` containing TPM values (with HGNC gene symbols as row names and samples identifiers as column names). - `real_patient_response`, a character `vector` containing clinical patients' response to immunotherapy (with non-responders labeled as NR and responders as R). # Use case for `easier`: Bladder cancer patients [@Mariathasan2018] In this section, we illustrate the main features of `easier` on a publicly available bladder cancer dataset from Mariathasan et al. "TGF-B attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells", published in Nature, 2018 [doi:10.1038/nature25501](https://doi.org/10.1038/nature25501). The processed data is available via [`IMvigor210CoreBiologies`](http://research-pub.gene.com/IMvigor210CoreBiologies/) package under the CC-BY license. We will be using a subset of this data as exemplary dataset, made available via `easierData`. This includes RNA-seq data (count and TPM expression values), information on tumor mutational burden (TMB) and response to ICB therapy from 192 patients. 25 patients with complete response (CR) were classified as responders (R) and 167 patients with progressive disease (PD) as non-responders. In the following chunk, we load the additional packages that will be required throughout the vignette. ```{r, message=FALSE} suppressPackageStartupMessages({ library("easierData") library("SummarizedExperiment") }) ``` ## Load data from Mariathasan cohort For this example we will use gene expression data (counts and tpm values). Other variables include patient best overall response (BOR) to anti-PD-L1 therapy, tumor mutational burden (TMB) and the cancer type the cohort belongs to. ```{r, eval=TRUE} dataset_mariathasan <- get_Mariathasan2018_PDL1_treatment() # patient response patient_ICBresponse <- colData(dataset_mariathasan)[["BOR"]] names(patient_ICBresponse) <- colData(dataset_mariathasan)[["pat_id"]] # tumor mutational burden TMB <- colData(dataset_mariathasan)[["TMB"]] names(TMB) <- colData(dataset_mariathasan)[["pat_id"]] # cohort cancer type cancer_type <- metadata(dataset_mariathasan)[["cancertype"]] # gene expression data RNA_counts <- assays(dataset_mariathasan)[["counts"]] RNA_tpm <- assays(dataset_mariathasan)[["tpm"]] # Select a subset of patients to reduce vignette building time. set.seed(1234) pat_subset <- sample(names(patient_ICBresponse), size = 30) patient_ICBresponse <- patient_ICBresponse[pat_subset] TMB <- TMB[pat_subset] RNA_counts <- RNA_counts[, pat_subset] RNA_tpm <- RNA_tpm[, pat_subset] # Some genes are causing issues due to approved symbols matching more than one gene genes_info <- easier:::reannotate_genes(cur_genes = rownames(RNA_tpm)) ## Remove non-approved symbols non_na <- !is.na(genes_info$new_names) RNA_tpm <- RNA_tpm[non_na, ] genes_info <- genes_info[non_na, ] ## Remove entries that are withdrawn RNA_tpm <- RNA_tpm[-which(genes_info$new_names == "entry withdrawn"), ] genes_info <- genes_info[-which(genes_info$new_names == "entry withdrawn"), ] ## Identify duplicated new genes newnames_dup <- unique(genes_info$new_names[duplicated(genes_info$new_names)]) newnames_dup_ind <- do.call(c, lapply(newnames_dup, function(X) which(genes_info$new_names == X))) newnames_dup <- genes_info$new_names[newnames_dup_ind] ## Retrieve data for duplicated genes tmp <- RNA_tpm[genes_info$old_names[genes_info$new_names %in% newnames_dup],] ## Remove data for duplicated genes RNA_tpm <- RNA_tpm[-which(rownames(RNA_tpm) %in% rownames(tmp)),] ## Aggregate data of duplicated genes dup_genes <- genes_info$new_names[which(genes_info$new_names %in% newnames_dup)] names(dup_genes) <- rownames(tmp) if (anyDuplicated(newnames_dup)){ tmp2 <- stats::aggregate(tmp, by = list(dup_genes), FUN = "mean") rownames(tmp2) <- tmp2$Group.1 tmp2$Group.1 <- NULL } # Put data together RNA_tpm <- rbind(RNA_tpm, tmp2) ``` ## Compute hallmarks of immune response Multiple hallmarks (i.e. published transcriptome-based predictors) of the immune response can also be computed using TPM data from RNA-seq. By default, the following scores of the immune response will be computed: cytolytic activity (CYT) [@ROONEY201548], Roh immune score (Roh_IS) [@Roheaah3560], chemokine signature (chemokines) [@Messina2012], Davoli immune signature (Davoli_IS) [@Davolieaaf8399], IFNy signature (IFNy) [@Ayers2017], expanded immune signature (Ayers_expIS) [@Ayers2017], T-cell inflamed signature (Tcell_inflamed) [@Ayers2017], immune resistance program (RIR: resF_down, resF_up, resF) [@JERBYARNON2018984] and tertiary lymphoid structures signature (TLS) [@Cabrita2020]. This selection can be customized by editing the `selected_scores` option. ```{r, eval=TRUE} hallmarks_of_immune_response <- c("CYT", "Roh_IS", "chemokines", "Davoli_IS", "IFNy", "Ayers_expIS", "Tcell_inflamed", "RIR", "TLS") immune_response_scores <- compute_scores_immune_response(RNA_tpm = RNA_tpm, selected_scores = hallmarks_of_immune_response) head(immune_response_scores) ``` ## Compute quantitative descriptors of the TME We are going to use the bulk RNA-seq data to derive, for each patient, the five quantitative descriptors of the TME described above. By applying quanTIseq [@Finotello2019] method to TPM data from RNA-seq, the quantification of different cell fractions can be done as follows: ```{r, eval=TRUE} cell_fractions <- compute_cell_fractions(RNA_tpm = RNA_tpm) head(cell_fractions) ``` By applying PROGENy [@HOLLAND2020194431; @Schubert2018] method to count data from RNA-seq, the activity of 14 signaling pathways can be inferred as in the chunk below. ```{r, eval=TRUE} pathway_activities <- compute_pathway_activity(RNA_counts = RNA_counts, remove_sig_genes_immune_response = TRUE) head(pathway_activities) ``` The call above infers pathway activity as a linear transformation of gene expression data. Since some pathway signature genes were also used to compute scores of immune response (output variable in EaSIeR model). With `remove_sig_genes_immune_response` set to `TRUE` (default), the overlapping genes are removed from the pathway signature genes used to infer pathway activities. By applying DoRothEA [@Garcia-Alonso01082019] method to TPM data from RNA-seq, the activity of 118 TFs can be inferred as follows: ```{r, eval=TRUE} tf_activities <- compute_TF_activity(RNA_tpm = RNA_tpm) head(tf_activities[,1:5]) ``` Using derived cancer-specific inter-cellular networks, the quantification of 867 ligand-receptor pairs can be done as in the chunk below. More detailed information on how these networks were obtained can be found in the experimental procedures section from our original work [@LAPUENTESANTANA2021100293]. ```{r, eval=TRUE} lrpair_weights <- compute_LR_pairs(RNA_tpm = RNA_tpm, cancer_type = "pancan") head(lrpair_weights[,1:5]) ``` Via `cancer_type`, a cancer-specific ligand-receptor pairs network can be chosen. With `cancer_type` set to `pancan`, a pan-cancer network will be used and this is based on the union of all ligand-receptor pairs present across the 18 cancer-specific networks. Using the ligand-receptor weights as input, 169 cell-cell interaction scores can be derived as in the chunk below. ```{r, eval=TRUE} ccpair_scores <- compute_CC_pairs(lrpairs = lrpair_weights, cancer_type = "pancan") head(ccpair_scores[,1:5]) ``` Again, `cancer_type` is set to `pancan` (default). The same `cancer_type` network used to quantify ligand-receptor pairs should be designated here. ## Obtain patients' predictions of immune response Now we use the quantitative descriptors computed previously as input features to predict anti-tumor immune responses based on the model parameters defined during training. The output of `predict_immune_response` returns predictions of patients' immune response for each quantitative descriptor. Because models were built in a cancer-type-specific fashion, the user is required to indicate which cancer-specific model should be used for predicting patients' immune response, which can be done via the `cancer_type` argument. The `cancer_type` provided should match one of the cancer-specific models available. These are the following: bladder urothelial carcinoma (`BLCA`), brest invasive carcinoma (`BRCA`), cervical and endocervical cancer (`CESC`), colorectal adenocarcinoma (`CRC`), glioblastoma multiforme (`GBM`), head and neck squamous cell carcinoma (`HNSC`), kidney renal clear cell carcinoma (`KIRC`), kidney renal papillary cell carcinoma (`KIRP`), liver hepatocellular carcinoma (`LIHC`), lung adenocarcinoma (`LUAD`), lung squamous cell carcinoma (`LUSC`), non-small-cell lung carcinoma (`NSCLC` [`LUAD` + `LUSC`]), ovarian serous cystadenocarcionma (`OV`), pancreatic adenocarcionma (`PAAD`), prostate adenocarcinoma (`PRAD`), skin cutaneous melanoma (`SKCM`), stomach adenocarcinoma (`STAD`), thyroid carcinoma (`THCA`), uterine corpus endometrial carcinoma (`UCEC`). The optimized models can be retrieved from `easierData` package using the function `get_opt_models()`. ```{r, eval=TRUE} predictions <- predict_immune_response(pathways = pathway_activities, immunecells = cell_fractions, tfs = tf_activities, lrpairs = lrpair_weights, ccpairs = ccpair_scores, cancer_type = cancer_type, verbose = TRUE) ``` Once we obtained patients' predicted immune response, two different scenarios should be considered in which: -`patient_response` is **known** and therefore the accuracy of `easier` predictions can be evaluated. -`patient_response` is **unknown** and no assessments can be carried out. In this use case, patients' clinical response to PD-L1 therapy is available from Mariathasan cohort, thus we can move forward to assess the performance of the predictions. ## Evaluate easier predictions using patients' immunotherapy response Assessment of patients' response to immunotherapy treatment can be done via `assess_immune_response` function, which informs about the accuracy of easier predictions on patients' response to ICB therapy. The option `patient_response` should be provided with a character string containing patients' clinical response (where non-responders are labeled as NR and responders as R). Importantly, this aspect should be handled by the user. This function uses patients' TPM values (`RNA_tpm`) as input for `compute_scores_immune_response` in order to compare easier predictions with those from published scores of immune response. The user can choose the scores to be computed via `select_gold_standard`, by default all scores are computed. If patients' tumor mutational burden (TMB) is available, this can also be provided via `TMB_values` and used as surrogate of patients' immunotherapy response for comparison. Since both immune response and TMB are essential for an effective immunotherapy response, we decided to conceptualize this in our predictions by either penalizing or weighting differently our scores in high- and low-TMB patients. If `easier_with_TMB` is set to `weighted_average`, a weighted average of both easier and TMB score will be used, if instead `easier_with_TMB` is set to `penalized_score`, patient's easier scores will penalized depending on their TMB category. These two strategies to combine immune response and TMB require the definition of a certain weight or penalty beforehand (`weight_penalty`). The default weight or penalty is 0.5. ```{r, eval=TRUE} output_eval_with_resp <- assess_immune_response(predictions_immune_response = predictions, patient_response = patient_ICBresponse, RNA_tpm = RNA_tpm, TMB_values = TMB, easier_with_TMB = "weighted_average", weight_penalty = 0.5) ``` Top figure. Area under the curve (AUC) values of patients' predictions based on quantitative descriptors of the TME, an ensemble descriptor based on average of individual descriptors, and the computed scores of immune response (gold standard). Bar plot represent the average AUC across tasks and error bars describe the corresponding standard deviation. Bottom figure. ROC curves were computed as the average of the ROC curves obtained for each score of immune response. `output_evaluation_with_resp` stores the plots from above, and they can be inspected in R as follows: ```{r, eval=FALSE} # inspect output output_eval_with_resp[[1]] output_eval_with_resp[[2]] output_eval_with_resp[[3]] ``` ## What if I have an immunotherapy dataset where patients' response is not available? This is a usual case where we might have a cancer dataset with bulk RNA-seq data but lack information about patients' response to immunotherapy. In this likely scenario, an score of likelihood of immune response can be assigned to each patient by omitting the argument `patient_response` within the function `assess_immune_response`. ```{r, eval=TRUE} output_eval_no_resp <- assess_immune_response(predictions_immune_response = predictions, TMB_values = TMB, easier_with_TMB = "weighted_average", weight_penalty = 0.5) ``` Top figure. Boxplot of patients' easier score showing its distribution across the 10 different tasks. Bottom figure. Scatterplot of patients' prediction when combining easier score with tumor mutational burden showing its distribution across the 10 different tasks. `output_evaluation_no_resp` stores the plots from above, and they can be inspected in R as follows: ```{r, eval=FALSE} # inspect output output_eval_no_resp[[1]] output_eval_no_resp[[2]] ``` ### Retrieve easier scores of immune response We can further retrieve the easier score and also, the refined scores obtained by integrating easier score and TMB via `retrieve_easier_score`. ```{r, eval=TRUE} easier_derived_scores <- retrieve_easier_score(predictions_immune_response = predictions, TMB_values = TMB, easier_with_TMB = c("weighted_average", "penalized_score"), weight_penalty = 0.5) head(easier_derived_scores) ``` ## Interpret response to immunotherapy through systems biomarkers {#biomarkers} Identifying mechanisms used by patients' tumors to resist or succumb to ICB treatment is of paramount importance. Via `explore_biomarkers`, we can visualize stunning biomarkers of immune response and shed light into possible mechanisms responsible for the patients' response to treatment. The option `patient_label` allows to make a two-level comparison by providing a character string containing the label of each patient, for instance, patients' clinical response (where non-responders are labeled as NR and responders as R). In order to leverage the cancer-specific biomarker weights inferred from model training, you need to specify again which `cancer_type` the bulk RNA-seq data belongs to. As before, the selected `cancer_type` should be included in the list described above. ```{r, eval=TRUE} output_biomarkers <- explore_biomarkers(pathways = pathway_activities, immunecells = cell_fractions, tfs = tf_activities, lrpairs = lrpair_weights, ccpairs = ccpair_scores, cancer_type = cancer_type, patient_label = patient_ICBresponse) ``` The output of `explore_biomarkers` returns, for each quantitative descriptor, feature's z-score values comparing responders (R) and non-responders (NR) patients with the corresponding feature's model weights (only top 15 biomarkers are shown for tfs, lrpairs and ccpairs descriptors). Additionally, a volcano plot integrating systems biomarkers from all quantitative descriptors comparing NR and R patients (two-sided Wilcoxon rank-sum test). Significant biomarkers (p < 0.05) are shown in blue. Biomarkers are drawn according to their corresponding sign (shape) and weight (size) obtained during model training. Labels are reported for the top 15 biomarkers (based on the association with the tasks) that are significantly different between R and NR. `output_biomarkers` stores the plots from above, and they can be inspected in R as follows: ```{r, eval=FALSE} # inspect output output_biomarkers[[1]] output_biomarkers[[2]] output_biomarkers[[3]] output_biomarkers[[4]] output_biomarkers[[5]] output_biomarkers[[6]] ``` # Session info {- .smaller} ```{r sessioninfo} sessionInfo() ``` # References {-}