---
title: "Molecular subtyping for ovarian cancer"
author: "Gregory M Chen"
date: "`r Sys.Date()`"
output:
  html_document: default
  pdf_document: default
bibliography: bibliography.bib
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{Molecular subtyping for ovarian cancer}
  %\usepackage[UTF-8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
```

## Introduction

`consensusOV` is a package for molecular subtyping of ovarian cancer. It is intended for whole-transcriptome gene expression datasets from patients with high-grade serous ovarian carcinoma. The package includes implementations of four previously published subtype classifiers [@konecny2014prognostic; @verhaak2013prognostically; @bentink2012angiogenic; @helland2011deregulation] and a consensus random forest classifier.

The `get.subtypes()` function is a wrapper for the other subtyping functions in the package: `get.consensus.subtypes()`, `get.konecny.subtypes()`, `get.verhaak.subtypes()`, `get.bentink.subtypes()`, and `get.helland.subtypes()`. It takes as input either a matrix of gene expression values together with a vector of Entrez IDs, or a `Biobase::ExpressionSet` following the format of MetaGxOvarian [@gendoo2016metagxdata].

If `expression.dataset` is a matrix, it should be formatted with genes as rows and patients as columns, and `entrez.ids` should be a vector whose length equals `nrow(expression.dataset)`.

The `method` argument specifies which of the five subtype classifiers to use.

## Load Data

```{r load_pkgs, message=FALSE}
library(consensusOV)
library(Biobase)
library(genefu)
```

The package contains a subset of the ovarian cancer microarray dataset `GSE14764` as example data.

```{r load_data, message=FALSE}
data(GSE14764.eset)
dim(GSE14764.eset)
## Expression matrix: genes as rows, patients as columns
GSE14764.expression.matrix <- exprs(GSE14764.eset)
GSE14764.expression.matrix[1:5,1:5]
## Entrez IDs, one per row of the expression matrix
GSE14764.entrez.ids <- fData(GSE14764.eset)$EntrezGene.ID
head(GSE14764.entrez.ids)
```

## Subtyping

```{r subtyping}
bentink.subtypes <- get.subtypes(GSE14764.eset, method = "Bentink")
bentink.subtypes$Bentink.subtypes
konecny.subtypes <- get.subtypes(GSE14764.eset, method = "Konecny")
konecny.subtypes$Konecny.subtypes
helland.subtypes <- get.subtypes(GSE14764.eset, method = "Helland")
helland.subtypes$Helland.subtypes
```

```{r subtyping2, results="hide"}
verhaak.subtypes <- get.subtypes(GSE14764.eset, method = "Verhaak")
```

```{r subtyping3}
verhaak.subtypes$Verhaak.subtypes
consensus.subtypes <- get.subtypes(GSE14764.eset, method = "consensusOV")
consensus.subtypes$consensusOV.subtypes
```

```{r alternative_subtyping}
## Alternatively, pass the expression matrix and Entrez IDs directly, e.g.
bentink.subtypes <- get.subtypes(GSE14764.expression.matrix,
    GSE14764.entrez.ids, method = "Bentink")
```

Each subtyping function outputs a list with two values. The first value is a factor of subtype labels. The second is a classifier-specific object. For the Konecny, Helland, Verhaak, and Consensus classifiers, this object is a data frame with subtype-specific scores. For the Bentink classifier, the object is the output of the `genefu` function call.

Subtype classifiers can alternatively be called using the inner functions.

```{r subtyping_ex_2}
bentink.subtypes <- get.bentink.subtypes(GSE14764.expression.matrix,
    GSE14764.entrez.ids)
```

## Subtype Scores

The Konecny, Helland, Verhaak, and Consensus classifiers produce real-valued subtype scores. These can be used in various ways; here, for example, we compute correlations between corresponding subtype scores.

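
Before comparing classifiers, it can help to look at the score objects directly. The sketch below assumes the list element names used elsewhere in this vignette (`subtype.scores`, `spearman.cc.vals`, `gsva.out`) and that each object holds one row per patient and one column per subtype, which the indexing in the later chunks suggests but which should be checked against the package documentation:

```{r inspect_scores}
## First rows of each score object (assumed layout: patients x subtypes)
head(helland.subtypes$subtype.scores)
head(konecny.subtypes$spearman.cc.vals)
head(verhaak.subtypes$gsva.out)
```
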
We can compare the subtype scores between the Verhaak and Helland classifiers:

```{r verhaak_helland, fig.height = 8, fig.width = 8}
## Order the columns so that corresponding subtypes line up
vest <- verhaak.subtypes$gsva.out
vest <- vest[,c("IMR", "DIF", "PRO", "MES")]
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
## Long format: one row per patient and subtype pair
dat <- data.frame(
    as.vector(vest),
    rep(colnames(vest), each=nrow(vest)),
    as.vector(hest),
    rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Verhaak", "vsc", "Helland", "hsc")
## plot
library(ggplot2)
ggplot(dat, aes(Verhaak, Helland)) +
    geom_point() +
    facet_wrap(vsc~hsc, nrow = 2, ncol = 2)
```

The correlations between corresponding subtype scores are `r round(cor(verhaak.subtypes$gsva.out[,"DIF"], helland.subtypes$subtype.scores[,"C4"]), digits=2)` (DIF vs. C4), `r round(cor(verhaak.subtypes$gsva.out[,"IMR"], helland.subtypes$subtype.scores[,"C2"]), digits=2)` (IMR vs. C2), `r round(cor(verhaak.subtypes$gsva.out[,"MES"], helland.subtypes$subtype.scores[,"C1"]), digits=2)` (MES vs. C1), and `r round(cor(verhaak.subtypes$gsva.out[,"PRO"], helland.subtypes$subtype.scores[,"C5"]), digits=2)` (PRO vs. C5).

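
The four pairwise values reported above can also be read off a single cross-correlation matrix. This is a minimal sketch rather than part of the package API: it only uses base R's `cor()` and `diag()`, and it assumes the score objects carry the column names used in the chunk above. The name `vh.cor` is introduced here for illustration.

```{r verhaak_helland_cor_matrix}
## Rows are Verhaak scores, columns are Helland scores.
## Columns are ordered so that corresponding subtypes fall on the diagonal.
vh.cor <- cor(
    verhaak.subtypes$gsva.out[, c("IMR", "DIF", "PRO", "MES")],
    helland.subtypes$subtype.scores[, c("C2", "C4", "C5", "C1")])
round(vh.cor, digits = 2)
## Diagonal: correlations for IMR/C2, DIF/C4, PRO/C5, and MES/C1
round(diag(vh.cor), digits = 2)
```

The off-diagonal entries show how strongly each Verhaak score tracks the non-corresponding Helland scores, which the four inline values alone do not reveal.
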
Likewise, we can compare the subtype scores between the Konecny and Helland classifiers:

```{r konecny_helland, fig.height = 8, fig.width = 8}
kost <- konecny.subtypes$spearman.cc.vals
## Order the Helland columns to match the Konecny subtype order
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
## Long format: one row per patient and subtype pair
dat <- data.frame(
    as.vector(kost),
    rep(colnames(kost), each=nrow(kost)),
    as.vector(hest),
    rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Konecny", "ksc", "Helland", "hsc")
## plot
ggplot(dat, aes(Konecny, Helland)) +
    geom_point() +
    facet_wrap(ksc~hsc, nrow = 2, ncol = 2)
```

The correlations between corresponding subtype scores are `r round(cor(konecny.subtypes$spearman.cc.vals[,"C1_immL"], helland.subtypes$subtype.scores[,"C2"]), digits=2)` (C1_immL vs. C2), `r round(cor(konecny.subtypes$spearman.cc.vals[,"C2_diffL"], helland.subtypes$subtype.scores[,"C4"]), digits=2)` (C2_diffL vs. C4), `r round(cor(konecny.subtypes$spearman.cc.vals[,"C3_profL"], helland.subtypes$subtype.scores[,"C5"]), digits=2)` (C3_profL vs. C5), and `r round(cor(konecny.subtypes$spearman.cc.vals[,"C4_mescL"], helland.subtypes$subtype.scores[,"C1"]), digits=2)` (C4_mescL vs. C1).

## References