---
title: "Molecular subtyping for ovarian cancer"
author: "Gregory M Chen"
package: consensusOV
output:
    BiocStyle::html_document:
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Molecular subtyping for ovarian cancer}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
```

# Introduction

`r Biocpkg("consensusOV")` is a package for molecular subtyping of ovarian cancer. It is intended for whole-transcriptome gene expression datasets from patients with high-grade serous ovarian carcinoma. The package implements four previously published subtype classifiers
([Helland et al., 2011](https://doi.org/10.1371/journal.pone.0018064);
[Bentink et al., 2012](https://doi.org/10.1371/journal.pone.0030269);
[Verhaak et al., 2013](https://doi.org/10.1172/JCI65833);
[Konecny et al., 2014](https://doi.org/10.1093/jnci/dju249))
and a consensus random forest classifier ([Chen et al., 2018](https://doi.org/10.1158/1078-0432.CCR-18-0784)).

The `get.subtypes()` function is a wrapper for the package's classifier-specific subtyping functions: `get.consensus.subtypes()`, `get.konecny.subtypes()`, `get.verhaak.subtypes()`,
`get.bentink.subtypes()`, and `get.helland.subtypes()`. It accepts either a matrix of gene expression values together with a vector of Entrez IDs, or an `ExpressionSet` from the `r Biocpkg("Biobase")` package. If `expression.dataset` is a matrix, it should have genes as rows and patients as columns, and `entrez.ids` should be a vector of length `nrow(expression.dataset)`. The `method` argument specifies which of the five subtype classifiers to use.
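
The matrix layout described above can be checked before calling a classifier. The following is a minimal sketch in base R: `check.subtyping.input()` is a hypothetical helper that is not part of `consensusOV`, the toy values are made up, and storing the Entrez IDs as character strings is an assumption made only for this illustration.

```r
# Toy inputs (made-up values, for illustration only):
# 4 genes (rows) x 3 patients (columns).
set.seed(1)
toy.expression <- matrix(rnorm(12), nrow = 4, ncol = 3,
                         dimnames = list(NULL, paste0("patient", 1:3)))
# One Entrez ID per row of the matrix (assumed here to be character strings).
toy.entrez.ids <- c("7157", "672", "675", "1956")

# Hypothetical helper (not part of consensusOV): verify that the inputs
# follow the layout described above, i.e. a numeric matrix with genes as
# rows and one Entrez ID per row.
check.subtyping.input <- function(expression.dataset, entrez.ids) {
  stopifnot(
    is.matrix(expression.dataset),
    is.numeric(expression.dataset),
    length(entrez.ids) == nrow(expression.dataset)
  )
  invisible(TRUE)
}

check.subtyping.input(toy.expression, toy.entrez.ids)
```

A check like this fails early with an informative error when, for example, the matrix has been transposed so that patients are in rows.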

# Load Data
```{r load_pkgs, message=FALSE}
library(consensusOV)
library(Biobase)
library(genefu)
```

The package contains a subset of the ovarian cancer microarray dataset
[GSE14764](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14764) as example data.
```{r load_data, message=FALSE}
data(GSE14764.eset)
dim(GSE14764.eset)
GSE14764.expression.matrix <- exprs(GSE14764.eset)
GSE14764.expression.matrix[1:5,1:5]
GSE14764.entrez.ids <- fData(GSE14764.eset)$EntrezGene.ID
head(GSE14764.entrez.ids)
```

# Subtyping

```{r subtyping}
bentink.subtypes <- get.subtypes(GSE14764.eset, method = "Bentink")
bentink.subtypes$Bentink.subtypes
konecny.subtypes <- get.subtypes(GSE14764.eset, method = "Konecny")
konecny.subtypes$Konecny.subtypes
helland.subtypes <- get.subtypes(GSE14764.eset, method = "Helland")
helland.subtypes$Helland.subtypes
```

```{r subtyping2, results="hide"}

# To align with the Verhaak subtypes, strip the "geneid." prefix from the row names.
# The dot is escaped and the pattern is anchored so that only the literal prefix is removed.
temp_eset <- GSE14764.eset
rownames(temp_eset) <- sub("^geneid\\.", "", rownames(temp_eset))

verhaak.subtypes <- get.subtypes(temp_eset, method = "Verhaak")
```

```{r subtyping3}
verhaak.subtypes$Verhaak.subtypes
consensus.subtypes <- get.subtypes(GSE14764.eset, method = "consensusOV")
consensus.subtypes$consensusOV.subtypes
```

```{r alternative_subtyping}
## Alternatively, pass an expression matrix and a vector of Entrez IDs
data(sigOvcAngiogenic)
bentink.subtypes <- get.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids, method = "Bentink")
```

Each subtyping function returns a list with two elements. The first is a factor of subtype labels. The second holds classifier-specific values: for the Konecny, Helland, Verhaak, and Consensus classifiers, it is a data frame of subtype-specific scores; for the Bentink classifier, it is the output of the `genefu` function call.
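
As a rough illustration of this two-element structure, the sketch below builds a mock result in base R and tabulates the subtype labels. The object `mock.result`, its element names, the labels `A`, `B`, `C`, and all score values are invented for the example and do not come from any classifier.

```r
# Mock result (made-up values) shaped like the output described above:
# element 1 is a factor of subtype labels, element 2 holds
# classifier-specific values (here, a data frame of per-subtype scores).
mock.result <- list(
  Example.subtypes = factor(c("A", "B", "A", "C"), levels = c("A", "B", "C")),
  subtype.scores = data.frame(A = c(0.90, 0.10, 0.70, 0.20),
                              B = c(0.05, 0.80, 0.20, 0.10),
                              C = c(0.05, 0.10, 0.10, 0.70))
)

# The labels are the first element; table() counts patients per subtype.
subtype.labels <- mock.result[[1]]
subtype.counts <- table(subtype.labels)
subtype.counts

# The second element has one row of scores per patient.
nrow(mock.result[[2]]) == length(subtype.labels)
```

With the objects created earlier in this vignette, the same pattern would be, for instance, `table(konecny.subtypes$Konecny.subtypes)`.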

Each classifier can also be called directly through its own function.

```{r subtyping_ex_2}
bentink.subtypes <- get.bentink.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids)
```

# Subtype Scores

The Konecny, Helland, Verhaak, and Consensus classifiers produce real-valued subtype scores. These can be used in various ways; here, we compute correlations between corresponding subtype scores.

We can compare the subtype scores between the Verhaak and Helland classifiers:

```{r verhaak_helland, fig.height = 8, fig.width = 8}
vest <- verhaak.subtypes$gsva.out
# Order the columns so that corresponding subtypes line up:
# IMR/C2, DIF/C4, PRO/C5, MES/C1
vest <- vest[,c("IMR", "DIF", "PRO", "MES")]
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
dat <- data.frame(
	as.vector(vest), 
	rep(colnames(vest), each=nrow(vest)),
	as.vector(hest), 
	rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Verhaak", "vsc", "Helland", "hsc")
## plot
library(ggplot2)
ggplot(dat, aes(Verhaak, Helland)) + geom_point() + facet_wrap(vsc~hsc, nrow = 2, ncol = 2)
```

The correlations between corresponding subtype scores are 
`r round(cor(verhaak.subtypes$gsva.out[,"DIF"], helland.subtypes$subtype.scores[,"C4"]), digits=2)` (DIF vs. C4), 
`r round(cor(verhaak.subtypes$gsva.out[,"IMR"], helland.subtypes$subtype.scores[,"C2"]), digits=2)` (IMR vs. C2),
`r round(cor(verhaak.subtypes$gsva.out[,"MES"], helland.subtypes$subtype.scores[,"C1"]), digits=2)` (MES vs. C1), and 
`r round(cor(verhaak.subtypes$gsva.out[,"PRO"], helland.subtypes$subtype.scores[,"C5"]), digits=2)` (PRO vs. C5).

Likewise, we can compare the subtype scores between the Konecny and Helland classifiers:

```{r konecny_helland, fig.height = 8, fig.width = 8}
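kost <- konecny.subtypes$spearman.cc.vals
hest <- helland.subtypes$subtype.scores
# Reorder the Helland columns to pair with the Konecny scores:
# C1_immL/C2, C2_diffL/C4, C3_profL/C5, C4_mescL/C1
hest <- hest[, c("C2", "C4", "C5", "C1")]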
dat <- data.frame(
	as.vector(kost), 
	rep(colnames(kost), each=nrow(kost)),
	as.vector(hest), 
	rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Konecny", "ksc", "Helland", "hsc")
## plot
ggplot(dat, aes(Konecny, Helland)) + geom_point() + facet_wrap(ksc~hsc, nrow = 2, ncol = 2)
```

The correlations between corresponding subtype scores are 
`r round(cor(konecny.subtypes$spearman.cc.vals[,"C1_immL"], helland.subtypes$subtype.scores[,"C2"]), digits=2)` (C1_immL vs. C2), 
`r round(cor(konecny.subtypes$spearman.cc.vals[,"C2_diffL"], helland.subtypes$subtype.scores[,"C4"]), digits=2)` (C2_diffL vs. C4), 
`r round(cor(konecny.subtypes$spearman.cc.vals[,"C3_profL"], helland.subtypes$subtype.scores[,"C5"]), digits=2)` (C3_profL vs. C5), 
and `r round(cor(konecny.subtypes$spearman.cc.vals[,"C4_mescL"], helland.subtypes$subtype.scores[,"C1"]), digits=2)` (C4_mescL vs. C1).
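
The per-pair correlations reported in this section can also be computed in one call. The sketch below is one possible way to do this in base R: `matched.cor()` is a hypothetical helper that is not part of `consensusOV`, and the score matrices are simulated (the second is the first plus small noise) rather than real classifier output.

```r
# Hypothetical helper (not part of consensusOV): correlate each column of x
# with its matched column of y. `score.pairs` is a named character vector
# whose names are columns of x and whose values are columns of y.
matched.cor <- function(x, y, score.pairs, method = "pearson") {
  vapply(names(score.pairs), function(xcol) {
    cor(x[, xcol], y[, score.pairs[[xcol]]], method = method)
  }, numeric(1))
}

# Simulated scores for 10 patients (made-up values, for illustration only).
set.seed(42)
x <- matrix(rnorm(40), nrow = 10,
            dimnames = list(NULL, c("IMR", "DIF", "PRO", "MES")))
y <- x + matrix(rnorm(40, sd = 0.1), nrow = 10)
colnames(y) <- c("C2", "C4", "C5", "C1")

# Pairing of Verhaak and Helland subtypes as used in this vignette.
score.pairs <- c(IMR = "C2", DIF = "C4", PRO = "C5", MES = "C1")
matched.result <- matched.cor(x, y, score.pairs)
round(matched.result, 2)
```

Assuming the score objects support column indexing by name, as they are used above, the same helper could be applied to `verhaak.subtypes$gsva.out` and `helland.subtypes$subtype.scores` with the same `score.pairs` vector.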