---
title: "multiWGCNA: visualizing condition-specific networks"
author: "Dario Tommasini"
date: "`r Sys.Date()`"
output: 
  BiocStyle::html_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{Astrocyte multiWGCNA network}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

In this vignette, we will be showing various ways users can analyze their condition-specific network across the conditions of their experiment. We will be using the astrocyte Ribotag data from Itoh et al. PNAS. 2018 (https://doi.org/10.1073/pnas.1716032115). This data comes with the multiWGCNAdata package hosted on ExperimentHub and can be accessed as shown below. 

# Load multiWGCNA library

```{r setup}
library(multiWGCNA)
```

# Load astrocyte Ribotag RNA-seq data 

```{r}
# Download data from the ExperimentHub
library(ExperimentHub)
eh = ExperimentHub()

# Note: this requires the SummarizedExperiment package to be installed
eh_query = query(eh, c("multiWGCNAdata"))
astrocyte_se = eh_query[["EH8223"]]

# Collect the metadata in the sampleTable; the first column must be named "Sample"
sampleTable = colData(astrocyte_se)

# Check the data
assays(astrocyte_se)[[1]][1:5, 1:5]
sampleTable

# Define our conditions for trait 1 (disease) and 2 (brain region)
conditions1 = unique(sampleTable[,2])
conditions2 = unique(sampleTable[,3])
```

# Network construction

We now perform network construction, module eigengene calculation, module-trait correlation. Let's use power = 12 since we used this in our paper (Tommasini and Fogel. BMC Bioinformatics. 2023.) for all the networks. 

```{r, eval = FALSE}
# Construct the combined networks and all the sub-networks (EAE, WT, and each region)
# Same parameters as Tommasini and Fogel. BMC Bioinformatics
astrocyte_networks = constructNetworks(astrocyte_se, sampleTable, conditions1, conditions2, 
                                  networkType = "signed", TOMType = "unsigned", 
                                  power = 12, minModuleSize = 100, maxBlockSize = 25000,
                                  reassignThreshold = 0, minKMEtoStay = 0, mergeCutHeight = 0,
                                  numericLabels = TRUE, pamRespectsDendro = FALSE, 
                                  deepSplit = 4, verbose = 3)

```

This step takes a while since it performs seven network constructions, so we also provide the WGCNA object list (astrocyte_networks) in a loadable format. These were generated from the function above. 

```{r}
# Load pre-computed astrocyte networks
astrocyte_networks = eh_query[["EH8222"]] 

# Check one of the WGCNA objects
astrocyte_networks[["combined"]]
```

# Compare modules by overlap 

Next, we compare modules (by hypergeometric overlap) across conditions. We'll save the results in a list. 

```{r, fig.height = 5, fig.width = 8}
# Save results to a list
results = list()
results$overlaps = iterate(astrocyte_networks, overlapComparisons, plot=FALSE)

# Check the overlaps, ie between the EAE and wildtype networks
head(results$overlaps$EAE_vs_WT$overlap)
```

# Identify a module of interest

Then, we perform differential module expression analysis to detect modules with disease-associated expression patterns. This incorporates the linear model described in the paper and tests for significance using ANOVA. 

```{r, fig.height = 6, fig.width = 7}
# Run differential module expression analysis (DME) on combined networks
results$diffModExp = runDME(astrocyte_networks[["combined"]], 
                            sampleTable,
                            p.adjust = "fdr", 
                            refCondition = "Region", 
                            testCondition = "Disease") 
                            # plot=TRUE, 
                            # out="ANOVA_DME.pdf")

# Check results sorted by disease association FDR
results$diffModExp[order(results$diffModExp$Disease),]

# You can check the expression of module M13 from Tommasini and Fogel. BMC Bioinformatics. 2023 like this. Note that the values reported in the bottom panel title are p-values and not adjusted for multiple comparisons like in results$diffModExp
diffModuleExpression(astrocyte_networks[["combined"]], 
                     geneList = topNGenes(astrocyte_networks[[1]], "combined_013"), 
                     design = sampleTable,
                     test = "ANOVA",
                     plotTitle = "combined_013",
                     plot = TRUE)

```

# Draw the multiWGCNA network

We can now check to see if M13 is present in any of the sub-networks. An easy way to do this is using the network-network correspondences from hypergeometric overlap. These are stored in results$overlaps. We can plot these in a convenient visualization scheme that also organizes the three levels of the multiWGCNA analysis: 1) combined network, 2) EAE and wildtype networks, and 3) the four regional networks. 

```{r, fig.height = 6, fig.width = 7}
drawMultiWGCNAnetwork(astrocyte_networks, 
                      results$overlaps, 
                      "combined_013", 
                      design = sampleTable, 
                      overlapCutoff = 0, 
                      padjCutoff = 1, 
                      removeOutliers = TRUE, 
                      alpha = 1e-50, 
                      layout = NULL, 
                      hjust = 0.4, 
                      vjust = 0.3, 
                      width = 0.5)
```

This corresponds to Figure 2C from Tommasini and Fogel. BMC Bioinformatics. 2023. We see that M13 is really only present in the EAE network, but not any of the other sub-networks. Most importantly, it cannot be resolved in the wildtype network. This makes M13 a biologically interesting network, both in terms of differential expression and differential co-expression. 

We can identify the EAE module that corresponds to M13 using the overlap analysis: 

```{r, fig.height = 8, fig.width = 10}
bidirectionalBestMatches(results$overlaps$combined_vs_EAE)
```

The colors correspond to -log10(FDR) derived from the hypergeometric test, while the numbers in the cells correspond to the number of genes overlapping. From this plot, we see that the module from the EAE network that corresponds to combined_013 is called EAE_015.

# Observe differential co-expression of top module genes

We can visually check that combined_013/EAE_015 genes are co-expressed in EAE and not co-expressed in WT samples. 

```{r, fig.height=5, fig.width=7}
# Get expression data for top 20 genes in EAE_015 module
datExpr = GetDatExpr(astrocyte_networks[[1]], 
                     genes = topNGenes(astrocyte_networks$EAE, "EAE_015", 20))

# Plot
coexpressionLineGraph(datExpr, splitBy = 1.5, fontSize = 2.5) + 
  geom_vline(xintercept = 20.5, linetype='dashed')
```

This corresponds to Figure 2D from Tommasini and Fogel. BMC Bioinformatics. 2023. Indeed, we see that these representative module members, which include many immune-related genes, co-vary in EAE samples but less so in WT samples. 

# Follow up with a preservation analysis

Typically, you would want to follow this up with a preservation analysis between EAE and WT (described in general_workflow.Rmd). This is slow so we don't actually run this in the vignette. 

```{r, eval = FALSE, fig.height = 3, fig.width = 7}
# To enable multi-threading
# library(doParallel)
# library(WGCNA)
# nCores = 2
# registerDoParallel(cores = nCores)
# enableWGCNAThreads(nThreads = nCores)

# Turn off multi-threading
# registerDoSEQ()
# disableWGCNAThreads()

# Calculate preservation statistics
results$preservation=iterate(astrocyte_networks[c("EAE", "WT")], 
                             preservationComparisons, 
                             write=FALSE, 
                             plot=TRUE, 
                             nPermutations=2)
```

# Determining if preservation value is significant

Then, one can perform a permutation procedure that estimates the probability of observing a disease (or wildtype) module with this preservation score in the wildtype (or disease) setting (PreservationPermutationTest). The test is designed to control for the other condition in the sampleTable. In this case, it will equally distribute the samples belonging to each anatomical region when testing preservation of this disease module in the wildtype samples. This is the slowest step! We recommend to let this run on a computing cluster overnight. 

```{r, eval = FALSE}
options(paged.print = FALSE)
results$permutation.test = PreservationPermutationTest(astrocyte_networks$combined@datExpr[sample(17000,3000),], 
                                                       sampleTable, 
                                                       constructNetworksIn = "EAE", # Construct networks using EAE samples
                                                       testPreservationIn = "WT", # Test preservation of disease samples in WT samples
                                                       nPermutations = 10, # Number of permutations for permutation test
                                                       nPresPermutations = 10, # Number of permutations for modulePreservation function
                                                       
                                                       # WGCNA parameters for re-sampled networks (should be the same as used for network construction)
                                                       networkType = "signed", TOMType = "unsigned", 
                                                       power = 12, minModuleSize = 100, maxBlockSize = 25000,
                                                       reassignThreshold = 0, minKMEtoStay = 0, mergeCutHeight = 0,
                                                       numericLabels = TRUE, pamRespectsDendro = FALSE, 
                                                       deepSplit = 4, verbose = 3
                                                       )
```

Because this step is slow, we provide pre-computed results within the package, stored as `permutationTestResults`. Since module preservation scores are dependent on the module size (larger modules have larger preservation scores), we generate the null distribution of preservation scores based on similarly sized modules from each permutation. Note that we have to filter out "outlier modules" first, as these are modules where the correlations are driven by a single sample. 

```{r, eval = TRUE}
# Load pre-computed results
data(permutationTestResults) 

# Remove outlier modules
permutationTestResultsFiltered = lapply(permutationTestResults, function(x) x[!x$is.outlier.module,])

# Extract the preservation score distribution
results$scores.summary = PreservationScoreDistribution(permutationTestResultsFiltered, 
                                                       moduleOfInterestSize = 303 # The size of the module of interest (dM15)
                                                       )

# Observed preservation score of dM15
observed.score = 9.16261490617938

# How many times did we observe a score lower than or equal to this observed score?
z.summary.dist = results$scores.summary$z.summary
below=length(z.summary.dist[z.summary.dist <= observed.score])
probability= below/100
message("Probability of observing a score of ", round(observed.score, 2), " is ", probability)
```

You can plot the observed score and the null distribution on a histogram like in Figure 2B of the manuscript. 

```{r, eval = TRUE}
# Plot on a histogram
ggplot(results$scores.summary, aes(x=z.summary)) + 
      geom_histogram(color="black", fill="white", bins = 15)+
      xlab("Preservation score")+
      ylab("Frequency")+
      geom_vline(xintercept=observed.score, color="red3", linetype="solid")+
      scale_y_continuous(expand = c(0,0))+
      theme_classic()+
      theme(plot.title = element_text(hjust = 0.5))
```

# Conclusion

This analysis provides a great deal of evidence suggesting that dM15 is a transcriptional network that is gained in this disease status and is not present in healthy astrocytes. Further investigation will reveal that this network corresponds to the reactive astrocyte transcriptional network, which indeed is active in the EAE model and is absent in healthy astrocytes. Please see Tommasini and Fogel. BMC Bioinformatics. 2023. for more details. 

```{r}
sessionInfo()
```