--- title: "ExpressionAtlas package vignette" author: "Maria Keays, Pedro Madrigal, Anil Thanki" affiliation: "EMBL-EBI" date: "`r Sys.Date()`" output: html_document: toc: true toc_depth: 3 number_sections: true bibliography: references.bib keywords: ["gene expression", "RNA-seq", "ExpressionAtlas", "Bioconductor"] vignette: > %\VignetteEngine{ knitr::rmarkdown } %\VignetteBuilder{ knitr } %\VignetteIndexEntry{ ExpressionAtlas } %\VignetteEncoding{UTF-8} --- # Expression Atlas resources at EMBL-EBI The [EMBL-EBI](http://www.ebi.ac.uk) [Expression Atlas](http://www.ebi.ac.uk/gxa) and [Single Cell Expression Atlas](http://www.ebi.ac.uk/gxa/sc) resources consist of carefully selected high-quality datasets from [ArrayExpress](https://www.ebi.ac.uk/biostudies/arrayexpress) and other data sources that have been manually curated and re-analyzed via the Functional Genomics Team analysis pipelines [@George2023]. Since October 2022, ArrayExpress is a collection of functional genomics data in BioStudies (https://www.ebi.ac.uk/biostudies). The Expression Atlas website allows users to search these datasets for genes and/or experimental conditions, to discover which genes are expressed in which tissues, cell types, developmental stages, and hundreds of other experimental conditions. The *ExpressionAtlas* R package allows you to search and download pre-packaged data from Expression Atlas inside an R session. Raw counts are provided for RNA-seq datasets, while normalized intensities are available for microarray experiments. Protocols describing how the data was generated are contained within the downloaded R objects, with more detailed information available on the [Expression Atlas website](http://www.ebi.ac.uk/gxa). Sample annotations are also included in the R object. Single-cell datasets are downloaded in annData format, and loaded into a SingleCellExperiment object for access and customised visualisation. # Searching and downloading Expression Atlas data The package communicates with the EBI RESTful Web Services API and Expression Atlas API to enable efficient data searching and retrieval. ## Searching You can search for experiments in Atlas using the `searchAtlasExperiments()` function. This function returns a *DataFrame* (see [S4Vectors](http://bioconductor.org/packages/release/bioc/html/S4Vectors.html)) containing the results of your search. The first argument to `searchAtlasExperiments()` should be a character vector of sample properties, e.g. biological sample attributes, experimental treatments nad/or species. You may also optionally provide a secondary filter yo your query to limit your search to, as a second argument. ```{r setup, include=FALSE} knitr::opts_chunk$set(message=FALSE) knitr::opts_chunk$set(warning=FALSE) ``` ```{r} invisible(lapply( c("ExpressionAtlas", "SummarizedExperiment", "SingleCellExperiment", "ggplot2"), function(pkg) suppressMessages(library(pkg, character.only = TRUE)) )) ``` ```{r eval=FALSE} atlas_res <- searchAtlasExperiments( query = "salt", secondaryFilter = "oryza" ) # Searching for experiments matching your query ... # Query successful. # Found 4 experiments matching your query. ``` ```{r, echo=FALSE} data( "atlasRes" ) ``` We will proceed with a subset of three accessions: ```{r} atlasRes ``` The *Accession* column contains the ArrayExpress accession of each dataset -- the unique identifier assigned to it. The species, experiment type (e.g. microarray or RNA-seq), and title of each dataset are also listed. ## Detailed search through experiment metadata ```{r} searchAtlasExperiments( query = "lung" , detailed = FALSE ) ``` The `detailed` argument can be set to `TRUE` to return a more detailed search through Atlas experiment metadata information: ```{r} searchAtlasExperiments( query = "lung" , detailed = TRUE ) ``` It can work with a secondary filter as well: ```{r} searchAtlasExperiments( query = "lung", secondaryFilter = "human", detailed = FALSE ) searchAtlasExperiments( query = "lung", secondaryFilter = "human", detailed = TRUE ) ``` As we can see, a detailed search returns a *DataFrame* with more Atlas studies. ## Downloading the data To download the data for any/all of the experiments in your results, you can use the function `getAtlasData()`. This function accepts a vector of ArrayExpress accessions. The data is downloaded into a *SimpleList* object (see package [S4Vectors](http://bioconductor.org/packages/release/bioc/html/S4Vectors.html)), with one entry per experiment, listed by accession. For example, to download all the datasets in your results: ```{r eval=FALSE} allExps <- getAtlasData( atlasRes$Accession ) # Downloading Expression Atlas experiment summary from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-GEOD-11175/E-GEOD-11175-atlasExperimentSummary.Rdata # Successfully downloaded experiment summary object for E-GEOD-11175 # Downloading Expression Atlas experiment summary from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata # Successfully downloaded experiment summary object for E-MTAB-1625 # Downloading Expression Atlas experiment summary from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1624/E-MTAB-1624-atlasExperimentSummary.Rdata # Successfully downloaded experiment summary object for E-MTAB-1624 ``` ```{r, echo=FALSE} data( "allExps" ) ``` ```{r} allExps ``` To only download the RNA-seq experiment(s): ```{r eval=FALSE} rnaseqExps <- getAtlasData( atlasRes$Accession[ grep( "rna-seq", atlasRes$Type, ignore.case = TRUE ) ] ) # Downloading Expression Atlas experiment summary from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata # Successfully downloaded experiment summary object for E-MTAB-1625 ``` ```{r, echo=FALSE} data( "rnaseqExps" ) ``` ```{r} rnaseqExps ``` To access an experiment summary, use the accession: ```{r} mtab1624 <- allExps[[ "E-MTAB-1624" ]] mtab1625 <- allExps[[ "E-MTAB-1625" ]] ``` Each dataset is also represented by a *SimpleList*, with one entry per platform used in the experiment. For RNA-seq data there will only ever be one entry, named `rnaseq`. For microarray data, there is one entry per array design used, listed by ArrayExpress array design accession (see below). ### RNA-seq experiment summaries Following on from above, `mtab1625` now contains a *SimpleList* object with a single entry named `rnaseq`. For RNA-seq experiments, this entry is a *RangedSummarizedExperiment* object (see package [SummarizedExperiment](http://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html)). ```{r} sumexp <- mtab1625$rnaseq sumexp ``` The matrix of raw counts for this experiment is stored in the *assays* slot: ```{r} head( assays( sumexp )$counts ) ``` The sample annotations can be found in the *colData* slot: ```{r} colData( sumexp ) ``` Information describing how the raw data files were processed to obtain the raw counts matrix are found in the *metadata* slot: ```{r} metadata( sumexp ) ``` ### Single-channel microarray experiments Data from a single-channel microarray experiment, e.g. [E-MTAB-1624](http://www.ebi.ac.uk/gxa/experiments/E-MTAB-1624), is represented as one or more *[ExpressionSet](https://www.bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf)* object(s) in the SimpleList that is downloaded. *ExpressionSet* objects are indexed by the ArrayExpress accession(s) of the microarray design(s) used in the original experiment. ```{r} names( mtab1624 ) affy126data <- mtab1624[[ "A-AFFY-126" ]] affy126data ``` The matrix of normalized intensity values is in the *assayData* slot: ```{r} head( exprs( affy126data ) ) ``` The sample annotations are in the *phenoData* slot: ```{r} pData( affy126data ) ``` A brief outline of how the raw data was normalized is in the *experimentData* slot: ```{r} preproc( experimentData( affy126data ) ) ``` # Downloading and visualising a bulk Expression Atlas experiment ## Downloading a single Expression Atlas experiment summary You can download data for a single Expression Atlas experiment using the `getAtlasExperiment()` function: ```{r eval=FALSE} mtab3007 <- getAtlasExperiment( "E-MTAB-3007" ) # Downloading Expression Atlas experiment summary from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-3007/E-MTAB-3007-atlasExperimentSummary.Rdata # Successfully downloaded experiment summary object for E-MTAB-3007 ``` ## Downloading a single Expression Atlas experiment normalised data You can download normalised data for a single Expression Atlas experiment using the `getNormalisedAtlasExpression()` function: ```{r eval=FALSE} mtab4045_tpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "tpm" ) # Downloading XML file from FTP... # E-MTAB-4045 is rnaseq_mrna_baseline , will continue downloading data # Downloading expression file from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-4045/E-MTAB-4045-tpms.tsv # Downloading XML file from FTP... ``` ```{r eval=FALSE} mtab4045_cpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "cpm" ) # Downloading XML file from FTP... # E-MTAB-4045 is rnaseq_mrna_baseline , will continue downloading data # Downloading Expression Atlas experiment summary from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-4045/E-MTAB-4045-atlasExperimentSummary.Rdata # Successfully downloaded experiment summary object for E-MTAB-4045 ``` ## Downloading a single Expression Atlas differential experiment analytics data You can also download analytics data for a single Expression Atlas deifferential experiment using the `getAnalyticsDifferentialAtlasExpression()` function: ```{r eval=FALSE} mtab10104_dea <- getAnalyticsDifferentialAtlasExpression( "E-MTAB-10104" ) # Downloading expression file from: # ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-10104/E-MTAB-10104-analytics.tsv ``` ## Creating heatmap from normalised data of a Expression Atlas experiment You can create heatmap from downloaded normalised data for a single Expression Atlas experiment using the `getNormalisedAtlasExpression()` function: ```{r eval=TRUE, fig.width=7, fig.height=8} mtab4045_tpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "tpm" ) mtab4045 <- heatmapAtlasExperiment( df = mtab4045_tpm, save_pdf = FALSE, show_plot = TRUE, palette = "viridis", top_n = 20, scaled = FALSE, show_heatmap_title = TRUE ) ``` And now using CPM data, instead of TPM: ```{r eval=TRUE, fig.width=7, fig.height=8} mtab4045_cpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "cpm" ) mtab4045 <- heatmapAtlasExperiment( df = mtab4045_cpm, save_pdf = FALSE, show_plot = TRUE, palette = "viridis", top_n = 20, scaled = TRUE, show_heatmap_title = TRUE ) ``` ## Creating a Volcano plot using normalised data from a differential Expression Atlas experiment You can create volcano plot from downloaded normalised data for a single Expression Atlas experiment using the `volcanoDifferentialAtlasExperiment()` function: ```{r eval=TRUE, fig.width=7, fig.height=5} mtab10104_dea <- getAnalyticsDifferentialAtlasExpression( "E-MTAB-10104" ) head(mtab10104_dea) mtab10104 <- volcanoDifferentialAtlasExperiment( df = mtab10104_dea, save_pdf = FALSE, show_plot = TRUE, low_fc_colour = "Gray", high_fc_colour = "Blue", cutoff = 1, show_volcanoplot_title = TRUE ) ``` # Searching, downloading and visualising Single-Cell Expression Atlas experiments ## Search Single Expression Atlas experiments You can search for experiments in Atlas using the `searchSCAtlasExperiments()` function. ```{r} search_1 <- searchSCAtlasExperiments( query = "endoderm" ) print(search_1) search_2 <- searchSCAtlasExperiments( query = "endoderm", secondaryFilter = "human" ) print(search_2) search_3 <- searchSCAtlasExperiments( query = "endoderm", secondaryFilter = "mouse" ) print(search_3) search_4 <- searchSCAtlasExperiments( query = "arabidopsis" ) print(search_4) search_5 <- searchSCAtlasExperiments( query = "pluripotency" ) print(search_5) ``` ## Downloading a study into a SingleCellExperiment object You can download data for a Single-Cell Expression Atlas experiment using the `getAtlasSCExperiment()` function: ```{r eval=FALSE} enad19 <- getAtlasSCExperiment( "E-ENAD-19" ) # returns a SingleCellExperiment object ``` ## Visualise a dimensionality reduction plot of a SingleCellExperiment object Let's use and example for pluripotency of human early embryos and embryonic stem cells by single cell RNA-seq. ```{r} egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" ) # returns a SingleCellExperiment object print("Reduced dimension names:") print(reducedDimNames(egeod36552)) print("Column data names:") print(colnames(colData(egeod36552))) plotDimRedSCAtlasExperiment(egeod36552, dimRed = "X_pca", colorby = "time" ) plotDimRedSCAtlasExperiment(egeod36552, dimRed = "X_umap_neighbors_n_neighbors_20", colorby = "cell_type") plotDimRedSCAtlasExperiment(egeod36552, dimRed = "X_umap_neighbors_n_neighbors_20", colorby = "louvain_resolution_2.0") + theme_classic() + theme(legend.position = "left") ``` ## Plot a gene expression heatmap from a SingleCellExperiment object For the default cluster or selected clusters with Expression Atlas marker genes, you can use the `heatmapSCAtlasExperiment()` function: ```{r} egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" ) heatmapSCAtlasExperiment( egeod36552, genes=NULL, sel.K=NULL, scaleNormExp=TRUE, show_row_names=FALSE ) # heatmapSCAtlasExperiment( egeod36552, genes=NULL, sel.K=NULL, scaleNormExp=TRUE, show_row_names=FALSE ) heatmapSCAtlasExperiment( egeod36552, genes=NULL, sel.K=6, scaleNormExp=TRUE, show_row_names=FALSE ) ``` For user selected genes: ```{r} egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" ) heatmapSCAtlasExperiment( egeod36552, genes=c('ENSG00000151611','ENSG00000020577', 'ENSG00000188869' ), sel.K=NULL, scaleNormExp=FALSE, show_row_names=TRUE ) # heatmapSCAtlasExperiment( egeod36552, genes=c('ENSG00000151611','ENSG00000020577', 'ENSG00000188869' ), sel.K=NULL, scaleNormExp=TRUE, show_row_names=TRUE ) ``` ## Dot-plot for a SingleCellExperiment object For example, if we chose one marker gene from each of the clusters (k = 4) in the previous example, we can use the `dotPlotSCAtlasExperiment()` function to plot the average expression of these genes across the clusters: ```{r, fig.width=7, fig.height=4} egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" ) # dotPlotSCAtlasExperiment(egeod36552, genes=c('ENSG00000166681','ENSG00000178928', 'ENSG00000142182' , 'ENSG00000160282' ), sel.K=4) dotPlotSCAtlasExperiment(egeod36552, genes=c('ENSG00000166681','ENSG00000178928', 'ENSG00000142182' , 'ENSG00000160282' ), sel.K=4, scaleNormExp=TRUE) + theme_classic() ``` # sessionInfo ```{r} sessionInfo() ``` # References