---
title: "ExpressionAtlas package vignette"
author: "Maria Keays"
date: "`r Sys.Date()`"
output: html_document
vignette: >
    %\VignetteEngine{ knitr::rmarkdown }
    %\VignetteIndexEntry{ ExpressionAtlas }
---

# Expression Atlas

The [EMBL-EBI](http://www.ebi.ac.uk) [Expression Atlas](http://www.ebi.ac.uk/gxa) consists of hand-picked, high-quality datasets from [ArrayExpress](http://www.ebi.ac.uk/arrayexpress) that have been manually curated and re-analyzed via the Expression Atlas analysis pipeline. The Expression Atlas website allows users to search these datasets for genes and/or experimental conditions, to discover which genes are expressed in which tissues, cell types, developmental stages, and hundreds of other experimental conditions.

The *ExpressionAtlas* R package allows you to search for and download pre-packaged data from Expression Atlas inside an R session. Raw counts are provided for RNA-seq datasets, while normalized intensities are available for microarray experiments. Protocols describing how the data was generated are contained within the downloaded R objects, with more detailed information available on the [Expression Atlas website](http://www.ebi.ac.uk/gxa). Sample annotations are also included in the R object.

# Searching and downloading Expression Atlas data

## Searching

You can search for experiments in Atlas using the `searchAtlasExperiments()` function. This function returns a *DataFrame* (see [S4Vectors](http://bioconductor.org/packages/release/bioc/html/S4Vectors.html)) containing the results of your search. The first argument to `searchAtlasExperiments()` should be a character vector of sample properties, e.g. biological sample attributes and/or experimental treatments. You may optionally provide a species as a second argument to limit your search.

```{r}
suppressMessages( library( ExpressionAtlas ) )
```

```{r eval=FALSE}
atlasRes <- searchAtlasExperiments( properties = "salt", species = "rice" )
# Searching for Expression Atlas experiments matching your query ...
# Query successful.
# Found 3 experiments matching your query.
```

```{r, echo=FALSE}
data( "atlasRes" )
```

```{r}
atlasRes
```

The *Accession* column contains the ArrayExpress accession of each dataset -- the unique identifier assigned to it. The species, experiment type (e.g. microarray or RNA-seq), and title of each dataset are also listed.

## Downloading the data

To download the data for any/all of the experiments in your results, you can use the function `getAtlasData()`. This function accepts a vector of ArrayExpress accessions. The data is downloaded into a *SimpleList* object (see package [S4Vectors](http://bioconductor.org/packages/release/bioc/html/S4Vectors.html)), with one entry per experiment, listed by accession.

For example, to download all the datasets in your results:

```{r eval=FALSE}
allExps <- getAtlasData( atlasRes$Accession )
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-GEOD-11175/E-GEOD-11175-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-GEOD-11175
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1624/E-MTAB-1624-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1624
```

```{r, echo=FALSE}
data( "allExps" )
```

```{r}
allExps
```

To download only the RNA-seq experiment(s):

```{r eval=FALSE}
rnaseqExps <- getAtlasData(
    atlasRes$Accession[
        grep( "rna-seq", atlasRes$Type, ignore.case = TRUE )
    ]
)
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
```

```{r, echo=FALSE}
data( "rnaseqExps" )
```

```{r}
rnaseqExps
```

To access an experiment summary, use the accession:

```{r}
mtab1624 <- allExps[[ "E-MTAB-1624" ]]
mtab1625 <- allExps[[ "E-MTAB-1625" ]]
```

Each dataset is also represented by a *SimpleList*, with one entry per platform used in the experiment. For RNA-seq data there will only ever be one entry, named `rnaseq`. For microarray data, there is one entry per array design used, listed by ArrayExpress array design accession (see below).

### RNA-seq experiment summaries

Following on from above, `mtab1625` now contains a *SimpleList* object with a single entry named `rnaseq`. For RNA-seq experiments, this entry is a *RangedSummarizedExperiment* object (see package [SummarizedExperiment](http://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html)).

```{r}
sumexp <- mtab1625$rnaseq
sumexp
```

The matrix of raw counts for this experiment is stored in the *assays* slot:

```{r}
head( assays( sumexp )$counts )
```

The sample annotations can be found in the *colData* slot:

```{r}
colData( sumexp )
```

Information describing how the raw data files were processed to obtain the raw counts matrix is found in the *metadata* slot:

```{r}
metadata( sumexp )
```

### Single-channel microarray experiments

Data from a single-channel microarray experiment, e.g. [E-MTAB-1624](http://www.ebi.ac.uk/gxa/experiments/E-MTAB-1624), is represented as one or more *[ExpressionSet](https://www.bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf)* object(s) in the SimpleList that is downloaded.
*ExpressionSet* objects are indexed by the ArrayExpress accession(s) of the microarray design(s) used in the original experiment.

```{r}
names( mtab1624 )
affy126data <- mtab1624[[ "A-AFFY-126" ]]
affy126data
```

The matrix of normalized intensity values is in the *assayData* slot:

```{r}
head( exprs( affy126data ) )
```

The sample annotations are in the *phenoData* slot:

```{r}
pData( affy126data )
```

A brief outline of how the raw data was normalized is in the *experimentData* slot:

```{r}
preproc( experimentData( affy126data ) )
```

# Downloading a single Expression Atlas experiment summary

You can also download data for a single Expression Atlas experiment using the `getAtlasExperiment()` function:

```{r eval=FALSE}
mtab3007 <- getAtlasExperiment( "E-MTAB-3007" )
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-3007/E-MTAB-3007-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-3007
```
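
Whichever function is used for the download, each platform entry of an experiment summary is one of the two object types described above, so the expression matrix is reached through a different accessor depending on the experiment type. The following is a minimal sketch of one way to handle both cases. `getExpressionMatrix()` is a hypothetical helper, not part of the *ExpressionAtlas* package, and it assumes that RNA-seq entries store their raw counts in an assay named `counts`, as in the RNA-seq example above. Entry types not covered in this vignette are rejected with an error.

```{r eval=FALSE}
library( methods )

# Hypothetical helper (not part of ExpressionAtlas): return the expression
# matrix from one platform entry of a downloaded experiment summary.
getExpressionMatrix <- function( entry ) {

    if( is( entry, "SummarizedExperiment" ) ) {
        # RNA-seq entry: raw counts, assumed to be in the "counts" assay.
        SummarizedExperiment::assays( entry )$counts
    } else if( is( entry, "ExpressionSet" ) ) {
        # Single-channel microarray entry: normalized intensities.
        Biobase::exprs( entry )
    } else {
        # Anything else is outside the scope of this sketch.
        stop( "Unsupported entry class: ", class( entry )[ 1 ] )
    }
}
```

With the objects created earlier, `getExpressionMatrix( mtab1625$rnaseq )` would be expected to return the same matrix as `assays( sumexp )$counts`, and `getExpressionMatrix( mtab1624[[ "A-AFFY-126" ]] )` the same matrix as `exprs( affy126data )`.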