---
title: "Using the GEOfastq Package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using the GEOfastq Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(GEOfastq)
```

# Installation

`GEOfastq` can be installed from Bioconductor as follows:

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("GEOfastq")
```

# Overview of GEOfastq

The NCBI [Gene Expression Omnibus](https://www.ncbi.nlm.nih.gov/geo/) (GEO)
offers a convenient interface to explore high-throughput experimental data such
as RNA-seq. GEO deposits RNA-seq data in the Sequence Read Archive (SRA) as SRA
files, which can be converted to fastq files using `fastq-dump`. This conversion
can be quite slow, so it is usually more convenient to download the fastq files
that the European Nucleotide Archive (ENA) generates for a GEO accession.
`GEOfastq` crawls GEO to retrieve metadata and ENA fastq URLs, and then
downloads the fastq files.

# Getting Started using GEOfastq

To get fastq data for a GEO series, we first retrieve the metadata for a GEO
accession:

```{r}
gse_name <- 'GSE133758'
gse_text <- crawl_gse(gse_name)
```

Next, we extract the sample accessions for this study and retrieve the GEO
metadata and ENA fastq URL for one example sample:

```{r}
gsm_names <- extract_gsms(gse_text)
gsm_name <- gsm_names[182]
srp_meta <- crawl_gsms(gsm_name)
```

Now that we have retrieved the necessary metadata, we are ready to download the
fastq files for this sample:

```{r}
data_dir <- tempdir()

# example using smaller file
srp_meta <- data.frame(
    run = 'SRR014242',
    row.names = 'SRR014242',
    gsm_name = 'GSM315559',
    ebi_dir = get_dldir('SRR014242'),
    stringsAsFactors = FALSE)

res <- get_fastqs(srp_meta, data_dir)
```

# Session info

The following packages and versions were used in the production of this
vignette.

```{r echo=FALSE}
sessionInfo()
```
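
# Appendix: ENA fastq directory layout

The download example in Getting Started fills the `ebi_dir` column with `get_dldir('SRR014242')`. To give a feel for what such a directory may look like, the sketch below builds the relative path under which ENA, to our understanding of its FTP convention, stores the fastq files of a run. `ena_fastq_dir()` is a hypothetical helper written for this illustration. It is not part of `GEOfastq`, and we have not verified that `get_dldir()` returns exactly the same string. The seven- and eight-digit accessions are made up for illustration.

```r
# Hypothetical helper, NOT part of GEOfastq: sketches the assumed ENA layout
#   <first 6 characters>/<run>                 for runs with up to six digits
#   <first 6 characters>/<00X|0XX|XXX>/<run>   for runs with seven to nine digits
ena_fastq_dir <- function(run) {
  stopifnot(is.character(run), length(run) == 1L)

  prefix <- substr(run, 1, 6)      # e.g. "SRR014"
  n_digits <- nchar(run) - 3L      # digits after the three-letter prefix

  if (n_digits <= 6L) {
    return(paste(prefix, run, sep = "/"))
  }

  # Digits beyond the sixth become a zero-padded, three-character subdirectory.
  extra <- substr(run, 10, nchar(run))
  subdir <- sprintf("%03d", as.integer(extra))
  paste(prefix, subdir, run, sep = "/")
}

ena_fastq_dir("SRR014242")
#> [1] "SRR014/SRR014242"

# Made-up accessions, for illustration only
ena_fastq_dir("SRR1234567")
#> [1] "SRR123/007/SRR1234567"
ena_fastq_dir("SRR12345678")
#> [1] "SRR123/078/SRR12345678"
```

The block uses a plain `r` fence rather than a `{r}` chunk, so it is shown but not evaluated when the vignette is built.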