--- title: Plotting Alignments output: BiocStyle::html_document --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` ```{r, echo=FALSE, results="hide"} # Ensure that any errors cause the Vignette build to fail. library(knitr) opts_chunk$set(error=FALSE) ``` ```{r, echo = FALSE} apiKey <- Sys.getenv("GOOGLE_API_KEY") if (nchar(apiKey) == 0) { warning(paste("To build this vignette, please setup the environment variable", "GOOGLE_API_KEY with the public API key from your Google", "Developer Console before loading the GoogleGenomics package,", "or run GoogleGenomics::authenticate.")) knitr::knit_exit() } ``` ## Working with Reads [Google Genomics](http://cloud.google.com/genomics) implements the [GA4GH](http://ga4gh.org/) reads API and this R package can retrieve data from that implementation. For more detail, see https://cloud.google.com/genomics/v1beta2/reference/reads ```{r message=FALSE} library(GoogleGenomics) # This vignette is authenticated on package load from the env variable GOOGLE_API_KEY. # When running interactively, call the authenticate method. # ?authenticate ``` By default, this function retrieves reads for a small genomic region for one sample in [1,000 Genomes](http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/1000_genomes.html). ```{r} reads <- getReads() length(reads) ``` We can see that `r length(reads)` individual reads were returned and that the JSON response was deserialized into an R list object: ```{r} class(reads) mode(reads) ``` The top level field names are: ```{r} names(reads[[1]]) ``` And examining the alignment we see: ```{r} reads[[1]]$alignedSequence reads[[1]]$alignment$position$referenceName reads[[1]]$alignment$position$position ``` This is good, but this data becomes **much** more useful when it is converted to Bioconductor data types. For example, we can convert reads in this list representation to `r Biocpkg("GAlignments")`: ```{r} readsToGAlignments(reads) ``` ## Plotting Alignments Let's take a look at the reads that overlap [rs9536314](http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=9536314) for sample NA12893 within the [Illumina Platinum Genomes](http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/platinum_genomes.html) dataset. This SNP resides on chromosome 13 at position 33628137 in 0-based coordinates. ```{r} # Change the values of 'chromosome', 'start', or 'end' here if you wish to plot # alignments from a different portion of the genome. alignments <- getReads(readGroupSetId="CMvnhpKTFhDyy__v0qfPpkw", chromosome="chr13", start=33628130, end=33628145, converter=readsToGAlignments) alignments ``` Notice that we passed the converter to the getReads method so that we're immediately working with GAlignments which means that we can start taking advantage of other Bioconductor functionality. Also keep in mind that the parameters `start` and `end` are expressed in 0-based coordinates per the GA4GH specification but the Bioconductor data type converters in `r Biocpkg("GoogleGenomics")`, by default, transform the returned data to 1-based coordinates. ```{r message=FALSE} library(ggbio) ``` We can display the basic alignments and coverage data: ```{r coverage} alignmentPlot <- autoplot(alignments, aes(color=strand, fill=strand)) coveragePlot <- ggplot(as(alignments, "GRanges")) + stat_coverage(color="gray40", fill="skyblue") tracks(alignmentPlot, coveragePlot, xlab="Reads overlapping rs9536314 for NA12893") ``` And also display the ideogram for the corresponding location on the chromosome: ```{r ideogram} ideogramPlot <- plotIdeogram(genome="hg19", subchr="chr13") ideogramPlot + xlim(as(alignments, "GRanges")) ``` ## Provenance ```{r} sessionInfo() ```