--- title: "SPLINTER" subtitle: "SPLice INTERpreter: Alternative splicing analysis toolkit" author: "Diana Low" date: "`r doc_date()`" package: "`r pkg_ver('SPLINTER')`" abstract: vignette: > %\VignetteIndexEntry{SPLINTER} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::pdf_document --- \pagebreak # Introduction SPLINTER provides tools to analyze alternative splicing sites, interpret outcomes based on sequence information, select and design primers for site validiation and give visual representation of the event to guide downstream experiments. # Loading the package To load the `r Biocpkg("SPLINTER")` package: ```{r} library(SPLINTER) ``` # Initializing the genome for transcript selection In this example, we will be utilizing the mm9 genome for mouse. You will need to install the appropriate package (eg. `r Biocpkg("BSgenome.Mmusculus.UCSC.mm9")`) for the genome that you will be using. ```{r, message=FALSE} library(BSgenome.Mmusculus.UCSC.mm9) library(GenomicFeatures) bsgenome <- BSgenome.Mmusculus.UCSC.mm9 ``` We begin with full set of available transcripts to screen from, and read it into a \Rclass{TxDb} object. One source of this (best option to ensure compatibility) would be the GTF file that you have used for alternative splicing analysis. For other sources of data, please refer to `r Biocpkg("GenomicFeatures")`). We then extract the coding sequences (CDS), and transcripts in general (coding and non-coding) (exons) from this object. ```{r,message=FALSE} data_path<-system.file("extdata",package="SPLINTER") gtf_file<-paste(data_path,"/Mus_musculus.Ensembl.NCBIM37.65.partial.gtf",sep="") txdb <- makeTxDbFromGFF(file=gtf_file,chrominfo = Seqinfo(genome="mm9")) # txdb generation can take quite long, you can save the object and load it the next time # saveDb(txdb,file="txdb_hg19.sqlite") # txdb<-loadDb(file="txdb_hg19.sqlite") # extract CDS and exon information from TxDb object thecds<-cdsBy(txdb,by="tx",use.names=TRUE) theexons<-exonsBy(txdb,by="tx",use.names=TRUE) ``` # Reading in the splicing analysis file The output file from [MATS](http://rnaseq-mats.sourceforge.net/) is used here, but essentially all that is needed are coordinates of the exons (target and flanking) involved in the splicing processt to be studied. For the case of exon skipping, this will include the upstream, target and downstream exons. More output types will be supported in the future. \pagebreak The following types of alternative splicing events are accepted: Type of alternative splicing event | Definition ------------- | ------------- SE | Skipped exon RI | Retained intron MXE | Mutually exclusive exon A5SS | Alternative 5' splice site A3SS | Alternative 3' splice site ```{r} typeofAS="SE" mats_file<-paste(data_path,"/skipped_exons.txt",sep="") splice_data <-extractSpliceEvents(data=mats_file, filetype='mats', splicetype=typeofAS) splice_data$data[,c(1:10)] ``` ## Additional annotation `r Biocpkg("SPLINTER")` assumes that the main identifier is ENSEMBL, however gene symbols can be added. ```{r} splice_data<-addEnsemblAnnotation(data=splice_data,species="mmusculus") # (Optional) Sorting the dataframe, if you have supporting statistical information splice_data$data<-splice_data$data[with(splice_data$data,order(FDR,-IncLevelDifference)),] head(splice_data$data[,c(1:10)]) ``` \pagebreak # Analyzing a specific gene ## Inspecting a single gene in more detail (single record) Once we have defined the events, we will pick 1 event to analyze. ```{r} single_id='Prmt5' pp<-which(grepl(single_id,splice_data$data$Symbol)) # Prmt5 has 1 record splice_data$data[pp,c(1:6)] # show all records single_record<-splice_data$data[pp[1],] ``` ## Finding relevant transcripts from the ENSEMBL database To reduce search complexity, we define the valid transcripts and coding sequences with regards to our gene of interest. We find that Prmt5 has 7 transcripts, 2 of which are coding sequences. ```{r} valid_tx <- findTX(id=single_record$ID,tx=theexons,db=txdb) valid_cds<- findTX(id=single_record$ID,tx=thecds,db=txdb) ``` ## Constructing the region of interest (ROI) The `makeROI` function will create a list containing `GRanges` objects for the splicing event. This will help identify and construct relevant outputs later. This list contains the following information: - type: type of alternative splicing event - name: name of gene - roi: GRanges object of the exon - flank: GRanges object of the flanking exons - roi_range: GRanges list containing * GRanges object of Type 1 * GRanges object of Type 2 \pagebreak Type of alternative splicing | Type 1 representation | Type 2 representation (annotated only) ------------------- | ------------------------ | ------------------------- SE | isoform with event exon included | isoform with the exon skipped RI | isoform with normal exon boundaries | isoform with the intron retained MXE | isoform defined 1st (leftmost) in input | isoform defined 2nd in input A5SS | isoform with longer exon | isoform with shorter exon A3SS | isoform with longer exon | isoform with shorter exon ```{r} roi <- makeROI(single_record,type="SE") roi ``` ## Finding transcripts that contain the ROI At this juncture, we look for transcripts are compatible with the ROI. Compatibility is defined as having the exact cassette (matching upstream, target, downstream) exons. In the case of intron retention, this would just be the 2 exons flanking the intron. We notice here that Prmt5 only has 1 compatible transcript involved in the event ROI, out of 7 transcripts (or 2 coding transcripts). There are no Type 2 transcripts, which means there are no annotated transcripts of Prmt5 containing the alternative event. ```{r} compatible_tx<- findCompatibleEvents(valid_tx,roi=roi,verbose=TRUE) compatible_cds<- findCompatibleEvents(valid_cds,valid_tx,roi=roi,verbose=TRUE) ``` \pagebreak # Simulating alternatively spliced products ## Simulating the outcome of exon skipping by removing an exonic region ```{r} region_minus_exon <-removeRegion(compatible_cds$hits[[1]],roi) ``` ## Simulating the outcome of intron retention by inserting an intronic region ```{r} # Not relevant for this Prmt5 skipped exon example region_plus_exon <-insertRegion(region_minus_exon,roi) ``` ## Comparing sequences before and after removal/insertion of a region ```{r} event<-eventOutcomeCompare(seq1=compatible_cds$hits[[1]],seq2=region_minus_exon, genome=bsgenome,direction=TRUE,fullseq=FALSE) event ``` \pagebreak # Designing primers to inspect splicing regions ## Getting the DNA of the region of interest This function will return the DNA of the ROI, with exons separated by "[]" (Primer3 notation) and the junction marked by `jstart`. ```{r} aa<-getRegionDNA(roi,bsgenome) aa ``` ## Using Primer3 to design primers for alternative splicing identification We have included a helper function to run Primer3 from within `R`. You will need to define the path to your Primer3 installation. Refer to `?callPrimer3` for more details. ```{r, eval=FALSE} primers<-callPrimer3(seq=aa$seq,sequence_target = aa$jstart,size_range='100-500') ``` ```{r} primers[,c(1:4)] ``` Alternatively, primers can be entered manually with the appropriate headers. ```{r} primers <- data.frame(PRIMER_LEFT_SEQUENCE="ACTTTCGGACTCTGTGTGACT", PRIMER_RIGHT_SEQUENCE="TCATAGGCATTGGGTGGAGG", stringsAsFactors=FALSE) ``` ## Checking primers coverage As a confirmation, we can run the primers against the ROI to give the genomic location of the primer coverage. ```{r} cp<-checkPrimer(primers[1,],bsgenome,roi) cp ``` ## Predicting PCR results using the primers `getPCRsizes` will give you the length of the PCR product produced by the set of primers. ```{r} pcr_result1<-getPCRsizes(cp,theexons) pcr_result1 tx_minus_exon <-removeRegion(compatible_tx$hits[[1]],roi) pcr_result2<-getPCRsizes(cp,tx_minus_exon) pcr_result2 ``` ### Selecting sizes relevant to splicing event (subset of getPCRsizes) While `getPCRsizes` will return all possible PCR products for a given set of annotation, `splitPCRhit` will return PCR product sizes that are relevant to the splicing event in question. ```{r} relevant_pcr_bands<-splitPCRhit(pcr_result1,compatible_tx) relevant_pcr_bands ``` # Plotting results ```{r} # reading in BAM files mt<-paste(data_path,"/mt_chr14.bam",sep="") wt<-paste(data_path,"/wt_chr14.bam",sep="") # Plotting genomic range, read density and splice changes eventPlot(transcripts=theexons,roi_plot=roi,bams=c(wt,mt),names=c('wt','mt'), annoLabel=single_id,rspan=2000) # Barplot of PSI values if provided psiPlot(single_record) ``` # Session info ```{r sessionInfo, echo=FALSE} sessionInfo() ```