--- title: "Using extract_transcripts in drawProteins" author: - name: "Dr Paul Brennan" affiliation: - "Centre for Medical Education, School of Medicine, Cardiff University, Cardiff, Wales, United Kingdom" email: BrennanP@cardiff.ac.uk package: drawProteins date: "`r Sys.Date()`" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Using extract_transcripts in drawProteins} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r load_packages, eval = TRUE, echo=FALSE} library(BiocStyle) library(drawProteins) library(ggplot2) library(knitr) opts_chunk$set(comment=NA, fig.align = "center", out.width = "100%", dpi = 100) ``` # Introducing extract_transcripts() in drawProteins Many proteins are present as alternate transcripts where the same gene is produces alternative forms of the protein through differential mRNA splicing or post-translational cleavage. These are detailed in UniProt. When they are extracted by the UniProt API, it gives lists of alternative forms followed by lists of features. In order to plot each protein and the appropriate features, these need to be separated in our dataframe. This is done using the `extract_transcripts()` function. This Vignette shows how this works and gives an example. The workflow using extract_transcripts() is: 1. to provide one or more Uniprot IDs 2. get a list of features from the Uniprot API 3. run `extract_transcripts()` to generate a new dataframe 4. draw the chains and features as desired Steps 1 and 2 are illustrated in drawProteins Vignette so only step3 and the visualisation of step 4 will be shown here. # Making a new dataframe with each transcript separated The NFkappaB transcription factor family contains two proteins that are present in two forms. The dataframe obtained from Uniprot is contained in the drawProtein package as "five_rel_data" and can be loaded using the `data()` function. When loaded this has 320 obs of 9 variables and will plot five chains as shown by checking the `max(five_rel_data$order)` function. To plot all the transcripts, a new dataframe is produced using the `extact_transcripts()` function. The new dataframe is called prot_data and has 430 obs of 9 variables and will plot seven chains as shown by checking the `max(prot_data$order)` function. ```{r load_NFkappaB_data, fig.height=10, fig.wide = TRUE} # load up data for five NF-kappaB proteins data("five_rel_data") max(five_rel_data$order) # returns 5 # use extract_transcripts() to create a new data frame prot_data <- extract_transcripts(five_rel_data) max(prot_data$order) # returns 7 ``` Now, let's check out the chains for the two objects for comparison purposes. ```{r check_chains, fig.height=10, fig.wide = TRUE} p1 <- draw_canvas(five_rel_data) p1 <- draw_chains(p1, five_rel_data) p1 <- p1 + ggtitle("Five chains plotted") p2 <- draw_canvas(prot_data) p2 <- draw_chains(p2, prot_data) p2 <- p2 + ggtitle("Seven chains plotted") p1 p2 ``` The appropriate domains and phosphorylation sites can be drawn correctly. ```{r draw_domains_and_phospho, fig.height=10, fig.wide = TRUE} p2 <- draw_domains(p2, prot_data) p2 <- draw_phospho(p2, prot_data, size =8) p2 ``` Note that the names of the different transcripts are the same so it's wise to use the option customize the labels. ```{r draw_canvas_and_chains, fig.height=8, fig.wide = TRUE} p2 <- draw_canvas(prot_data) p2 <- draw_chains(p2, prot_data, fill = "lightsteelblue1", outline = "grey", labels = c("p105", "p105", "p100", "p100", "Rel B", "c-Rel", "p65/Rel A", "p50", "p52"), label_size = 5) p2 <- draw_phospho(p2, prot_data, size = 8, fill = "red") p2 + theme_bw() ``` # Session info Here is the output of `sessionInfo()` on the system on which this document was compiled: ```{r session_Info, echo=FALSE} sessionInfo() ```