drawProteins 1.14.0
Many proteins are present as alternative transcripts, where the same gene produces alternative forms of the protein through differential mRNA splicing or post-translational cleavage.
These forms are detailed in UniProt. When they are extracted through the UniProt API, they are returned as lists of alternative forms followed by lists of features. To plot each protein with its appropriate features, these need to be separated in our dataframe. This is done using the extract_transcripts() function.
This vignette shows how this works and gives an example.
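The workflow using extract_transcripts() has four steps: steps 1 and 2 obtain a dataframe of features from UniProt, step 3 uses extract_transcripts() to generate a new dataframe, and step 4 visualises the result.
Steps 1 and 2 are illustrated in the drawProteins vignette, so only step 3 and the visualisation of step 4 will be shown here.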
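For orientation, steps 1 and 2 might look roughly like the sketch below. It assumes the get_features() and feature_to_dataframe() functions from drawProteins and five UniProt accessions for the human NF-kappaB family; the wrapper name fetch_feature_table() is hypothetical and only groups the two calls. It needs a network connection when called, and it is not necessarily how the bundled example data were produced.

```r
# UniProt accessions for five human NF-kappaB family proteins
# (RelA, RelB, c-Rel, NFKB1, NFKB2), separated by spaces.
# Assumption: these are chosen for illustration only.
rel_ids <- "Q04206 Q01201 Q04864 P19838 Q00653"

# Hypothetical wrapper (not part of drawProteins) around steps 1 and 2.
fetch_feature_table <- function(ids) {
  # Query the UniProt API for the features of each accession
  features <- drawProteins::get_features(ids)
  # Flatten the returned list into a dataframe with one row per feature
  drawProteins::feature_to_dataframe(features)
}
```

Calling fetch_feature_table(rel_ids) should then return a dataframe that can be passed to extract_transcripts() in step 3.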
The NF-kappaB transcription factor family contains two proteins that are present in two forms. The dataframe obtained from UniProt is contained in the drawProteins package as “five_rel_data” and can be loaded using the data() function.
When loaded, this has 320 observations of 9 variables and will plot five chains, as shown by checking max(five_rel_data$order).
To plot all the transcripts, a new dataframe is produced using the extract_transcripts() function. The new dataframe is called prot_data; it has 430 observations of 9 variables and will plot seven chains, as shown by checking max(prot_data$order).
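# attach the packages used in this vignette
library(drawProteins)
library(ggplot2)  # provides ggtitle() and theme_bw()
# load up data for five NF-kappaB proteins
data("five_rel_data")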
max(five_rel_data$order)
[1] 5
# returns 5
# use extract_transcripts() to create a new data frame
prot_data <- extract_transcripts(five_rel_data)
max(prot_data$order)
[1] 7
# returns 7
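The row, column and chain counts quoted in the text can be checked with a small helper. This is a minimal sketch, assuming the dataframe has columns named type and order; the function name summarise_features() and the toy table are invented for illustration and are not part of drawProteins.

```r
# Summarise a drawProteins-style feature table.
# Assumes columns named `type` and `order`.
summarise_features <- function(df) {
  c(rows       = nrow(df),
    columns    = ncol(df),
    max_order  = max(df$order),
    chain_rows = sum(df$type == "CHAIN"))
}

# Toy table with invented values, only to show the shape of the result
toy <- data.frame(
  type  = c("CHAIN", "DOMAIN", "CHAIN", "CHAIN", "MOD_RES"),
  begin = c(1, 10, 1, 1, 50),
  end   = c(100, 60, 200, 120, 50),
  order = c(1, 1, 2, 2, 2)
)
summarise_features(toy)
```

With the package data loaded, rbind(before = summarise_features(five_rel_data), after = summarise_features(prot_data)) puts the two tables side by side.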
Now, let’s draw the chains for the two objects so that they can be compared.
p1 <- draw_canvas(five_rel_data)
p1 <- draw_chains(p1, five_rel_data)
p1 <- p1 + ggtitle("Five chains plotted")
p2 <- draw_canvas(prot_data)
p2 <- draw_chains(p2, prot_data)
p2 <- p2 + ggtitle("Seven chains plotted")
p1
p2
The appropriate domains and phosphorylation sites can be drawn correctly.
p2 <- draw_domains(p2, prot_data)
p2 <- draw_phospho(p2, prot_data, size = 8)
p2
Note that the names of the different transcripts are the same, so it’s wise to use the labels option to customize the labels.
p2 <- draw_canvas(prot_data)
p2 <- draw_chains(p2, prot_data,
                  fill = "lightsteelblue1",
                  outline = "grey",
                  labels = c("p105",
                             "p105",
                             "p100",
                             "p100",
                             "Rel B",
                             "c-Rel",
                             "p65/Rel A",
                             "p50",
                             "p52"),
                  label_size = 5)
p2 <- draw_phospho(p2, prot_data, size = 8, fill = "red")
p2 + theme_bw()
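When writing custom labels, it can help to list the chain rows first and to confirm that the number of labels matches. The sketch below assumes, as the nine labels above suggest, that draw_chains() expects one label per row of type CHAIN, in row order, and that the dataframe has columns type, description, order, begin and end. The helper names list_chains() and check_labels() are hypothetical.

```r
# List the chain rows in row order, so labels can be written to match.
# Assumes columns `type`, `description`, `order`, `begin` and `end`.
list_chains <- function(df) {
  chains <- df[df$type == "CHAIN", c("order", "description", "begin", "end")]
  rownames(chains) <- NULL
  chains
}

# Stop early if the number of labels differs from the number of chain rows.
check_labels <- function(df, labels) {
  n_chains <- sum(df$type == "CHAIN")
  if (length(labels) != n_chains) {
    stop("expected ", n_chains, " labels, got ", length(labels))
  }
  invisible(labels)
}
```

For example, list_chains(prot_data) shows which description belongs to each chain row, and check_labels(prot_data, my_labels) can be run on a labels vector before it is passed to draw_chains().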
Here is the output of sessionInfo() on the system on which this document was compiled:
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.36 ggplot2_3.3.5 httr_1.4.2
[4] drawProteins_1.14.0 BiocStyle_2.22.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 highr_0.9 bslib_0.3.1
[4] compiler_4.1.1 pillar_1.6.4 BiocManager_1.30.16
[7] jquerylib_0.1.4 tools_4.1.1 digest_0.6.28
[10] jsonlite_1.7.2 evaluate_0.14 lifecycle_1.0.1
[13] tibble_3.1.5 gtable_0.3.0 pkgconfig_2.0.3
[16] rlang_0.4.12 DBI_1.1.1 magick_2.7.3
[19] curl_4.3.2 yaml_2.2.1 xfun_0.27
[22] fastmap_1.1.0 withr_2.4.2 stringr_1.4.0
[25] dplyr_1.0.7 generics_0.1.1 sass_0.4.0
[28] vctrs_0.3.8 tidyselect_1.1.1 grid_4.1.1
[31] glue_1.4.2 R6_2.5.1 fansi_0.5.0
[34] rmarkdown_2.11 bookdown_0.24 farver_2.1.0
[37] purrr_0.3.4 magrittr_2.0.1 scales_1.1.1
[40] htmltools_0.5.2 ellipsis_0.3.2 assertthat_0.2.1
[43] colorspace_2.0-2 labeling_0.4.2 utf8_1.2.2
[46] stringi_1.7.5 munsell_0.5.0 crayon_1.4.1