---
title: "C. ClinVar Integration"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{C. ClinVar Integration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

Original version: 1 May, 2024

```{r setup, message = FALSE}
library(AlphaMissenseR)
```

# Introduction

[ClinVar][cv_link] is a freely available, public archive of human genetic
variants that provides clinical classifications for whether a variant is
likely benign or pathogenic. The AlphaMissense [publication][Science] uses
the ClinVar data to evaluate and calibrate the predictions generated by
their model. A table containing ClinVar information for 82872 variants
across 7951 proteins was derived from the supplemental data of the
AlphaMissense paper, and is made available through this package for
benchmarking and visualization purposes.

[cv_link]: https://www.ncbi.nlm.nih.gov/clinvar/
[Science]: https://www.science.org/doi/epdf/10.1126/science.adg7492

# Access ClinVar classifications with AlphaMissense predictions

The ClinVar table can be accessed using `clinvar_data()` from the database.

```{r download_cv}
clinvar_data()
```

The ClinVar table is now available for exploration or parsing.

# Compare ClinVar and AlphaMissense

This section uses the `clinvar_plot()` function to generate a scatterplot
for benchmarking and comparing ClinVar classification with AlphaMissense
predictions.

By default, the function takes one UniProt accession identifier, derives
AlphaMissense scores from `am_data("aa_substitution")`, and pulls ClinVar
classifications from the data.frame previously obtained. Alternatively, it
is possible to pass a custom AlphaMissense or ClinVar table to the
function. See function details for more information.

```{r plot_P37023, fig.width = 7}
clinvar_plot(uniprotId = "P37023")
```

We returned a `ggplot` object which overlays ClinVar classifications onto
AlphaMissense predicted scores.
Blue, gray, and red colors represent the pathogenicity classifications
"likely benign", "ambiguous", and "likely pathogenic", respectively.
Large, bolded points are ClinVar variants colored according to their
clinical classification, while smaller points in the background are
AlphaMissense predictions.

We can note a discrepancy between the clinically validated annotations and
the AlphaMissense predictions around position 50. AlphaMissense seems to
predict several variants in that region as likely benign, while ClinVar
identifies them as pathogenic.

Because the ClinVar dataset is not exhaustive (not all proteins have been
clinically assessed), there may be proteins where information is not
available. In this case, the function will return an error.

Remember to disconnect from the database.

```{r, close_db}
db_disconnect_all()
```

# Session information {.unnumbered}

```{r}
sessionInfo()
```