---
title: "C. ClinVar Integration"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{C. ClinVar Integration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

Original version: 1 May, 2024

```{r setup, message = FALSE}
library(AlphaMissenseR)
```

# Introduction

[ClinVar][cv_link] is a freely available, public archive of human
genetic variants that provides clinical classifications for whether a
variant is likely benign or pathogenic. The AlphaMissense
[publication][Science] uses the ClinVar data to evaluate and calibrate
the predictions generated by their model. A table containing ClinVar
information for 82872 variants across 7951 proteins was derived from
the supplemental data of the AlphaMissense paper, and is made
available through this package for benchmarking and visualization
purposes.

[cv_link]: https://www.ncbi.nlm.nih.gov/clinvar/
[Science]: https://www.science.org/doi/epdf/10.1126/science.adg7492


# Access ClinVar classifications with AlphaMissense predictions

The ClinVar table can be retrieved from the database with `clinvar_data()`.

```{r download_cv}
clinvar_data()
```

The ClinVar table is now available for exploration or parsing.
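
As a minimal sketch of a first look at the table, the chunk below stores the result of `clinvar_data()`, lists its column names, counts its rows, and collects a few rows into memory. It assumes that the returned object is a database-backed table that supports `dplyr` verbs and that `dplyr` is installed; the variable names `cv`, `cv_columns`, and `cv_preview` are illustrative only.

```{r explore_cv}
## Keep a reference to the ClinVar table (assumed to be database-backed)
cv <- clinvar_data()

## Inspect the available columns before writing any filters
cv_columns <- colnames(cv)
cv_columns

## Count the rows without pulling the full table into memory
dplyr::tally(cv)

## Bring only a handful of rows into R for a closer look
cv_preview <- cv |>
    head(5) |>
    dplyr::collect()
cv_preview
```

Checking the column names first avoids guessing at identifiers when filtering or joining the table later.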

# Compare ClinVar and AlphaMissense

This section uses the `clinvar_plot()` function to generate a
scatterplot comparing ClinVar classifications with AlphaMissense
predictions. By default, the function takes one UniProt accession
identifier, derives AlphaMissense scores from
`am_data("aa_substitutions")`, and pulls ClinVar classifications from
the table returned by `clinvar_data()`. Alternatively, it is possible
to pass a custom AlphaMissense or ClinVar table to the function. See
`?clinvar_plot` for details.

```{r plot_P37023, fig.width = 7}
clinvar_plot(uniprotId = "P37023")
```

The function returns a `ggplot` object that overlays ClinVar
classifications onto AlphaMissense predicted scores. Blue, gray, and
red represent the pathogenicity classes "likely benign", "ambiguous",
and "likely pathogenic", respectively. Large, bold points are ClinVar
variants colored according to their clinical classification, while
smaller points in the background are AlphaMissense predictions.

The plot shows a discrepancy between the clinically validated
annotations and the AlphaMissense predictions around position 50.
AlphaMissense appears to predict several variants in that region as
likely benign, while ClinVar classifies them as pathogenic.
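
A discrepancy of this kind can also be summarized in a table rather than read off the plot. The chunk below is a sketch on invented toy data, not on ClinVar or AlphaMissense records: the data frame `toy`, its column names (`position`, `clinvar`, `am_class`), and the label strings are all hypothetical and would need to be mapped to the columns of the real tables.

```{r concordance_sketch}
## Hypothetical toy data: NOT taken from ClinVar or AlphaMissense
toy <- data.frame(
    position = c(48, 50, 52, 120, 200, 310),
    clinvar = c(
        "pathogenic", "pathogenic", "pathogenic",
        "benign", "benign", "pathogenic"
    ),
    am_class = c(
        "likely_benign", "likely_benign", "ambiguous",
        "likely_benign", "likely_benign", "likely_pathogenic"
    )
)

## Cross-tabulate clinical labels against predicted classes
concordance <- table(clinvar = toy$clinvar, alphamissense = toy$am_class)
concordance

## Keep only variants where the two sources disagree outright
## ("ambiguous" predictions are not counted as disagreements here)
is_discordant <-
    (toy$clinvar == "pathogenic" & toy$am_class == "likely_benign") |
    (toy$clinvar == "benign" & toy$am_class == "likely_pathogenic")
discordant <- toy[is_discordant, ]
discordant
```

Treating "ambiguous" predictions as neither agreement nor disagreement is one possible choice; a stricter benchmark could count them separately.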

Because the ClinVar dataset is not exhaustive (not all proteins have
been clinically assessed), there may be proteins for which no
information is available. In that case, the function signals an error.
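
When plotting many proteins in a loop, it can be convenient to catch that error instead of stopping. The chunk below sketches one way to do so with base R's `tryCatch()`. The wrapper `safe_clinvar_plot()` and the stand-in `always_fails()` are hypothetical helpers written for this illustration and are not part of AlphaMissenseR; the stand-in's error message is invented.

```{r safe_plot_sketch}
## Hypothetical helper: returns NULL with a message instead of an error
safe_clinvar_plot <- function(uniprotId, plot_fun = clinvar_plot) {
    tryCatch(
        plot_fun(uniprotId),
        error = function(e) {
            message("No plot for ", uniprotId, ": ", conditionMessage(e))
            NULL
        }
    )
}

## Stand-in plotting function that always fails, to show the fallback
always_fails <- function(uniprotId) {
    stop("no ClinVar information found")
}

result <- safe_clinvar_plot("X00000", plot_fun = always_fails)
is.null(result)
```

With the default `plot_fun`, a call such as `safe_clinvar_plot("P37023")` would simply forward to `clinvar_plot()`, and a `NULL` result can then be used to skip proteins without ClinVar data.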

Remember to disconnect from the database.

```{r, close_db}
db_disconnect_all()
```

# Session information {.unnumbered}

```{r}
sessionInfo()
```