---
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The EPICv2manifest package user's guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
# The EPICv2manifest package user's guide
### Peters TJ, Pidsley R
## Package contents
EPICv2manifest is an annotation package providing a data.frame object containing the complete 49-column Illumina Infinium EPIC v2.0 probe manifest (with "IlmnID" as the rownames), plus 31 additional columns derived from Peters et al. (2024):
- `CpG_chrm`, `CpG_beg`, `CpG_end` - EPICv2 coordinates from Sesame manifest
- `MismatchPos` - vector of probes where "Y" indicates a discrepant genomic position between Sesame and Illumina; this includes probes that are missing mapping information in the Illumina manifest (chr0)
- `MissingPos` - vector of probes where "Y" indicates probes missing mapping information in Illumina manifest
- `namerep` - vector of probes where "Y" indicates probes that have replicates based on probe name match (this vector can be combined with the `Name` column to select a single probe to represent each name replicate)
- `seqrep` - vector of probes where "Y" indicates probes that have exact sequence matches with other probes within EPICv2
- `seqrep_IlmnIDs` - IlmnIDs of those probes with exact sequence matches with other probes within EPICv2 (corresponding to `seqrep`)
- `seqrep_RepNum` - replicate number of those probes with exact sequence matches with other probes within EPICv2 (corresponding to `seqrep` and `seqrep_IlmnIDs`). Can be used, for example, to keep only one probe per sequence replicate.
- `posrep` - vector of probes where "Y" indicates probes that have exact genomic position matches with other probes within EPICv2 (based on Illumina genomic positions; note that none of the affected probes are discrepant with Sesame mapping)
- `posrep_IlmnIDs` - IlmnIDs of those probes with exact genomic position matches with other probes within EPICv2 (corresponding to `posrep`)
- `posrep_RepNum` - replicate number of those probes with exact genomic position matches with other probes within EPICv2 (corresponding to `posrep` and `posrep_IlmnIDs`). Can be used, for example, to keep only one probe per genomic position replicate.
- `EPICv1probeID` - vector of EPICv1 probe names where probe names match between EPICv1 and EPICv2
- `EPICv1seqmatch` - vector of EPICv1 probe names where probe sequences match between EPICv1 and EPICv2
- `EPICv1locmatch` - vector of EPICv1 probe names where genomic locations match between EPICv1 and EPICv2 (based on Sesame locations)
- `K450probeID` - vector of 450K probe names where probe names match between 450K and EPICv2
- `K450seqmatch` - vector of 450K probe names where probe sequences match between 450K and EPICv2
- `K450locmatch` - vector of 450K probe names where genomic locations match between 450K and EPICv2 (based on Sesame locations)
- `K450locmatch2` - vector of 450K probe names where additional 450K probes have a genomic location match between 450K and EPICv2 (based on Sesame locations)
- `K27probeID` - vector of 27K probe names where probe names match between 27K and EPICv2
- `K27seqmatch` - vector of 27K probe names where probe sequences match between 27K and EPICv2
- `K27locmatch` - vector of 27K probe names where genomic locations match between 27K and EPICv2 (based on Sesame locations)
- `K27locmatch2` - vector of 27K probe names where additional 27K probes have a genomic location match between 27K and EPICv2 (based on Sesame locations)
- `CH_BLAT` - vector of probes where "Y" indicates at least one *in silico* cross-hybridisation event ($\ge 47$ bp match) to a non-target region of the genome, predicted by BLAT (Kent 2002)
- `CH_WGBS_evidence` - subset vector of `CH_BLAT` where "Y" indicates a greater affinity for the off-target(s), via comparison to whole genome bisulphite sequencing (WGBS) on matched samples
- `RMSE_with_WGBS` - root mean squared error when comparing probe methylation to matched target CpG site methylation from WGBS (*M*-values)
- `Num_offtargets` - number of off-target *in silico* hybridisation events predicted by the probe sequence via BLAT (Kent 2002)
- `Suggested_offtarget` - if `CH_WGBS_evidence == "Y"`, the hg38 coordinate of the off-target cytosine conferring minimum RMSE with WGBS
- `Rep_results_by_NAME` - results of competitive comparison between replicates, with replicate probe sets defined by column `Name`
- `Rep_results_by_SEQUENCE` - results of competitive comparison between replicates, with replicate probe sets defined by column `seqrep_IlmnIDs`
- `Rep_results_by_LOCATION` - results of competitive comparison between replicates, with replicate probe sets defined by column `posrep_IlmnIDs`
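
The replicate columns described above can be used to reduce each replicate set to a single probe. The following is a minimal sketch on a toy data frame that stands in for the manifest: the IlmnIDs and values are invented for illustration, and it assumes that non-replicate probes carry something other than "Y" in `seqrep` and that replicate number 1 is an acceptable representative. Check both assumptions against the real object before relying on this.

```r
# Toy stand-in for the manifest (invented values; IlmnIDs as rownames).
toy <- data.frame(
  Name          = c("cg0001", "cg0001", "cg0002", "cg0003"),
  namerep       = c("Y", "Y", "", ""),
  seqrep        = c("Y", "Y", "", ""),
  seqrep_RepNum = c(1, 2, NA, NA),
  row.names     = c("cg0001_TC11", "cg0001_TC12", "cg0002_BC21", "cg0003_TC21"),
  stringsAsFactors = FALSE
)

# Keep every non-replicate probe, plus replicate number 1 of each
# sequence-replicate set. %in% treats NA as "no match", so rows with a
# missing replicate number are handled without extra NA checks.
keep_seq    <- !(toy$seqrep %in% "Y") | toy$seqrep_RepNum %in% 1
one_per_seq <- toy[keep_seq, , drop = FALSE]

# Keep the first probe encountered for each probe name. Taking the first
# row is an arbitrary choice made for this sketch only.
one_per_name <- toy[!duplicated(toy$Name), , drop = FALSE]
```

The same pattern applies to `posrep` and `posrep_RepNum`. Where a more principled choice of representative is wanted, the `Rep_results_by_*` columns may be a better guide than simply taking the first replicate.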
To access the EPICv2manifest object, please run the following:
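```{r getmanifest}
library(AnnotationHub)
# Connect to AnnotationHub (resources are cached locally after the first download)
ah <- AnnotationHub()
# Retrieve the manifest data.frame by its AnnotationHub accession
EPICv2manifest <- ah[["AH116484"]]
```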
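
Once loaded, the annotation columns can be used for probe filtering and for linking EPICv2 probes to earlier arrays. The sketch below uses a toy data frame with invented values so that it runs without a network connection. `drop_cross_hybridising()` is a hypothetical helper written for this example and is not part of the package, and the code assumes that unflagged probes hold a value other than "Y" and that probes without an EPICv1 counterpart are `NA` or empty in `EPICv1probeID`.

```r
# Hypothetical helper (not part of EPICv2manifest): drop probes flagged as
# cross-hybridising. With wgbs_only = TRUE, only probes whose off-target
# affinity is supported by WGBS are dropped.
drop_cross_hybridising <- function(manifest, wgbs_only = FALSE) {
  flag_col <- if (wgbs_only) "CH_WGBS_evidence" else "CH_BLAT"
  manifest[!(manifest[[flag_col]] %in% "Y"), , drop = FALSE]
}

# Toy stand-in for the manifest (invented values; IlmnIDs as rownames).
toy_ch <- data.frame(
  CH_BLAT          = c("Y", "Y", "", ""),
  CH_WGBS_evidence = c("Y", "", "", ""),
  EPICv1probeID    = c("cg0001", NA, "cg0003", NA),
  row.names        = c("cg0001_TC11", "cg0002_BC21", "cg0003_TC21", "cg0004_BC11"),
  stringsAsFactors = FALSE
)

strict  <- drop_cross_hybridising(toy_ch)                    # drops all BLAT hits
lenient <- drop_cross_hybridising(toy_ch, wgbs_only = TRUE)  # drops WGBS-supported hits only

# Named lookup vector from EPICv2 IlmnID to the matching EPICv1 probe name.
has_v1   <- !is.na(toy_ch$EPICv1probeID) & nzchar(toy_ch$EPICv1probeID)
v2_to_v1 <- setNames(toy_ch$EPICv1probeID[has_v1], rownames(toy_ch)[has_v1])
```

Passing `EPICv2manifest` in place of `toy_ch` should work in the same way, provided the column encodings match the assumptions stated above.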
```{r sessioninfo}
sessionInfo()
```
## References
Peters, T.J., Meyer, B., Ryan, L. *et al.* (2024). Characterisation and reproducibility of the HumanMethylationEPIC v2.0 BeadChip for DNA methylation profiling. *BMC Genomics* **25**, 251. https://doi.org/10.1186/s12864-024-10027-5
Kent, W. J. (2002). BLAT--the BLAST-like alignment tool. *Genome Research*, **12**(4), 656–664.