---
title: "Data Inference"
date: "`r BiocStyle::doc_date()`"
package: sesame
output: rmarkdown::html_vignette
fig_width: 6
fig_height: 6
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{"4. Data Inference"}
  %\VignetteEncoding{UTF-8} 
---

SeSAMe implements inference of sex, age, and ethnicity. This information is
valuable for checking the integrity of an experiment and for detecting sample
swaps.

```{r echo=FALSE, message=FALSE}
library(sesame)
```

# Sex

Sex is inferred based on our curated X-linked probes and Y chromosome probes
excluding pseudo-autosomal regions.
```{r}
sset <- sesameDataGet('EPIC.1.LNCaP')$sset
inferSex(sset)
inferSexKaryotypes(sset)
```

# Ethnicity

Ethnicity is inferred using a random forest model trained on both the
built-in SNPs (`rs` probes) and channel-switching Type-I probes.
```{r}
inferEthnicity(sset)
```

# Age

SeSAMe provides age regression following the Horvath 353 model.
```{r}
betas <- sesameDataGet('HM450.1.TCGA.PAAD')$betas
predictAgeHorvath353(betas)
```
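
To give a feel for how a clock of this kind turns beta values into an age, the
sketch below applies a weighted sum followed by the inverse age transformation
used by the Horvath clock. The intercept, coefficients, CpG names and beta
values are made up for illustration; they are not the 353 published
coefficients, and the code is not meant to mirror the internals of
`predictAgeHorvath353`.

```{r}
# Inverse of the log-linear age transformation used by the Horvath clock,
# with adult age set to 20 years.
anti_trafo <- function(x, adult_age = 20) {
    ifelse(x < 0,
        (1 + adult_age) * exp(x) - 1,
        (1 + adult_age) * x + adult_age)
}

# Hypothetical intercept, coefficients and beta values for three CpGs.
toy_intercept <- 0.5
toy_coef  <- c(cgA = 1.2, cgB = -0.8, cgC = 0.4)
toy_betas <- c(cgA = 0.6, cgB = 0.3, cgC = 0.5)

# Weighted sum on the transformed scale, then map back to years.
linear_score <- toy_intercept + sum(toy_coef * toy_betas)
toy_age <- anti_trafo(linear_score)
toy_age
```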

# Mean intensity

The mean intensity of all the probes characterizes the quantity of input DNA
and the efficiency of probe hybridization.
```{r}
meanIntensity(sset)
```
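
As a rough illustration of what this summary captures, the sketch below
averages methylated and unmethylated signal over a toy set of probes in base
R. The matrix `toy_signal` and its values are hypothetical, and the code does
not reproduce how `meanIntensity` treats Type-I and Type-II probes internally.

```{r}
# Toy signal intensities (hypothetical values): rows are probes,
# columns are the methylated (M) and unmethylated (U) signals.
toy_signal <- matrix(
    c(1200,  300,
       250, 1500,
       800,  900),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("probe1", "probe2", "probe3"), c("M", "U")))

# Average over every M and U value; NA signals are ignored.
toy_mean_intensity <- mean(toy_signal, na.rm = TRUE)
toy_mean_intensity
```

A sample with low input DNA or poor hybridization would show a lower value
than other samples processed in the same batch.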

# Copy Number

SeSAMe performs copy number segmentation in three steps: 1) normalizing the
signal intensity against a copy-number-normal data set; 2) grouping adjacent
probes into bins; 3) running DNAcopy internally to group bins into segments.
```{r, message=FALSE, fig.width=6}
ssets.normal <- sesameDataGet('EPIC.5.normal')
segs <- cnSegmentation(sset, ssets.normal)
```

The resulting segments can be plotted with `visualizeSegments`:
```{r, message=FALSE, fig.width=6}
visualizeSegments(segs)
```
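
The first two steps of the procedure can be sketched in base R on simulated
data. Everything here is an assumption made for illustration: the intensities
are simulated, the normalization is a plain log2 ratio against the mean of the
normal samples, and bins are fixed runs of 20 probes. `cnSegmentation` may
normalize and bin differently, and the DNAcopy segmentation step is not
reproduced.

```{r}
set.seed(1)

# Simulated total intensities (hypothetical): 200 probes ordered by genomic
# position, one target sample and three copy-number-normal samples.
n_probes <- 200
normal <- matrix(rnorm(n_probes * 3, mean = 5000, sd = 200), ncol = 3)
target <- rnorm(n_probes, mean = 5000, sd = 200)
target[101:200] <- target[101:200] * 1.5   # simulate a gain in the second half

# Step 1: normalize against the normal samples as a log2 ratio.
log_ratio <- log2(target / rowMeans(normal))

# Step 2: group every 20 adjacent probes into a bin and summarize by median.
bin_id <- ceiling(seq_len(n_probes) / 20)
bin_signal <- tapply(log_ratio, bin_id, median)
round(bin_signal, 2)
```

Binning trades resolution for noise reduction: the per-bin medians are far
more stable than individual probe ratios, which makes the later segmentation
step less prone to spurious breakpoints.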

# Cell Composition Deconvolution

SeSAMe estimates the leukocyte fraction using a two-component model. This
function works for samples whose targeted cell-of-origin is not related to
white blood cells.
```{r, message=FALSE}
betas.tissue <- sesameDataGet('HM450.1.TCGA.PAAD')$betas
estimateLeukocyte(betas.tissue)
```
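
The idea behind a two-component model can be sketched as a simple mixture: at
a probe where leukocytes and the target tissue differ, the observed beta value
is a weighted average of the two reference values, and the weight is the
leukocyte fraction. The reference values below are hypothetical, and the code
is a minimal illustration of the mixture equation rather than the procedure
`estimateLeukocyte` actually follows.

```{r}
# Hypothetical reference beta values at four leukocyte-informative probes.
beta_leukocyte <- c(0.90, 0.85, 0.10, 0.05)
beta_tissue    <- c(0.10, 0.15, 0.90, 0.95)

# Simulate a sample that is 30% leukocyte and 70% tissue.
true_fraction <- 0.3
beta_observed <- true_fraction * beta_leukocyte +
    (1 - true_fraction) * beta_tissue

# Solve the mixture equation per probe, then take the median across probes.
per_probe <- (beta_observed - beta_tissue) / (beta_leukocyte - beta_tissue)
estimated_fraction <- median(per_probe)
estimated_fraction
```

This also shows why the approach breaks down for blood-derived samples: when
the tissue and leukocyte references are nearly identical, the denominator
approaches zero and the estimate becomes unstable.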