---
title: "optimalFlowData: a data package for optimalFlow"
author:
- name: Hristo Inouzhe
  affiliation: Universidad de Valladolid, Spain
  email: hristo.inouzhe@gmail.com
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: 
    BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{optimalFlowData: a data package for optimalFlow}
    %\VignetteEngine{knitr::rmarkdown}
    \usepackage[utf8]{inputenc}
references:
- id: optimalFlow
  title: 'optimalFlow: optimal-transport approach to Flow Cytometry analysis'
  author:
  - family: del Barrio
    given: Eustasio
  - family: Inouzhe
    given: Hristo
  - family: Loubes
    given: Jean-Michel
  - family: Mayo-Iscar
    given: Agustin
  - family: Matran
    given: Carlos
  URL: 'https://arxiv.org/abs/1907.08006'
  type: article-journal
  issued:
    year: 2019
    month: 7
---



```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

*optimalFlowData* is a data package containing 40 simulated flow cytometry datasets, stored as data frames. They are used for testing and for developing examples for the package *optimalFlow*, which is based on the results in @optimalFlow.

The simulated cytometries are based on flow cytometry measurements obtained following the Euroflow protocols and kindly provided by Centro de Investigación del Cáncer (CIC) in Salamanca, Spain. The artificial cytometries mimic 31 cytometries from healthy individuals and 9 cytometries from patients with different types of cancer.

# Installation

Install the package from Bioconductor using *BiocManager*:

```{r ej0, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("optimalFlowData")
```

# Use
```{r ej01, eval = TRUE}
library(optimalFlowData)
head(Cytometry1)
```
We can create a database of gated cytometries with `buildDatabase`. For simplicity and ease of visualisation, we keep only 4 cell types. As is usual in machine learning, where a subset of the data serves as the learning set, we build the example database from a selection of the cytometries.
```{r ej1, eval = TRUE}
database <- buildDatabase(
  dataset_names = paste0('Cytometry', c(2:5, 7:9, 12:17, 19, 21)),
  population_ids = c('Monocytes', 'CD4+CD8-', 'Mature SIg Kappa', 'TCRgd-')
)
```
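To check what the database contains, it can help to count the cells of each selected population in every cytometry. The chunk below is a minimal sketch: `population_counts` is a hypothetical helper written for this vignette, not a function of *optimalFlowData*, and it assumes that the population labels sit in column 11, the same column used for colouring in the plotting call of this vignette.

```{r ej1b, eval = TRUE}
# Hypothetical helper (not part of optimalFlowData): count the cells of each
# population in every element of a list of gated cytometries.
# Assumption: the population labels are stored in column `label_col`
# (column 11 in the plotting call of this vignette).
population_counts <- function(db, label_col = 11) {
  lapply(db, function(cytometry) {
    # Drop unused factor levels so that only populations present are counted
    labels <- droplevels(as.factor(cytometry[[label_col]]))
    table(labels)
  })
}
```

Calling `population_counts(database)` on the database built above should return one table per selected cytometry, with one count per retained cell type.
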
A plot of the first cytometry in the database in a 3-dimensional subspace (columns 4, 3 and 9), coloured by the labels in column 11:
```{r ej2, echo = TRUE}
pairs(database[[1]][,c(4, 3, 9)], col = droplevels(database[[1]][, 11]))
```
The diagnosis for each cytometry is stored in `cytometry.diagnosis` and is obtained as follows:
```{r ej3, echo = TRUE}
help("cytometry.diagnosis") # for an explanation of the abbreviations
cytometry.diagnosis
```
# References