title: "Large-scale single-cell RNA-seq data manipulation with GDS files"
author: "Dr. Xiuwen Zheng (Genomics Research Center, AbbVie)"
date: "Dec 2020"

## Introduction

The SCArray package supports the manipulation of large-scale single-cell RNA-seq data stored in Genomic Data Structure ([GDS](http://www.bioconductor.org/packages/gdsfmt)) files. It combines dense and sparse matrices stored in GDS files with the Bioconductor infrastructure ([SingleCellExperiment](http://www.bioconductor.org/packages/SingleCellExperiment) and [DelayedArray](http://www.bioconductor.org/packages/DelayedArray)) to provide out-of-memory (on-disk) data storage and manipulation in the R programming language.

As shown in Figure 1, SCArray provides a `SingleCellExperiment` object for downstream data analyses. GDS is an alternative to HDF5. Unlike HDF5, GDS supports the direct storage of a sparse matrix without first converting it into multiple vectors.

![**Figure 1**: Workflow of SCArray](scarray_fig.svg)
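
To illustrate what "converting it to multiple vectors" means, the sketch below decomposes a small sparse matrix into the three vectors of the compressed sparse column (CSC) layout, a common multi-vector representation of sparse data. It uses the Matrix package, which is an assumption here (it is not part of the workflow described above), and the toy matrix `m` is hypothetical. It does not show how GDS stores the data internally.

```{R, eval=FALSE}
library(Matrix)

# A hypothetical 3 x 4 count matrix with four nonzero entries
m <- sparseMatrix(
    i = c(1, 3, 2, 3),    # row positions (1-based)
    j = c(1, 1, 3, 4),    # column positions (1-based)
    x = c(5, 2, 7, 1),    # nonzero values
    dims = c(3, 4))

# The CSC layout splits the matrix into three separate vectors
m@i    # zero-based row indices of the nonzero entries
m@p    # column pointers: where each column starts in `i` and `x`
m@x    # the nonzero values themselves
```

A storage format without native sparse support has to write vectors like these as separate datasets and reassemble them on reading.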


## Installation

* Requires R (>= v3.5.0) and [gdsfmt](http://www.bioconductor.org/packages/gdsfmt) (>= v1.24.0)

* Bioconductor repository

To install this package, start R and enter:

```{R, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("SCArray")
```
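
The version requirements listed above can be checked from within R. The following is a minimal sketch using only base R functions; the variable names `r_ok` and `gdsfmt_ok` are illustrative and not part of SCArray.

```{R, eval=FALSE}
# R itself must be at least version 3.5.0
r_ok <- getRversion() >= "3.5.0"

# gdsfmt must be installed and at least version 1.24.0
gdsfmt_ok <- requireNamespace("gdsfmt", quietly=TRUE) &&
    packageVersion("gdsfmt") >= "1.24.0"

# Named logical vector: TRUE means the requirement is met
c(R=r_ok, gdsfmt=gdsfmt_ok)
```

If `gdsfmt` is reported as `FALSE`, installing SCArray through `BiocManager::install()` should also install or update its dependencies.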

## Format conversion

### Conversion from SingleCellExperiment

The SCArray package can convert a single-cell experiment object (`SingleCellExperiment`) to a GDS file with the function `scConvGDS()`. For example,