---
title: "1. Concept of DelayedTensor"
author:
- name: Koki Tsuyuzaki
affiliation: Laboratory for Bioinformatics Research,
RIKEN Center for Biosystems Dynamics Research
- name: Itoshi Nikaido
affiliation: Laboratory for Bioinformatics Research,
RIKEN Center for Biosystems Dynamics Research
email: k.t.the-answer@hotmail.co.jp
graphics: no
package: DelayedTensor
output:
BiocStyle::html_document:
toc_float: true
vignette: |
%\VignetteIndexEntry{DelayedTensor}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```
**Authors**: `r packageDescription("DelayedTensor")[["Author"]] `
**Last modified:** `r file.info("DelayedTensor_1.Rmd")$mtime`
**Compiled**: `r date()`
# Introduction
## Heterogenous Biological Data
Biological systems have very complicated structures like this
figure^[https://f1000research.com/slides/9-1260].
![Figure 1: Heterogenous Biological Data](Figure1_1.png)
For example, in the cell, DNA sequences are folded in cellular nucleus,
RNA molecules are transcripted from the DNA,
proteins are translated from the RNAs,
and finally, the proteins are related to cellular functions.
Outside of the cell, there are also many signals like a bacterial infection,
adding chemical reagents, drugs, lifestyle, and so on.
The change of these molecular types/phenomena finally causes the phenotype
such as disease, BMI, and morphology.
It is not possible to measure all molecular types or phenomena simultaneously,
so one or two of them are chosen and exhaustively measured.
This approach is called omics study and widely used.
For example, genomics measure all DNA sequences, and transcriptomics measure
all RNA molecules.
There is a need for a framework that can handle and analyze such
heterogeneous data structures in a unified manner and
provide biological interpretation.
Tensors are a mathematical framework
that can be very useful in such a situation.
## What is Tensor
A tensor can be considered as a generalized form of data
representation^[https://f1000research.com/slides/9-1260].
For example, a scalar value, a vector, and a matrix are also called 0th-order
tensor, 1st-order tensor, and 2nd-order tensor, respectively.
If a data has three "modes" (1. height, 2. width, and 3. depth),
it is called a 3rd-order tensor.
![Figure 2: What is Tensor](Figure1_2.png)
That's why any data is basically a tensor,
but in most cases, the term tensor implies 3rd-order or higher-order tensor.
## What is Tensor Decomposition
Tensor decomposition is the extension of matrix decomposition.
If we have a third-order tensor,
gene times tissue times condition,
using tensor decomposition, we can extract a small number of
patterns^[https://f1000research.com/slides/9-1260].
![Figure 3: What is Tensor Decomposition](Figure1_3.png)
Each vector can be summarized to the multiple matrices and
these are called factor matrices.
The scalar values are summarized to a small tensor,
and this is called core tensor.
## Concept of DelayedTensor: Block Processing-enabled Tensor Operations
A tensor is more than just a multi-dimensional array; as we will see later,
there are various operations that are specific to tensors, such as reshaping,
mode-wise statistics, and various tensor products.
These operations are essential in the analysis of tensor data and
the implementation of tensor decomposition algorithms.
Although the standard `array` of R language can express
increasing orders of tensors, it does not provide tensor-specific operations.
Therefore, many R users manipulate tensor data by using
the functions implemented in the `r CRANpkg("rTensor")` package for now.
Although `r CRANpkg("rTensor")` is very useful,
it assumes the input object to be an in-memory array.
On the other hand, tensors can easily become huge
as the order and the size of each mode increase,
and may no longer fit in memory.
`r Biocpkg("DelayedTensor")` is implmented for such extreamly huge tensor data.
`r Biocpkg("DelayedTensor")` provides some functions of `r CRANpkg("rTensor")`
by reimplementing them with `r Biocpkg("DelayedArray")`.
`r Biocpkg("DelayedArray")` is a framework that allows us to use the data
on the disk as if it were a standard array in R.
There are some out-of-core backend file system such as `r Biocpkg("HDF5Array")`
and `r Biocpkg("TileDBArray")` used in `r Biocpkg("DelayedArray")`
and the incremental calculations can be performed
by implementing the functions in support of "block processing".
The functionality of DelayedTensor is fourfold.
![Figure 4: Concept of DelayedTensor](Figure1_4.png)
1. **Block-Processing Tensor Reshaping**: Operations such as folding
and unfolding a higher-order tensor data into a matrix
can be performed while taking care of the block size.
2. **Block-Processing Tensor Arithmetic**: Calculation of sums and averages
for each mode, and operations such as Hadamard product, Kronecker product,
and Khatri-Rao product can be performed while taking care of block size.
3. **Block-Processing Tensor Decomposition**: Some of the tensor decomposition
algorithms implemented in `r CRANpkg("rTensor")` have been reimplemented using
reshaping and arithmetic functions of `r Biocpkg("DelayedTensor")`.
4. **Block-Processing einsum**: In addition to the `r CRANpkg("rTensor")`
functions, the `r CRANpkg("einsum")` function, which is well-known as
the Numpy (Python) function, has also been reimplemented based on
`r Biocpkg("DelayedArray")` and the block processing framework.
`r CRANpkg("einsum")` is a very powerful preprocessing method
to merge multiple tensor data into a single tensor.
Although what is executed inside the functions are very different,
the function name, argument name, and value name are exactly the same
as those of `r CRANpkg("rTensor")`
so that `r CRANpkg("rTensor")` users can easily introduce them.
# Session information {.unnumbered}
```{r sessionInfo, echo=FALSE}
sessionInfo()
```