---
title: "3. Simulating Spatial Cell-Cell Interactions"
output:
    BiocStyle::html_document:
    toc: true
    toc_depth: 2
vignette: >
    %\VignetteEngine{knitr::knitr}
    %\VignetteIndexEntry{3. Simulating Spatial Cell-Cell Interactions}
    %\usepackage[UTF-8]{inputenc}
---

```{r "setup", include=FALSE}
require("knitr")
opts_chunk$set(fig.width=4, fig.height=3)

# devtools::load_all(".")
```


## Simulating Spatial Cell-Cell Interactions

scMultiSim can simulate spatial cell-cell interactions.
To do so, we need to provide the `cci` option as a list.
The following code will print more instructions on how to use the `cci` option.

```{r help-cci}
library(scMultiSim)

scmultisim_help("cci")
```

Now, we prepare a ligand-receptor interaction database.
This is pretty similar to the GRN network: it is a data frame with three columns,
specifying `target`, `regulator`, and `effect`, respectively.
The target and regulator columns should contain the IDs of the target and regulator genes.
In the following example, we have two ligand-receptor pairs interacting between two neighboring cells.

```{r cci-network}
lig_params <- data.frame(
  target    = c(101, 102),
  regulator = c(103, 104),
  effect    = c(5.2, 5.9)
)
```

We can now simulate the spatial cell-cell interactions.
In scMultiSim, the CCI network is cell-type based, which means that between each cell type pair,
we can have a different CCI network sampled from the database defined above.
Here, we set the `step.size` to 0.5, so the differentiation tree is divided into segments of length 0.5,
each segment is treated as a cell type in CCI.
We set `cell.type.interaction` to `random`, so the CCI network between each cell type pair is randomly sampled from the database.

Here, we use only 100 cells to speed up the simulation. Feel free to try a larger number of cells when running this vignette locally.

```{r}
data(GRN_params_100)
set.seed(42)

options_ <- list(
  GRN = GRN_params_100,
  speed.up = TRUE,
  num.genes = 120,
  num.cells = 80,
  num.cifs = 20,
  cif.sigma = 0.2,
  tree = Phyla3(),
  intrinsic.noise = 0.5,
  cci = list(
    params = lig_params,
    max.neighbors = 4,
    grid.size = 13,
    cell.type.interaction = "random",
    step.size = 0.5
  )
)

results <- sim_true_counts(options_)
```

The `results$cell_meta` will contain the cell type information used in CCI.
We can plot the cell spatial locations using `plot_cell_loc()`.
The arrows indicate cell-cell interactions between two cells (for the first ligand-receptor pair).

```{r plot-cell-loc, fig.width=6, fig.height=6}
plot_cell_loc(results)
```

The cell locations are available in `results$cci_locs`.

```{r print-cell-loc}
head(results$cci_locs)
```

### Speeding up the Simulation

Simulating spatial cell-cell interactions can be computationally expensive.
Setting these two options can speed up the simulation:

```
options_ <- list(
    # ...
    speed.up = T,
    cci = list(
        # ...
        start.layer = ncells
    )
)
```

First of all, it is recommended to set the experimental `speed.up = T` option. This option will become default in later versions of scMultiSim.

Next, it is possible to set the CCI option `start.layer = n_cells`, where `n_cells` is the number of cells.
scMultiSim simulates a spatial dataset by following `n_cells` steps, adding one more cell to the spatial grid in each step.
Only the final step is outputted as the result.
The CCI option `start.layer` can be used to start simulation from a specific time step.
When set to `n_cells`, the simulation will skip all previous steps by adding all cells at once.
By default, `start.layer` will be set to `n_cells` when number of cells is greater than 800.


## Spatial layouts

scMultiSim provides powerful customization options for spatial cell layouts.

### Built-in layouts

scMultiSim ships with several built-in spatial layouts.
The `enhanced` layout is the default layout, where cells are added to the grid one by one.
When adding a new cell, it has a higher probability of being placed near the existing cells of the same cell type.
```{r layout-enhanced, fig.width=6, fig.height=6}
# helper function to add `layout` to options, to make the code more readable
spatial_options <- function (...) {
  cci_opt <- list(
    params = lig_params,
    max.neighbors = 4,
    start.layer = 300,
    grid.size = 28,
    cell.type.interaction = "random"
  )
  list(
    rand.seed = 0,
    GRN = GRN_params_100,
    speed.up = TRUE,
    num.genes = 200,
    num.cells = 300,
    num.cifs = 50,
    tree = Phyla3(),
    cci = c(cci_opt, list(...))
  )
}


results <- sim_true_counts(spatial_options(
  layout = "enhanced"
))
plot_cell_loc(results, show.arrows = FALSE)
```

An option `same.type.prob` decides the probability of a new cell being placed near the existing cells of the same cell type.
By default, it is 0.8; and if we use a lower value, the new cell will be placed more randomly.
```{r layout-random, fig.width=6, fig.height=6}

results <- sim_true_counts(spatial_options(
  layout = "enhanced",
  same.type.prob = 0.1
))
plot_cell_loc(results, show.arrows = FALSE)
```

The `layers` layout arranges cells in layers.

```{r layout-layers, fig.width=6, fig.height=6}
results <- sim_true_counts(spatial_options(
  layout = "layers"
))
plot_cell_loc(results, show.arrows = FALSE)
```

The `islands` layout will put some cell types in the center like islands, and others around them.
You may specify which cell type should be islands in the format `islands:1,2,3`.
The number here can be looked up in `results$cci_cell_types`.

```{r}
results$cci_cell_types
```

```{r layout-islands, fig.width=6, fig.height=6}
results <- sim_true_counts(spatial_options(
  # cell type 4_1_2 should be the island
  layout = "islands:5"
))
plot_cell_loc(results, show.arrows = FALSE)
```

### Custom layouts

It is also possible to layout the cells programmatically.
The `layout` option can be a function that takes the cell type information and returns the spatial locations of the cells:
```
# grid_size is a number
# cell_types is an integer vector, representing the cell types
function(grids_size, cell_types) {
  # return a matrix with two columns, representing the x and y coordinates of the cells
  return matrix(nrow = 2, ncol = ncells)
}
```

For example, the following layout function will place the cells sequentially in the grid,
starting from the bottom-left corner.

```{r layout-custom, fig.width=6, fig.height=6}
results <- sim_true_counts(spatial_options(
  layout = function (grid_size, cell_types) {
    ncells <- length(cell_types)
    new_locs <- matrix(nrow = ncells, ncol = 2)
    # for each cell...
    for (i in 1:ncells) {
      # ...place it in the grid
      new_locs[i,] <- c(i %% grid_size, i %/% grid_size)
    }
    return(new_locs)
  }
))
plot_cell_loc(results, show.arrows = FALSE)
```

## Spatial domains

Next, we demonstrate how to use custom layout function to create spatial domains.
We want to have three spatial domains in a layered layout, and we have four cell types.
Each cell type has a different probability of being in each domain.

The following layout function will do this job: First of all, it generates a set of locations that form a circular shape.
Next, it assigns cells to these locations; the leftmost cell is selected as the origin.
Then, we can create a layered layout by sorting the locations based on their euclidian distance to the origin.
The three domains are determined by the distance to the origin.
We have a matrix `ct_matrix` that specifies the probability of each cell type being in each domain.
Finally, we sample the cells based on the probabilities and assign them to the domains.

```{r layout-domains}
layout_fn <- function(grid_size, final_types) {
  ncells <- length(final_types)
  grid_center <- c(round(grid_size / 2), round(grid_size / 2))
  all_locs <- gen_clutter(ncells, grid_size, grid_center)
  # center is bottom-left
  left_ones <- which(all_locs[,1] == min(all_locs[,1]))
  new_center <<- all_locs[left_ones[which.min(all_locs[left_ones, 2])],]
  dist_to_center <- sqrt(colSums((t(all_locs) - new_center)^2))
  new_locs <- all_locs[order(dist_to_center),]
  # prob of a cell type being in a zone (cell_type x zone)
  ct_matrix <- matrix(c(
    0.9, 0.1, 0.0,
    0.1, 0.8, 0.1,
    0.1, 0.7, 0.2,
    0.0, 0.1, 0.9
  ), nrow = 4, byrow = TRUE)
  # number of cells per type
  ct_pop <- c(160, 80, 100, 140)
  pop_mtx <- round(ct_matrix * ct_pop)
  if (sum(pop_mtx) != ncells) {
    diffrence <- ncells - sum(pop_mtx)
    pop_mtx[1, 1] <- pop_mtx[1, 1] + diffrence
  }
  # number of cells per zone
  zone_pop <- colSums(pop_mtx)
  # assign cells to zones
  cs <- cumsum(zone_pop)
  # sample cells
  cell_idx <- unlist(lapply(1:3, function(izone) {
    sample(rep(1:4, pop_mtx[,izone]), zone_pop[izone])
  }))
  locs <<- new_locs[order(cell_idx),]
  zone_gt <<- rep(1:3, zone_pop)[order(cell_idx)]
  return(locs)
}
```

Inspecting the result, we can see the three spatial domains, where the middle one contains a mix of two cell types.

```{r layout-domains-plot, fig.width=6, fig.height=6}
results <- sim_true_counts(list(
  num.cells = 500,
  num.genes = 300,
  num.cifs = 40,
  GRN = NA,
  speed.up = T,
  cif.sigma = 0.8,
  tree = ape::read.tree(text = "(A:1,B:1,C:1,D:1);"),
  diff.cif.fraction = 0.8,
  discrete.cif = T,
  discrete.pop.size = as.integer(c(120,150,100,130)),
  cci = list(
    params = lig_params,
    max.neighbors = 4,
    start.layer = 500,
    cell.type.interaction = "random",
    layout = layout_fn,
    step.size = 1
  )
))

plot_cell_loc(results, show.arrows = FALSE)
```

## Spatially variable genes

The `ext.cif.giv` option allows us to append custom CIF and GIV entries for each cell and gene.
We can use this option to simulate spatially variable genes.
This option should be a function that takes the kinetic parameter index and returns a list of extra CIF and GIV matrices.

```{r}
scmultisim_help("ext.cif.giv")
```

Using the previous layout function, we can add extra CIF with value based on the distance to the origin.

```{r}
ext_cif <- function(i) {
  # We manually set genes 290-300 to be spatially variable
  spatial_genes <- 290:300
  dist_to_center <- colSums((t(locs) - new_center)^2)
  dist_to_center <- dist_to_center / max(dist_to_center)
  # 3 is the s parameter
  if (i == 3) {
    # n_extra_cif x n_cells
    ex_cif <- cbind(
      # the two CIFs have large values when distance to the center is near 0.5
      rnorm(500, 0.5 * dnorm(abs(dist_to_center - 0.5), 0, 0.04), 0.02),
      rnorm(500, 0.5 * dnorm(abs(dist_to_center - 0.5), 0, 0.04), 0.02)
    )
    # n_genes x n_extra_cif
    ex_giv <- matrix(0, nrow = 300, ncol = 2)
    for (i in spatial_genes) {
      # odd genes affected by the first two CIF, even genes affected by the last two CIF
      ex_giv[i, ] <- rnorm(2, 1, 0.5)
    }
    list(ex_cif, ex_giv * 2)
  } else {
    NULL
  }
}
```

```{r}
results <- sim_true_counts(list(
  num.cells = 500,
  num.genes = 300,
  num.cifs = 40,
  GRN = NA,
  speed.up = T,
  cif.sigma = 0.8,
  tree = ape::read.tree(text = "(A:1,B:1,C:1,D:1);"),
  diff.cif.fraction = 0.8,
  ext.cif.giv = ext_cif,
  discrete.cif = T,
  discrete.pop.size = as.integer(c(120,150,100,130)),
  cci = list(
    params = lig_params,
    max.neighbors = 4,
    start.layer = 500,
    cell.type.interaction = "random",
    layout = layout_fn,
    step.size = 1
  )
))
```

Try plotting one of the spatially variable genes. We can see that the gene expression is higher in the specific spatial
region.
```{r spatially-variable-gene, fig.width=6, fig.height=6}
library(ggplot2)

plot_cell_loc(results, show.arrows = FALSE,
              .cell.pop = log(results$counts[299,] + 1)) + scale_colour_viridis_c()
```

## Long-distance Cell-Cell Interactions

scMultiSim also supports simulation of long-distance cell-cell interactions.

The CCI option `radius` controls the maximum distance between two cells for them to interact.
It can be a number or a string.
When it is a number, it specifies the maximum distance.
When it is a string it should be in the format `gaussian:sigma`, for example, `gaussian:1.2`.
In this case, the probability of two cells interacting is proportional to the distance with a Gaussian kernel applied.

By default, `radius = 1`, which means scMultiSim only consider the four nearest neighbors.

We can compare the result with different sigma values 1 and 3:

```{r long-distance-cci}

options <- lapply(c(1, 3), \(sigma) {
  list(
    rand.seed = 1,
    GRN = NA,
    num.genes = 200,
    num.cells = 500,
    num.cifs = 50,
    tree = Phyla5(),
    discrete.cif = T,
    discrete.min.pop.size = 20,
    discrete.pop.size = as.integer(c(110, 80, 140, 40, 130)),
    do.velocity = F,
    scale.s = 1,
    cci = list(
      params = lig_params,
      max.neighbors = 4,
      cell.type.interaction = "random",
      cell.type.lr.pairs = 3:6,
      step.size = 0.3,
      grid.size = 35,
      start.layer = 500,
      radius = paste0("gaussian:", sigma),
      layout = "layers"
    )
  )

})

results_1 <- sim_true_counts(options[[1]])
results_3 <- sim_true_counts(options[[2]])

```

```{r plot-long-distance-cci, fig.width=6, fig.height=6}
plot_cell_loc(results_1, show.arrows = T, .cell.pop = as.character(results$grid$final_types))
plot_cell_loc(results_3, show.arrows = T, .cell.pop = as.character(results$grid$final_types))
```

## Session Information

```{r session-info}
sessionInfo()
```