---
title: "Introduction  to hammers"
author: "Andrei-Florian Stoica"
package: hammers
date: March 6, 2026
output: BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{Getting started with CSOA}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

`hammers` is a utilities suite for scRNA-seq data analysis. It provides simple 
tools to address tasks such as retrieving aggregate gene statistics, 
finding and removing rare genes, performing representation analysis, 
computing the center of mass for the expression of a gene of interest in 
low-dimensional space, and calculating silhouette and cluster-normalized 
silhouette.

# Installation

To install hammers, run the following commands in an R session:

```{r setup, eval=FALSE}
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("hammers")
```

# Prerequisites
In addition to hammers, you need to install [scater](https://bioconductor.org/packages/release/bioc/html/scater.html) 
and [scRNAseq](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html)
for this tutorial.

# Loading and preparing  data

This tutorial uses an scRNA-seq human pancreas dataset. After loading the 
required packages, download the dataset using the `BaronPancreasData` function 
from `scRNAseq`. The dataset will be stored as a SingleCellExperiment object. 

```{r message=FALSE, warning=FALSE, results=FALSE}
library(hammers)
library(scLang)
library(scRNAseq)
library(scater)

sceObj <- BaronPancreasData('human')
```

We will first normalize and 
log-transform the data using the `logNormCounts` function from `scuttle` 
(loaded automatically with `scater`).

```{r message=FALSE, warning=FALSE, results=FALSE}
sceObj <- logNormCounts(sceObj)
```

We will also need PCA and UMAP dimensions. These can be computed using the 
`runPCA` and `runUMAP` functions from `scater`:

```{r message=FALSE, warning=FALSE, results=FALSE}
sceObj <- runPCA(sceObj)
sceObj <- runUMAP(sceObj)
```


# Extracting gene information

The `genePresence` function extracts the `genes` that appear in at 
least `minCutoff` and at most `maxCutoff` cells. It returns a data frame 
listing the number of cells in which each gene that meets these criteria is 
expressed:

```{r}
df <- genePresence(sceObj, rownames(sceObj)[seq(100)], 300, 2000)
dim(df)
head(df)
```

The `geneCellSets` function lists all the cells in which selected genes are 
expressed:

```{r}
cellSets <- geneCellSets(sceObj, rownames(sceObj)[seq(100)])
head(cellSets[[1]])
```

`findRareGenes` is a wrapper around `genePresence` that can extract the
genes that appear in fewer than `nCells` cells:

```{r}
df <- findRareGenes(sceObj, nCells=10)
dim(df)
head(df)
```

The `removeRareGenes` function calls `findRareGenes` and subsequently removes
the identified rare genes from the object:

```{r message=FALSE, warning=FALSE}
dim(sceObj)
sceObj <- removeRareGenes(sceObj, nCells=10)
dim(sceObj)
```

# Representation analysis

The `repAnalysis` function shows which pairs selected from two categorical 
columns of a single-cell object are:

- Statistically overrepresented. This is the default setting.
- Statistically underrepresented. This requires setting `doOverrep` to `FALSE`.

```{r}
df <- repAnalysis(sceObj, 'donor', 'label')
head(df)
df <- repAnalysis(sceObj, 'donor', 'label', doOverrep=FALSE)
head(df)
```

The output of `repAnalysis` can be visualized using the `pvalRiverPlot` 
function, which creates an alluvial plot for a data frame
having a p-value column. Thicker connecting bands represent lower p-values:

```{r}
pvalRiverPlot(df, title=NULL)
```

The two columns can also be visualized using the `distributionPlot` function:

```{r}
distributionPlot(sceObj, NULL, 'donor', 'label')
```

The function can also plot relative percentages instead of raw counts:

```{r}
distributionPlot(sceObj, NULL, 'donor', 'label', type='percs')
```

# Computing centers of mass of gene expression

Given a list of genes, `geneCenters` can compute the center of mass of each
gene based on a dimensionality reduction of choice. By default, the reduction 
used by `geneCenters` is UMAP:

```{r message=FALSE, warning=FALSE}
genes <- c('PRSS1', 'TTR', 'GJD2', 'MS4A8')
geneCenters(sceObj, genes)
```

The centers of mass can be visualized with `genesDimPlot`, a function which
uses `geneCenters` internally:

```{r message=FALSE, warning=FALSE}
genesDimPlot(sceObj, genes, groupBy='label')
```

**Note**: The related `colsDimPlot` function can plot the center of mass of
numeric columns of the metadata of a Seurat object. `colsDimPlot` uses 
`colCenters` internally, which works similarly to `geneCenters`. Both 
`genesDimPlot` and `colsDimPlot` are wrappers around `pointsDimPlot`, which
adds labeled points on a plot generated with `scLang::dimPlot`.

# Computing silhouette and normalized silhouette

The `computeSilhouette` computes the silhouette for an input column in the
metadata or coldata of the single-cell expression object, returning the object
with an appended silhouette column:

```{r message=FALSE, warning=FALSE}
sceObj <- computeSilhouette(sceObj, 'label', 'labelSil')
head(scCol(sceObj, 'labelSil'))
```

The `normalizeSilhouette` function returns a data frame whose number of rows
equals the number of cells and the number of columns equals the number of 
identities in the single-cell expression object. It requires silhouette 
information for the required column to be provided in the metadata/coldata of 
the object. For each column in the output data frame, cells outside of the 
corresponding identity get a score of 0. All the other cells get scores in
(0, 1] by adjusted min-max normalization:

```{r message=FALSE, warning=FALSE}
df <- normalizeSilhouette(sceObj, 'label', 'labelSil')
head(df)
```

The `addNormSilhouette` function adds non-zero normalized silhouette 
information to the single-cell expression object as a metadata/coldata column.
For visualization purposes, we will use the `featurePlot` function from 
`scLang`:

```{r message=FALSE, warning=FALSE}
sceObj <- addNormSilhouette(sceObj, df, 'labelNormSil')
featurePlot(sceObj, 'labelNormSil')
```

# Session information {-}
```{r}
sessionInfo()
```