--- title: "Introduction to hammers" author: "Andrei-Florian Stoica" package: hammers date: March 6, 2026 output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Getting started with CSOA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction `hammers` is a utilities suite for scRNA-seq data analysis. It provides simple tools to address tasks such as retrieving aggregate gene statistics, finding and removing rare genes, performing representation analysis, computing the center of mass for the expression of a gene of interest in low-dimensional space, and calculating silhouette and cluster-normalized silhouette. # Installation To install hammers, run the following commands in an R session: ```{r setup, eval=FALSE} if (!require("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("hammers") ``` # Prerequisites In addition to hammers, you need to install [scater](https://bioconductor.org/packages/release/bioc/html/scater.html) and [scRNAseq](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html) for this tutorial. # Loading and preparing data This tutorial uses an scRNA-seq human pancreas dataset. After loading the required packages, download the dataset using the `BaronPancreasData` function from `scRNAseq`. The dataset will be stored as a SingleCellExperiment object. ```{r message=FALSE, warning=FALSE, results=FALSE} library(hammers) library(scLang) library(scRNAseq) library(scater) sceObj <- BaronPancreasData('human') ``` We will first normalize and log-transform the data using the `logNormCounts` function from `scuttle` (loaded automatically with `scater`). ```{r message=FALSE, warning=FALSE, results=FALSE} sceObj <- logNormCounts(sceObj) ``` We will also need PCA and UMAP dimensions. These can be computed using the `runPCA` and `runUMAP` functions from `scater`: ```{r message=FALSE, warning=FALSE, results=FALSE} sceObj <- runPCA(sceObj) sceObj <- runUMAP(sceObj) ``` # Extracting gene information The `genePresence` function extracts the `genes` that appear in at least `minCutoff` and at most `maxCutoff` cells. It returns a data frame listing the number of cells in which each gene that meets these criteria is expressed: ```{r} df <- genePresence(sceObj, rownames(sceObj)[seq(100)], 300, 2000) dim(df) head(df) ``` The `geneCellSets` function lists all the cells in which selected genes are expressed: ```{r} cellSets <- geneCellSets(sceObj, rownames(sceObj)[seq(100)]) head(cellSets[[1]]) ``` `findRareGenes` is a wrapper around `genePresence` that can extract the genes that appear in fewer than `nCells` cells: ```{r} df <- findRareGenes(sceObj, nCells=10) dim(df) head(df) ``` The `removeRareGenes` function calls `findRareGenes` and subsequently removes the identified rare genes from the object: ```{r message=FALSE, warning=FALSE} dim(sceObj) sceObj <- removeRareGenes(sceObj, nCells=10) dim(sceObj) ``` # Representation analysis The `repAnalysis` function shows which pairs selected from two categorical columns of a single-cell object are: - Statistically overrepresented. This is the default setting. - Statistically underrepresented. This requires setting `doOverrep` to `FALSE`. ```{r} df <- repAnalysis(sceObj, 'donor', 'label') head(df) df <- repAnalysis(sceObj, 'donor', 'label', doOverrep=FALSE) head(df) ``` The output of `repAnalysis` can be visualized using the `pvalRiverPlot` function, which creates an alluvial plot for a data frame having a p-value column. Thicker connecting bands represent lower p-values: ```{r} pvalRiverPlot(df, title=NULL) ``` The two columns can also be visualized using the `distributionPlot` function: ```{r} distributionPlot(sceObj, NULL, 'donor', 'label') ``` The function can also plot relative percentages instead of raw counts: ```{r} distributionPlot(sceObj, NULL, 'donor', 'label', type='percs') ``` # Computing centers of mass of gene expression Given a list of genes, `geneCenters` can compute the center of mass of each gene based on a dimensionality reduction of choice. By default, the reduction used by `geneCenters` is UMAP: ```{r message=FALSE, warning=FALSE} genes <- c('PRSS1', 'TTR', 'GJD2', 'MS4A8') geneCenters(sceObj, genes) ``` The centers of mass can be visualized with `genesDimPlot`, a function which uses `geneCenters` internally: ```{r message=FALSE, warning=FALSE} genesDimPlot(sceObj, genes, groupBy='label') ``` **Note**: The related `colsDimPlot` function can plot the center of mass of numeric columns of the metadata of a Seurat object. `colsDimPlot` uses `colCenters` internally, which works similarly to `geneCenters`. Both `genesDimPlot` and `colsDimPlot` are wrappers around `pointsDimPlot`, which adds labeled points on a plot generated with `scLang::dimPlot`. # Computing silhouette and normalized silhouette The `computeSilhouette` computes the silhouette for an input column in the metadata or coldata of the single-cell expression object, returning the object with an appended silhouette column: ```{r message=FALSE, warning=FALSE} sceObj <- computeSilhouette(sceObj, 'label', 'labelSil') head(scCol(sceObj, 'labelSil')) ``` The `normalizeSilhouette` function returns a data frame whose number of rows equals the number of cells and the number of columns equals the number of identities in the single-cell expression object. It requires silhouette information for the required column to be provided in the metadata/coldata of the object. For each column in the output data frame, cells outside of the corresponding identity get a score of 0. All the other cells get scores in (0, 1] by adjusted min-max normalization: ```{r message=FALSE, warning=FALSE} df <- normalizeSilhouette(sceObj, 'label', 'labelSil') head(df) ``` The `addNormSilhouette` function adds non-zero normalized silhouette information to the single-cell expression object as a metadata/coldata column. For visualization purposes, we will use the `featurePlot` function from `scLang`: ```{r message=FALSE, warning=FALSE} sceObj <- addNormSilhouette(sceObj, df, 'labelNormSil') featurePlot(sceObj, 'labelNormSil') ``` # Session information {-} ```{r} sessionInfo() ```