In a matrix, we look for columns that are friendly to a row

SI AUGUSTUS CERNATUR, CERNANTUR QOUQUE AMICI

18 May 2026

Introduction

There is a simple intuition of what it means to be a friend. A friend of Augustus cares about Augustus more than about other people. And, if we see Augustus, then we infer to see friends(s) of Augustus also. Please refer to our Arxive publication for more formal introduction.

Let’s picture a set of expression pattern, each represents a biological process. Techically, each pattern is a vector of expression loads of the involved genes.

Sometimes, the expression of a single gene in a sample indicates the activity of the entire pattern there. The simplest case: the gene has a nonzero load in only one pattern.

Actually, the gene may have several nonzero loads, but all of them but the one of interest are relatively small. Then the gene (AKA Augustus) is the marker gene for the pattern, and the marked pattern is the a friend of this gene.

There could be more then one friend for a marker. In this case, the marker is more important for the friends than for the non-friends.

We have two sets:T (rows) and C (columns) and A real matrix A(t,c) that describes the strength of relation between each t and each c; t is an element of T and c is an element of C.

The model applies to many problems in bioinformatics and statistics.

Example \(\left\{T\right\}\) \(\left\{C\right\}\) \(A\left(t,c\right)\)
gene regulation by TFs gene genes under the TF regulation strength of regulation
fuzzy clustering object cluster object weight in cluster
transcription decomposition transcript transcription pattern transcript’s load in pattern
weighted graph vertex another vertex weight of edge between column and row

For each row, we want to identify whather there are columns(s) that particularly prefer(s) the row, and list the columns. Then such a column is a friend (or the best friend if there is only one) for the row. We intend to express the meaning of the word “particularly” in this context by a statistical test, that is the friends.test provided by this package.

Let’s define the relevance of a row to a column as the rank of the \(i\)-th rows’s relation with \(j\)-the column inside the column, or, in our notation, the rank of \(A_{ij}\) in \(A_{*j}\), and denote it as \(r(t_i,c_j)\).

The main idea of the friends.test test is that if a row \(t_i\) is not preferred by any column(s), the distribution of the ranks of the row in all the columns \(r(t_i,c_j)\) behaves like \(j\) i.i.d integers that are uniformely distributed in \(1..n\).

Toy examples

Toy example: Bayesian option

Let’s illustrate the method with something supersimple.

``` r
regulation <-
    structure(
        c(
            0.2, 0.25, 0.1, 0.23, 0.3, 0.12, 0.14,
            4, 3, 6, 1, 9, 7, 10,
            3, 1, 5, 6, 3, 4, 11
        ),
        dim = c(7, 3),
        dimnames =
        list(
            c(
                "Gene1", "Gene2", "Gene3",
                "Gene4", "Gene5", "Gene6",
                "Gene7"
            ),
            c("TF1", "TF2", "TF3")
        )
    )
noquote(regulation)
```

```
##        TF1 TF2 TF3
## Gene1 0.20   4   3
## Gene2 0.25   3   1
## Gene3 0.10   6   5
## Gene4 0.23   1   6
## Gene5 0.30   9   3
## Gene6 0.12   7   4
## Gene7 0.14  10  11
```

What we want to identify is Gene2-TF1 pair. Gene2 is not the strongest downstream gene in the TF1 column (column), it is only the second, but, on the other hand its rank in other TF-based columns is much lower.

First, let’s apply the Bayesian version of the method, best.friends.bic.

``` r
friends.test.bic(regulation, prior.to.have.friends = .5, max.friends.n = 1)
```

```
## $Gene1
## $Gene1$TF1
## marker friend   rank 
##      1      1      1 
## 
## 
## $Gene2
## $Gene2$TF1
## marker friend   rank 
##      2      1      1
```

We see the expected Gene2-TF1 pair as well as the Gene1-TF1 pair, that is similar to the expected, Gene1 has rank of 3 in the TF1 column.

The lower is the prior to have friends for a row (it is an obligatory parameter for friends.test.bic), the less marker-friend pairs we get.

``` r
friends.test.bic(regulation, prior.to.have.friends = .33, max.friends.n = 1)
```

```
## $Gene2
## $Gene2$TF1
## marker friend   rank 
##      2      1      1
```

``` r
friends.test.bic(regulation, prior.to.have.friends = .25, max.friends.n = 1)
```

```
## named list()
```

In all the three examples above, the max.friends.n parameter was set to 1. It means that the function filters out all the rows that have more that one friendly column, or in other words, we look only for the best friends. If we relax the restriction, we get more results, and the rank that shows the rank of the friend for the marker, can differ from 1:

``` r
friends.test.bic(regulation, prior.to.have.friends = .25)
```

```
## $Gene7
## $Gene7$TF2
## marker friend   rank 
##      7      2      1 
## 
## $Gene7$TF3
## marker friend   rank 
##      7      3      2
```

Note that max.friends.n = "all" is the default value for both friends.test (see below) and friends.test.bic.

The same toy example: the KS option

Let’s run the p-value-based option of the method on the same data. It is less sensitive on small dataset, so we use unpractical threshold for the p-value to see the result.

``` r
friends.test(regulation, threshold = .5, p.adjust.method = "no",
             max.friends.n = 1)
```

```
## $Gene2
## $Gene2$TF1
## marker friend   rank 
##      2      1      1
```

And, as usual, if we relax the max.friends.no restriction, we will see more pairs, but now the friends are not required to be unique for a marker.

``` r
friends.test(regulation, threshold = .5, p.adjust.method = "no")
```

```
## $Gene2
## $Gene2$TF1
## marker friend   rank 
##      2      1      1 
## 
## 
## $Gene4
## $Gene4$TF3
## marker friend   rank 
##      4      3      1 
## 
## $Gene4$TF1
## marker friend   rank 
##      4      1      2 
## 
## 
## $Gene7
## $Gene7$TF2
## marker friend   rank 
##      7      2      1 
## 
## $Gene7$TF3
## marker friend   rank 
##      7      3      2
```

Iris example

Let’s look for anomalies in the traditional iris dataset:

``` r
data("iris")
print(head(iris, n = 5))
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
```

We use only numeric columns:

``` r
friends.test.bic(iris[1:4], prior.to.have.friends = .001)
```

```
## $`14`
## $`14`$Sepal.Width
## marker friend   rank 
##     14      2      1
```

So, we see that the flower #14 is important for (we also can think about it as “popular in”) the Sepal.Width nomination. Let’s understand it a simple request: what is the quantile of the flower #14 in the four columns.

``` r
apply(iris[, 1:4], 2, function(col) ecdf(col)(col[14]))
```

```
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##  0.006666667  0.553333333  0.013333333  0.033333333
```

So, flower #14 is in top 50% in Sepal.Width and in lowest 5% for all other sizes. It is exacly what we describe as “Sepal.Width is friend for flower #14 and the flower #14 is a marker for Sepal.Width”.

Let’s note that the four sizes have different distributions; still, the friends.test naturally works with this type of data.

CoGAPS feature loadings example

NATURAM EXPELLAS FURCA, TAMEN USQUE RECURRET

CoGAPS decomposes an expression dataset into pattetns of expression, ideally, eah of them represent an underlying biological process. We use the PDAC dataset from the Johnson et al, 2023 as an example.

``` r
data(friends.test.cogaps.example)
```

Feature loadings in CoGAPS are normalized expression values across patterns. Each column is a pattern, each row is a gene.

``` r
featureLoadings <- friends.test.cogaps.example$loadings
head(featureLoadings)
```

```
##           Pattern_1   Pattern_2   Pattern_3   Pattern_4    Pattern_5    Pattern_6  Pattern_7
## SAMD11  0.002373853 0.002656909 0.006061856 0.001842319 0.0007711356 0.0005375241 0.01290628
## NOC2L   0.333290756 0.215134636 0.680900455 0.083199978 0.1951790750 0.1346961260 0.18365334
## KLHL17  0.055154882 0.005238459 0.047037147 0.013550087 0.0049433918 0.0671307519 0.01745793
## PLEKHN1 0.070879407 0.160753518 0.002134236 0.002048471 0.0000000000 0.4884256721 0.10620824
## HES4    0.345203698 0.139899164 0.149048895 0.026502691 1.6740946770 1.3930050135 0.18173441
## ISG15   0.012890226 1.408829808 0.849508107 0.025504649 0.1476552784 1.0514144897 1.27441263
##           Pattern_8
## SAMD11  0.001200248
## NOC2L   0.018304676
## KLHL17  0.016975854
## PLEKHN1 0.075609177
## HES4    0.267374903
## ISG15   0.445318580
```

Friends test application to the loading matrix

Let’s look for marker genes for the patterns using friends.test. In this setup, genes (possibly, markers) are rows, patterns (possibly, friends for the markers) are columns.

We use here the default version of friends.test that selects markers by the KS test. As far ad there is ~15000 genes, we use Bonferroni p=value adjustment rather than the default “BH”.

``` r
friends_of_genes <- friends.test(
    featureLoadings,
    p.adjust.method = "bonferroni"
)
```

Return structure

The return is a list of lists, the outer list is by gene that is a marker, the inner is by pattern that is the marker’s friend. The output format allows a marker to have a lot of friends. The leave is a 3-integer vector, are the marker index and the friend index in the matrix we started with, and the rank of the friend for the marker. Let’s look at the first element, it’s rank is 1 as far as it is the first.

``` r
friends_of_genes[[1]][[1]]
```

```
## marker friend   rank 
##    232      5      1
```

The names of the list elements are in accordance. The first marker name:

``` r
rownames(featureLoadings)[friends_of_genes[[1]][[1]]["marker"]]
```

```
## [1] "RPL11"
```

``` r
names(friends_of_genes)[1]
```

```
## [1] "RPL11"
```

The name of the first friend of the first marker:

``` r
colnames(featureLoadings)[friends_of_genes[[1]][[1]]["friend"]]
```

```
## [1] "Pattern_5"
```

``` r
names(friends_of_genes[[1]])[1]
```

```
## [1] "Pattern_5"
```

The format is flexible enough, we exemplify posiible transformations later. The format is common for the friends.test and friends.test.bic main functions.

Ranks of a marker in its friends and non-friends

Let’s look what happens in the friends.test. First, we rank all the genes inside the loadings matrix.

``` r
ranks <- row.int.ranks(featureLoadings)
head(ranks)
```

```
##         Pattern_1 Pattern_2 Pattern_3 Pattern_4 Pattern_5 Pattern_6 Pattern_7 Pattern_8
## SAMD11      12846     11721     10876     11106     11863     12096     10495     12220
## NOC2L        2093      4244      1081      3785      4573      5654      2866      9268
## KLHL17       8678     11147      8186      8078     10935      7095      9920      9361
## PLEKHN1      8048      5169     11974     11012     15128      2203      4717      6885
## HES4         1980      5613      5304      6617       346       486      2905      3742
## ISG15       11049       390       751      6698      5515       725       196      2293
```

The ranks of the first marker in all he patterns:

``` r
marker_1 <- names(friends_of_genes)[1]
ranks[marker_1, ]
```

```
## Pattern_1 Pattern_2 Pattern_3 Pattern_4 Pattern_5 Pattern_6 Pattern_7 Pattern_8 
##      7599        80       149        31        12        48        86        92
```

The ranks of the first in all the friend patterns:

``` r
ranks[marker_1, names(friends_of_genes[[1]])]
```

```
## Pattern_5 Pattern_4 Pattern_6 Pattern_2 Pattern_7 Pattern_8 Pattern_3 
##        12        31        48        80        86        92       149
```

The model splitted the patterns into two sets: Pattern_1 where the marker has a very low rank, and the reaminder. It is what we supposed to be done.

To look for genes that have only one friend (let’s call them exclusive markers), so we use max.friends.n = 1. The parameter actually is involved after the main part of the calculation, it filters out the genes that has more than max.friends.n friends, these genes are not stored in the output.

``` r
best_friends_of_genes <- friends.test(
    featureLoadings,
    p.adjust.method = "bonferroni",
    max.friends.n = 1
)
names(best_friends_of_genes)
```

```
## [1] "CA4"
```

In this super-restrictive mode, only one (exclusive) marker, CA4, is found. Its friend is the Pattern_5. It is to the only by the task.

``` r
the_marker <- names(best_friends_of_genes)[1]
ranks[the_marker, ]
```

```
## Pattern_1 Pattern_2 Pattern_3 Pattern_4 Pattern_5 Pattern_6 Pattern_7 Pattern_8 
##     15098     14889     14789     14885       632     14794     15081     14654
```

That is what the friends.test is a unique test to detect: the CA4 has a moderate rank in Pattern_5, but it is the best of its ranks in all other patterns and it is significantlu higher than all of them.

Gene set enrichment

To perform an example of gene set enrichment analysis, we use the Baesian version of the test, friends.test.bic. We restrict the number of friends for a marker by 4, a half of number of the patterns, to keep them all in the similar “direction”. The reqired parameter prior.to.have.friends is set to approximately 1/(number od genes). We run 4 parallel processses, see Parallel calculation.

``` r
mirai::daemons(4)
good_friends_of_genes <- friends.test.bic(
    featureLoadings,
    prior.to.have.friends = 1E-4,
    max.friends.n = 4
)
mirai::daemons(0)
```

The result format transfromation

For the gene overrrepresentation analysis, we need to gen a list of markers (genes) for each pttern. So, we swap the levels of the list, and we do not need leaves. Here, we can reverse the list via a data frame. It works if the number of elements do not reach hundred of thousands.

``` r
good_friends_df <- good_friends_of_genes |>
    purrr::flatten() |>
    purrr::reduce(.f = rbind) |>
    data.frame() |>
    tibble::remove_rownames() |>
    dplyr::mutate(
        gene = rownames(featureLoadings)[marker],
        pattern = colnames(featureLoadings)[friend]
    )
head(good_friends_df)
```

```
##   marker friend rank     gene   pattern
## 1     62      5    1    AJAP1 Pattern_5
## 2     62      1    2    AJAP1 Pattern_1
## 3     76      6    1   TAS1R1 Pattern_6
## 4     92      6    1      CA6 Pattern_6
## 5    112      5    1 C1orf127 Pattern_5
## 6    117      5    1  ANGPTL7 Pattern_5
```

Now, let’s prepare a list of markers by pattern.

``` r
good_markers_list <- good_friends_df |>
    split(good_friends_df$pattern) |>
    purrr::map(~ .x |>
            dplyr::pull(gene) |>
            unique() |>
            sort()
    )
```

Prepare the gene collections

As far as we have gene lists, let’s examine them for overreprentation in the gene collections from MSigDB. We use fgsea and msigdbr for that.

First, prepare the collections sets for H (Hallmarks) and for for C4:3CA (Curated Cancer Cell Atlas).

``` r
msigdbr_H_df <- msigdbr::msigdbr(species = "human", collection = "H")
collections_H <- split(
    msigdbr_H_df$gene_symbol,
    msigdbr_H_df$gs_name
)
collections_H <- split(
    msigdbr_H_df$gene_symbol,
    msigdbr_H_df$gs_name
)
msigdbr_C4_3CA_df <-
    msigdbr::msigdbr(
        species = "human",
        collection = "C4",
        subcollection = "3CA"
    )
collections_C4_3CA <- split(
    msigdbr_C4_3CA_df$gene_symbol,
    msigdbr_C4_3CA_df$gs_name
)
```

Now, let’s test the gene enrichment of the markers for each pattern in the Curated Cancer Cell Atlas and in Hallmarks MSigDB collections.

``` r
C4_3CA_by_markers <- good_markers_list |>
    purrr::map(\(markers) {
        fgsea::fora(
            pathways = collections_C4_3CA,
            markers,
            rownames(featureLoadings)
        ) |>
            dplyr::filter(padj < 0.05) |>
            dplyr::select(pathway, padj)
    })
C4_3CA_by_markers
```

```
## $Pattern_1
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_2
##                                             pathway       padj
##                                              <char>      <num>
## 1:  GAVISH_3CA_METAPROGRAM_EPITHELIAL_COLON_RELATED 0.01053558
## 2: GAVISH_3CA_METAPROGRAM_EPITHELIAL_PDAC_RELATED_2 0.01053558
## 
## $Pattern_3
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_4
##                                             pathway       padj
##                                              <char>      <num>
## 1:    GAVISH_3CA_METAPROGRAM_EPITHELIAL_RESPIRATION 0.03434735
## 2: GAVISH_3CA_METAPROGRAM_EPITHELIAL_PDAC_RELATED_2 0.03434735
## 
## $Pattern_5
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_6
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_7
##                                                      pathway         padj
##                                                       <char>        <num>
## 1:            GAVISH_3CA_MALIGNANT_METAPROGRAM_23_SECRETED_2 0.0003684984
## 2:                GAVISH_3CA_METAPROGRAM_EPITHELIAL_SECRETED 0.0003684984
## 3: GAVISH_3CA_MALIGNANT_METAPROGRAM_19_EPITHELIAL_SENESCENCE 0.0072089630
## 4:                  GAVISH_3CA_METAPROGRAM_EPITHELIAL_EPISEN 0.0454684340
## 5:                GAVISH_3CA_METAPROGRAM_EPITHELIAL_ALVEOLAR 0.0454684340
## 6:          GAVISH_3CA_METAPROGRAM_EPITHELIAL_PDAC_RELATED_2 0.0454684340
## 7:         GAVISH_3CA_METAPROGRAM_FIBROBLASTS_METAL_RESPONSE 0.0454684340
## 
## $Pattern_8
##                                                      pathway       padj
##                                                       <char>      <num>
## 1: GAVISH_3CA_MALIGNANT_METAPROGRAM_19_EPITHELIAL_SENESCENCE 0.01266589
## 2:                  GAVISH_3CA_METAPROGRAM_EPITHELIAL_EPISEN 0.03366002
## 3:                GAVISH_3CA_METAPROGRAM_EPITHELIAL_SECRETED 0.03366002
## 4:          GAVISH_3CA_METAPROGRAM_EPITHELIAL_PDAC_RELATED_2 0.03366002
```



``` r
hallmarks_by_markers <- good_markers_list |>
    purrr::map(\(markers) {
        fgsea::fora(
            pathways = collections_H,
            markers,
            rownames(featureLoadings)
        ) |>
            dplyr::filter(padj < 0.05) |>
            dplyr::select(pathway, padj)
    })
hallmarks_by_markers
```

```
## $Pattern_1
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_2
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_3
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_4
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_5
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_6
##                       pathway       padj
##                        <char>      <num>
## 1: HALLMARK_KRAS_SIGNALING_DN 0.02078762
## 
## $Pattern_7
## Empty data.table (0 rows and 2 cols): pathway,padj
## 
## $Pattern_8
## Empty data.table (0 rows and 2 cols): pathway,padj
```

Rakning and random numbers

The result of both friends.test and friends.test.bic functions depends on random state generator. The dependency is because the test starts with ranking of the elemants of the input matrix in the columns. The test requires the rank values to be integer, so the ties are resolved randomly. It you want a reproduceablle result, keep the initial random generator state. Another option is to rank the matrix bfore the test call and then to pass the rank matrix as the input. It does not consume a lot of additional time, and the renking of already-ranked does not depend on random seeds. The special case when this approach is the only is if some additional information is used in the ranking.

Parallel calculation

Both main functions, friends.test and friends.test.bic, carries purrr::in_parallel code, so if you want each of them to run in parallel mode, say mirai::deamons(n) (n is number of processes) in your R code before the finction call, and mirai::daemons(0) after. See mirai - Quick Reference for more information.

Session info

``` r
sessionInfo()
```

```
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB             
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] friends.test_0.99.18 markdown_2.0         knitr_1.51          
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.10         generics_0.1.4      fgsea_1.38.0        lattice_0.22-9     
##  [5] digest_0.6.39       magrittr_2.0.5      evaluate_1.0.5      grid_4.6.0         
##  [9] RColorBrewer_1.1-3  fastmap_1.2.0       Matrix_1.7-5        jsonlite_2.0.0     
## [13] purrr_1.2.2         scales_1.4.0        codetools_0.2-20    jquerylib_0.1.4    
## [17] cli_3.6.6           rlang_1.2.0         rtlr_0.1.0          cowplot_1.2.0      
## [21] withr_3.0.2         cachem_1.1.0        yaml_2.3.12         otel_0.2.0         
## [25] tools_4.6.0         parallel_4.6.0      BiocParallel_1.46.0 dplyr_1.2.1        
## [29] nanonext_1.9.0      ggplot2_4.0.3       carrier_0.3.0.4     fastmatch_1.1-8    
## [33] curl_7.1.0          assertthat_0.2.1    babelgene_22.9      vctrs_0.7.3        
## [37] R6_2.6.1            lifecycle_1.0.5     pkgconfig_2.0.3     pillar_1.11.1      
## [41] bslib_0.11.0        gtable_0.3.6        data.table_1.18.4   glue_1.8.1         
## [45] Rcpp_1.1.1-1.1      xfun_0.57           tibble_3.3.1        tidyselect_1.2.1   
## [49] dichromat_2.0-0.1   farver_2.1.2        msigdbr_26.1.0      htmltools_0.5.9    
## [53] mirai_2.7.0         rmarkdown_2.31      compiler_4.6.0      S7_0.2.2
```