Exploring genetically modified organisms, developmental processes, diseases, or responses to treatments requires accurate measurement of changes in gene expression. This can be done for thousands of genes at once with high-throughput technologies such as microarrays and RNA-seq. However, identifying differentially expressed (DE) genes is technically challenging because of limited sample sizes, few replicates, or simply very small changes in expression levels. Several methods have therefore been developed to determine DE genes, such as Limma, edgeR, and DESeq2. These methods identify DE genes based on expression levels alone. As genomic co-localization of genes is generally not linked to co-expression, we deduced that DE genes could be detected with the help of genes from their chromosomal neighbourhood. Here, we present a new method, DELocal, which identifies DE genes by comparing their expression changes to changes in adjacent genes in their chromosomal regions.
In the above figure it can be seen that Sostdc1 is differentially expressed in developing tooth tissues (E13 and E14). DELocal helps in identifying similar genes.
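The idea of comparing a gene with its chromosomal neighbours can be illustrated with a small toy sketch in base R. This is not the DELocal implementation: the helper `nearest_neighbours_toy`, the made-up expression values, and the use of a simple difference of means are all assumptions chosen only to show why a neighbour-relative contrast can single out one gene when a whole region shifts.

```r
# Toy illustration only: NOT the DELocal implementation.
# Six hypothetical genes on one chromosome, 100 kb apart.
genes <- data.frame(
  gene = paste0("g", 1:6),
  chromosome_name = "X",
  start_position = (1:6) * 100000,
  stringsAsFactors = FALSE
)
condition <- rep(c("A", "B"), each = 3)
is_B <- condition == "B"

# Made-up expression: the whole region goes up by 2 in condition B,
# and gene g3 goes up by an extra 4.
expr <- matrix(10, nrow = 6, ncol = 6,
               dimnames = list(genes$gene, paste0(condition, 1:3)))
expr[, is_B] <- expr[, is_B] + 2
expr["g3", is_B] <- expr["g3", is_B] + 4

# Hypothetical helper: indices of the n closest genes on the same
# chromosome within `window` base pairs of gene i.
nearest_neighbours_toy <- function(i, genes, n, window) {
  same_chr <- genes$chromosome_name == genes$chromosome_name[i]
  dist <- abs(genes$start_position - genes$start_position[i])
  candidates <- which(same_chr & dist <= window & seq_len(nrow(genes)) != i)
  candidates[order(dist[candidates])][seq_len(min(n, length(candidates)))]
}

# Expression of each gene relative to the median of its neighbours.
relative_expr <- t(sapply(seq_len(nrow(genes)), function(i) {
  nb <- nearest_neighbours_toy(i, genes, n = 3, window = 1e6)
  expr[i, ] - apply(expr[nb, , drop = FALSE], 2, median)
}))
rownames(relative_expr) <- genes$gene

# Plain change between conditions versus neighbour-relative change.
plain_change <- rowMeans(expr[, is_B]) - rowMeans(expr[, !is_B])
relative_change <- rowMeans(relative_expr[, is_B]) -
  rowMeans(relative_expr[, !is_B])
cbind(plain_change, relative_change)
```

In this toy data every gene shows a plain change between conditions, but only g3 keeps a non-zero change once the median of its neighbours is subtracted.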
# Install the release version from Bioconductor (package names are case-sensitive)
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DELocal")
To install from GitHub:
if (!requireNamespace("devtools")) {
install.packages("devtools")
}
devtools::install_github("dasroy/delocal")
This is a basic example which shows you how to use DELocal:
First, a SummarizedExperiment object is built from the gene expression count matrix and the gene location information.
library(DELocal)
count_matrix <- as.matrix(read.table(file = system.file("extdata",
"tooth_RNASeq_counts.txt",
package = "DELocal")))
colData <- data.frame(condition=gsub("\\..*",x=colnames(count_matrix),
replacement = ""))
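The `gsub()` call above derives the condition of each sample from its column name by dropping everything from the first dot onwards. A minimal sketch, using four of the sample names that appear in this data set, shows the effect:

```r
# Four of the sample names from the example count matrix
sample_names <- c("ME14.E1M1R", "ME14.E2M1R", "ME13.E9M1R", "ME13.EXM1L")

# "\\..*" matches the first literal dot and everything after it
condition <- gsub("\\..*", replacement = "", x = sample_names)
condition
#> [1] "ME14" "ME14" "ME13" "ME13"

# Number of samples per condition
table(condition)
```

If sample names follow a different convention, the regular expression would need to be adapted accordingly.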
Example of required gene location information
gene_location <- read.table(file = system.file("extdata",
"gene_location.txt",
package = "DELocal"))
head(gene_location)
#> ensembl_gene_id start_position chromosome_name
#> ENSMUSG00000000001 ENSMUSG00000000001 108107280 3
#> ENSMUSG00000000003 ENSMUSG00000000003 77837901 X
#> ENSMUSG00000000028 ENSMUSG00000000028 18780447 16
#> ENSMUSG00000000031 ENSMUSG00000000031 142575529 7
#> ENSMUSG00000000037 ENSMUSG00000000037 161082525 X
#> ENSMUSG00000000049 ENSMUSG00000000049 108343354 11
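Before building the SummarizedExperiment it can help to check that a gene location table has the layout shown above. The sketch below uses a hypothetical helper, `check_gene_location`, which is not part of DELocal; it assumes that the three columns shown are the ones needed and that row names are Ensembl gene IDs. The count values in the toy matrix are made up.

```r
# Hypothetical helper (not part of DELocal): checks the column layout and
# returns the gene IDs in the count matrix that have no location entry.
check_gene_location <- function(gene_location, count_matrix) {
  required <- c("ensembl_gene_id", "start_position", "chromosome_name")
  missing_cols <- setdiff(required, colnames(gene_location))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(gene_location$start_position)) {
    stop("start_position must be numeric")
  }
  setdiff(rownames(count_matrix), rownames(gene_location))
}

# Two rows copied from the table above
gene_location_toy <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000000001", "ENSMUSG00000000003"),
  start_position = c(108107280, 77837901),
  chromosome_name = c("3", "X"),
  row.names = c("ENSMUSG00000000001", "ENSMUSG00000000003"),
  stringsAsFactors = FALSE
)

# Made-up counts for three genes, one of which lacks a location entry
counts_toy <- matrix(
  c(5, 7, 0, 1, 12, 9), nrow = 3,
  dimnames = list(
    c("ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG00000000028"),
    c("ME14.E1M1R", "ME13.E9M1R")
  )
)

unmatched <- check_gene_location(gene_location_toy, counts_toy)
unmatched
```

Genes reported as unmatched would need either a location entry or removal from the count matrix before the two are combined.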
# Example code to retrieve gene location information like the above from Ensembl
library(biomaRt)
gene_attributes <- c("ensembl_gene_id", "start_position", "chromosome_name")
ensembl_ms_mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL",
dataset="mmusculus_gene_ensembl", host="https://www.ensembl.org")
gene_location_sample <- getBM(attributes=gene_attributes, mart=ensembl_ms_mart,
verbose = FALSE)
rownames(gene_location_sample) <- gene_location_sample$ensembl_gene_id
library(SummarizedExperiment)
smrExpt <- SummarizedExperiment(assays=list(counts=count_matrix),
rowData = gene_location,
colData=colData)
smrExpt
#> class: SummarizedExperiment
#> dim: 52183 14
#> metadata(0):
#> assays(1): counts
#> rownames(52183): ENSMUSG00000000001 ENSMUSG00000000003 ...
#> ENSMUSG00000114967 ENSMUSG00000114968
#> rowData names(3): ensembl_gene_id start_position chromosome_name
#> colnames(14): ME14.E1M1R ME14.E2M1R ... ME13.E9M1R ME13.EXM1L
#> colData names(1): condition
Running DELocal on the whole data set may take a long time, so here only genes from the X chromosome are analysed. In this example DELocal compares each gene with its 5 nearest neighbours (nearest_neighbours = 5) and returns only genes whose adjusted p-value is less than pValue_cut.
library(dplyr)
x_genes <- SummarizedExperiment::rowData(smrExpt) %>%
as.data.frame() %>%
filter(chromosome_name=="X") %>% rownames()
DELocal_result <- DELocal(pSmrExpt = smrExpt[x_genes,],
nearest_neighbours = 5,pDesign = ~ condition,
pValue_cut = 0.05)
head(round(DELocal_result,digits = 9))
#> relative.logFC P.Value adj.P.Val B
#> ENSMUSG00000037217 508.8688 0.00e+00 0.000000091 15.033054
#> ENSMUSG00000062184 -694.6147 4.07e-07 0.000267921 4.970833
#> ENSMUSG00000059401 140.0683 4.35e-07 0.000267921 4.899633
#> ENSMUSG00000016319 955.2564 4.85e-07 0.000267921 4.781442
#> ENSMUSG00000045103 417.2448 5.18e-07 0.000267921 4.711055
#> ENSMUSG00000031103 370.1563 6.98e-07 0.000300492 4.392911
The results are already sorted in ascending order of adjusted p-value (adj.P.Val).
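Since the result is an ordinary table, it can be filtered or reordered with base R. The sketch below rebuilds the six rows printed above as a data frame (only the relative.logFC and adj.P.Val columns, with the rounded values shown) and splits them by the sign of relative.logFC; the object names are illustrative.

```r
# The six rows printed above, re-entered by hand for illustration
res <- data.frame(
  relative.logFC = c(508.8688, -694.6147, 140.0683,
                     955.2564, 417.2448, 370.1563),
  adj.P.Val = c(0.000000091, 0.000267921, 0.000267921,
                0.000267921, 0.000267921, 0.000300492),
  row.names = c("ENSMUSG00000037217", "ENSMUSG00000062184",
                "ENSMUSG00000059401", "ENSMUSG00000016319",
                "ENSMUSG00000045103", "ENSMUSG00000031103")
)

# Split by the sign of the neighbourhood-relative change
positive <- res[res$relative.logFC > 0, ]
negative <- res[res$relative.logFC < 0, ]

# Reorder by the magnitude of the relative change instead of the p-value
by_effect <- res[order(-abs(res$relative.logFC)), ]
rownames(by_effect)[1]
#> [1] "ENSMUSG00000016319"
```

The same subsetting would apply to the full DELocal_result object, assuming it keeps the column names shown in the output.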
The plotNeighbourhood function plots the median expression in each ‘condition’ for a gene of interest and its pNearest_neighbours neighbouring genes.
DELocal::plotNeighbourhood(pSmrExpt = smrExpt, pGene_id = "ENSMUSG00000059401",
pNearest_neighbours=5, pDesign = ~ condition)$plot
In the previous example, a 1 Mb chromosomal area around each gene was used to define its neighbourhood. The choice of a 1 Mb window is somewhat arbitrary. However, it is also possible to use a neighbourhood of a different size for each gene. To do so, the user can provide “neighbors_start” and “neighbors_end” for each gene in the rowData.
To demonstrate this, TAD (topologically associating domain) boundaries, which differ from gene to gene, are used next.
gene_location_dynamicNeighbourhood <- read.csv(system.file("extdata", "Mouse_TAD_boundaries.csv",
package = "DELocal"))
rownames(gene_location_dynamicNeighbourhood) <-
gene_location_dynamicNeighbourhood$ensembl_gene_id
# rename the columns as required by DELocal
colnames(gene_location_dynamicNeighbourhood)[4:5] <- c("neighbors_start",
"neighbors_end")
common_genes <- intersect(rownames(count_matrix),
rownames(gene_location_dynamicNeighbourhood) )
smrExpt_dynamicNeighbour <-
SummarizedExperiment::SummarizedExperiment(
assays = list(counts = count_matrix[common_genes,]),
rowData = gene_location_dynamicNeighbourhood[common_genes, ],
colData = colData
)
## Optional: select only chromosome 1 genes to reduce the runtime
# one_genes <- SummarizedExperiment::rowData(smrExpt_dynamicNeighbour) %>%
# as.data.frame() %>%
# filter(chromosome_name=="1") %>% rownames()
## Here the X chromosome genes (x_genes) selected earlier are reused instead.
DELocal_result_tad <- DELocal(pSmrExpt = smrExpt_dynamicNeighbour[x_genes,],
nearest_neighbours = 5,pDesign = ~ condition,
pValue_cut = 0.05, pLogFold_cut = 0)
head(DELocal_result_tad)
#> relative.logFC P.Value adj.P.Val B
#> ENSMUSG00000037217 508.8688 3.520588e-11 9.097200e-08 15.036315
#> ENSMUSG00000062184 -694.6147 4.068015e-07 2.679164e-04 4.974179
#> ENSMUSG00000059401 140.0683 4.347483e-07 2.679164e-04 4.902978
#> ENSMUSG00000016319 955.2564 4.854489e-07 2.679164e-04 4.784788
#> ENSMUSG00000045103 417.2448 5.184140e-07 2.679164e-04 4.714401
#> ENSMUSG00000031103 370.1563 6.977232e-07 3.004861e-04 4.396258
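Per-gene neighbourhoods do not have to come from TAD boundaries. As a minimal sketch, the hypothetical helper below (`add_neighbourhood_window`, not part of DELocal) adds “neighbors_start” and “neighbors_end” columns by extending each gene's start position by a chosen half-width. It assumes these columns hold absolute chromosomal coordinates, as the renamed TAD boundary columns suggest; the half-widths are arbitrary illustrative values.

```r
# Hypothetical helper (not part of DELocal): builds a window of
# +/- half_width base pairs around each gene's start position.
# half_width may be a single number or one value per gene.
add_neighbourhood_window <- function(gene_location, half_width) {
  gene_location$neighbors_start <-
    pmax(gene_location$start_position - half_width, 0)
  gene_location$neighbors_end <- gene_location$start_position + half_width
  gene_location
}

# Two X chromosome genes from the gene location table shown earlier
gene_location_toy <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000000003", "ENSMUSG00000000037"),
  start_position = c(77837901, 161082525),
  chromosome_name = c("X", "X"),
  row.names = c("ENSMUSG00000000003", "ENSMUSG00000000037"),
  stringsAsFactors = FALSE
)

# A narrower window for the first gene, a wider one for the second
windowed <- add_neighbourhood_window(gene_location_toy,
                                     half_width = c(250000, 1000000))
windowed[, c("start_position", "neighbors_start", "neighbors_end")]
```

A table prepared this way could then be passed as rowData in the same manner as the TAD-based table above.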
sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 SummarizedExperiment_1.32.0
#> [3] Biobase_2.62.0 GenomicRanges_1.54.1
#> [5] GenomeInfoDb_1.38.8 IRanges_2.36.0
#> [7] S4Vectors_0.40.2 BiocGenerics_0.48.1
#> [9] MatrixGenerics_1.14.0 matrixStats_1.2.0
#> [11] biomaRt_2.58.2 DELocal_1.2.1
#> [13] BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.2 bitops_1.0-7 rlang_1.1.3
#> [4] magrittr_2.0.3 compiler_4.3.3 RSQLite_2.3.5
#> [7] png_0.1-8 vctrs_0.6.5 reshape2_1.4.4
#> [10] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.2
#> [13] fastmap_1.1.1 magick_2.8.3 dbplyr_2.5.0
#> [16] XVector_0.42.0 labeling_0.4.3 utf8_1.2.4
#> [19] rmarkdown_2.26 purrr_1.0.2 bit_4.0.5
#> [22] xfun_0.43 zlibbioc_1.48.2 cachem_1.0.8
#> [25] jsonlite_1.8.8 progress_1.2.3 blob_1.2.4
#> [28] highr_0.10 DelayedArray_0.28.0 BiocParallel_1.36.0
#> [31] parallel_4.3.3 prettyunits_1.2.0 R6_2.5.1
#> [34] bslib_0.6.2 stringi_1.8.3 limma_3.58.1
#> [37] jquerylib_0.1.4 Rcpp_1.0.12 bookdown_0.38
#> [40] knitr_1.45 Matrix_1.6-5 tidyselect_1.2.1
#> [43] abind_1.4-5 yaml_2.3.8 codetools_0.2-19
#> [46] curl_5.2.1 lattice_0.22-6 tibble_3.2.1
#> [49] plyr_1.8.9 withr_3.0.0 KEGGREST_1.42.0
#> [52] evaluate_0.23 BiocFileCache_2.10.2 xml2_1.3.6
#> [55] Biostrings_2.70.3 pillar_1.9.0 BiocManager_1.30.22
#> [58] filelock_1.0.3 generics_0.1.3 RCurl_1.98-1.14
#> [61] hms_1.1.3 ggplot2_3.5.0 munsell_0.5.0
#> [64] scales_1.3.0 glue_1.7.0 tools_4.3.3
#> [67] locfit_1.5-9.9 XML_3.99-0.16.1 grid_4.3.3
#> [70] AnnotationDbi_1.64.1 colorspace_2.1-0 GenomeInfoDbData_1.2.11
#> [73] cli_3.6.2 rappdirs_0.3.3 fansi_1.0.6
#> [76] S4Arrays_1.2.1 gtable_0.3.4 DESeq2_1.42.1
#> [79] sass_0.4.9 digest_0.6.35 SparseArray_1.2.4
#> [82] farver_2.1.1 memoise_2.0.1 htmltools_0.5.8
#> [85] lifecycle_1.0.4 httr_1.4.7 statmod_1.5.0
#> [88] bit64_4.0.5