GDSArray is a Bioconductor package that represents GDS files as objects
derived from the DelayedArray class of the DelayedArray package. It
converts a GDS node in the file to a DelayedArray-derived data
structure. The rich set of common methods and data operations defined
on GDSArray makes it more R-user-friendly than working with the GDS
file directly.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GDSArray")

The Bioconductor package gdsfmt provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are designed for large-scale datasets, especially data that are much larger than the available random-access memory.
More details about the GDS format can be found in the vignettes of the gdsfmt, SNPRelate, and SeqArray packages.
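
For comparison, reading a node directly through gdsfmt might look like the sketch below. It assumes the SeqArray package is installed (its CEU_Exon.gds example file is the one shown in the output later in this document) and reads the `variant.id` node; the variable names are illustrative only.

```r
library(gdsfmt)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")

# Open the file (read-only by default) and locate a node by its path
gds <- openfn.gds(file)
node <- index.gdsn(gds, "variant.id")

# Inspect the node's dimensions, then read its contents into memory
node_dim <- objdesp.gdsn(node)$dim
ids <- read.gdsn(node)

# The file handle has to be closed explicitly
closefn.gds(gds)
```

With gdsfmt the handle is opened and closed by hand and the data are read into memory explicitly; GDSArray wraps the same node in a delayed, array-like object instead.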

GDSArray, GDSMatrix, and GDSFile

GDSArray represents GDS files as DelayedArray instances. It defines
methods such as dim() and dimnames(), and it inherits array-like
operations and methods from DelayedArray, e.g., subsetting with [.
The GDSArray() constructor takes as arguments the file path and the
name of the GDS node inside the GDS file. The GDSArray() constructor
always returns the object with rows being features (genes / variants /
SNPs) and columns being “samples”. This is consistent with the assay
data inside SummarizedExperiment.
FIXME: should GDSArray() return that dim?
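
A minimal sketch of a constructor call, assuming the SeqArray package is installed so that its CEU_Exon.gds example file is available; the `genotype/data` node is the one named in the seed output further down.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")

# First argument: GDS file path; second argument: node name inside the file
ga <- GDSArray(file, "genotype/data")

# Printing shows only a preview; the data stay on disk
ga
```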
## This is a SeqArray GDS file
## <2 x 90 x 1348> GDSArray object of type "integer":
## ,,1
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 3 3 0 3 . 0 0 0 0
## [2,] 3 3 0 3 . 0 0 0 0
##
## ,,2
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 3 3 0 3 . 0 0 0 0
## [2,] 3 3 0 3 . 0 0 0 0
##
## ...
##
## ,,1347
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 0 0 0 0 . 0 0 0 0
## [2,] 0 0 0 0 . 0 0 0 0
##
## ,,1348
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 3 3 0 3 . 3 3 3 3
## [2,] 3 3 1 3 . 3 3 3 3
A GDSMatrix is a 2-dimensional GDSArray,
and will be returned from the GDSArray() constructor
automatically if the input GDS node is 2-dimensional.
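
The automatic dispatch to GDSMatrix can be checked with a 2-dimensional node. The node name below is taken from the gdsnodes() listing later in this document; that it is the node behind the printed output is an assumption.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")

# A 2-dimensional node, so the constructor should return a GDSMatrix
gm <- GDSArray(file, "annotation/format/DP/data")

class(gm)
dim(gm)
```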
## <90 x 1348> GDSMatrix object of type "integer":
## [,1] [,2] [,3] [,4] ... [,1345] [,1346] [,1347] [,1348]
## [1,] 0 0 12 15 . 6 5 4 0
## [2,] 0 0 17 4 . 10 8 7 0
## [3,] 107 92 247 177 . 28 15 26 3
## ... . . . . . . . . .
## [88,] 81 84 217 110 . 36 61 92 0
## [89,] 67 47 134 111 . 46 57 71 2
## [90,] 156 150 417 195 . 78 101 144 2
GDSFile

GDSFile is a lightweight class that represents GDS files. Its $
method supports completion of any available GDS node. It can also be
used as a convenient GDSArray constructor: if the current_path slot of
the GDSFile object points to a valid GDS node, a GDSArray is returned;
otherwise, the GDSFile object is returned with an updated
current_path.
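
A short sketch of both behaviours, assuming the SeqArray example file; `gf` is the object name used in the completion hint in this section, and the `AC` node is picked from the subnode listing for illustration.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")
gf <- GDSFile(file)

# "annotation/info" is a folder, not an array node:
# the result is still a GDSFile, with current_path updated
info <- gf$annotation$info
info

# "annotation/info/AC" is an array node: the result is a GDSArray
ac <- gf$annotation$info$AC
ac
```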
## class: GDSFile
## file: /github/workspace/pkglib/SeqArray/extdata/CEU_Exon.gds
## current node: annotation/info
## subnodes:
## annotation/info/AA
## annotation/info/AC
## annotation/info/AN
## annotation/info/DP
## annotation/info/HM2
## annotation/info/HM3
## annotation/info/OR
## annotation/info/GP
## annotation/info/BN
## <1348> GDSArray object of type "integer":
## [1] [2] [3] [4] . [1345] [1346] [1347] [1348]
## 4 1 6 128 . 2 11 1 1
Try typing gf$ann and pressing the Tab key to trigger the completion.

GDSArray methods

seed() returns the GDSArraySeed of the GDSArray object.
## GDSArraySeed
## GDS File path: /github/workspace/pkglib/SeqArray/extdata/CEU_Exon.gds
## Array node: genotype/data
## Dim: 2 x 90 x 1348
gdsfile() returns the file path of the corresponding GDS file.
## [1] "/github/workspace/pkglib/SeqArray/extdata/CEU_Exon.gds"
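
The two accessors above could be called as in this sketch, which assumes the SeqArray example file and the `genotype/data` node shown in the seed output.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")
ga <- GDSArray(file, "genotype/data")

# The seed holds the file path, node name and dimensions
sd <- seed(ga)
sd

# Path of the GDS file backing the array
path <- gdsfile(ga)
path
```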
gdsnodes() takes the GDS file path or
GDSFile object as input, and returns all nodes that can be
converted to GDSArray instances. The returned GDS node
names can be used as input for the GDSArray(name=)
constructor.
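
As a sketch, assuming the SeqArray example file: list the convertible nodes from both kinds of input, then feed one of the returned names back into the constructor.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")
gf <- GDSFile(file)

# Both a file path and a GDSFile object are accepted
nodes <- gdsnodes(file)
same <- identical(nodes, gdsnodes(gf))

# Any returned name can be passed as the `name` argument of GDSArray()
varname <- nodes[2]
va <- GDSArray(file, name = varname)
va
```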
## [1] "sample.id" "variant.id"
## [3] "position" "chromosome"
## [5] "allele" "genotype/data"
## [7] "genotype/~data" "genotype/extra.index"
## [9] "genotype/extra" "phase/data"
## [11] "phase/~data" "phase/extra.index"
## [13] "phase/extra" "annotation/id"
## [15] "annotation/qual" "annotation/filter"
## [17] "annotation/info/AA" "annotation/info/AC"
## [19] "annotation/info/AN" "annotation/info/DP"
## [21] "annotation/info/HM2" "annotation/info/HM3"
## [23] "annotation/info/OR" "annotation/info/GP"
## [25] "annotation/info/BN" "annotation/format/DP/data"
## [27] "annotation/format/DP/~data" "sample.annotation/family"
## [1] TRUE
## <1348> GDSArray object of type "integer":
## [1] [2] [3] [4] . [1345] [1346] [1347] [1348]
## 1 2 3 4 . 1345 1346 1347 1348

dim(), dimnames()

dimnames() on a GDSArray returns an unnamed list with one element per
dimension, i.e. with the same length as the value returned by dim().
Each element is either NULL or a vector of dimension names.
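
A small sketch of these accessors, assuming the SeqArray example file and a 2-dimensional node chosen from the gdsnodes() listing for illustration.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")
dp <- GDSArray(file, "annotation/format/DP/data")

dim(dp)

# dimnames() is a list with one element per dimension
dn <- dimnames(dp)
class(dn)

# A length of 0 means that dimension has no names (NULL)
lengths(dn)
```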
## [1] 90 1348
## [1] "list"
## [1] 0 0

[ subsetting

GDSArray instances can be subset, following the usual
R conventions, with numeric or logical vectors; logical vectors
are recycled to the appropriate length.
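
The calls below sketch these subsetting styles, plus two delayed operations inherited from DelayedArray. The node name and the threshold of 60 are illustrative assumptions, as is the SeqArray example file.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")
dp <- GDSArray(file, "annotation/format/DP/data")

# Numeric indices on both dimensions
sub_num <- dp[1:3, 10:15]

# A short logical vector is recycled, here keeping every other row
sub_lgl <- dp[c(TRUE, FALSE), ]

# Element-wise operations are delayed and change the reported type
dp_log <- log(dp)

# A logical index computed from the data itself (threshold is arbitrary)
sub_low <- dp[rowMeans(dp) < 60, ]
```

Each result is a DelayedMatrix rather than a GDSMatrix, because subsetting and arithmetic are recorded as delayed operations on top of the original seed.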
## <3 x 6> DelayedMatrix object of type "integer":
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 59 49 88 55 46 47
## [2,] 33 22 16 9 7 7
## [3,] 276 271 145 89 70 151
## <45 x 1348> DelayedMatrix object of type "integer":
## [,1] [,2] [,3] [,4] ... [,1345] [,1346] [,1347] [,1348]
## [1,] 0 0 12 15 . 6 5 4 0
## [2,] 107 92 247 177 . 28 15 26 3
## [3,] 0 0 17 0 . 4 4 4 0
## ... . . . . . . . . .
## [43,] 0 0 11 5 . 3 3 1 0
## [44,] 3 4 9 2 . 4 3 4 0
## [45,] 67 47 134 111 . 46 57 71 2
## <90 x 1348> DelayedMatrix object of type "double":
## [,1] [,2] [,3] ... [,1347] [,1348]
## [1,] -Inf -Inf 2.484907 . 1.386294 -Inf
## [2,] -Inf -Inf 2.833213 . 1.945910 -Inf
## [3,] 4.672829 4.521789 5.509388 . 3.258097 1.098612
## ... . . . . . .
## [88,] 4.394449 4.430817 5.379897 . 4.5217886 -Inf
## [89,] 4.204693 3.850148 4.897840 . 4.2626799 0.6931472
## [90,] 5.049856 5.010635 6.033086 . 4.9698133 0.6931472
## <52 x 1348> DelayedMatrix object of type "integer":
## [,1] [,2] [,3] [,4] ... [,1345] [,1346] [,1347] [,1348]
## [1,] 0 0 12 15 . 6 5 4 0
## [2,] 0 0 17 4 . 10 8 7 0
## [3,] 0 0 11 1 . 3 1 1 0
## ... . . . . . . . . .
## [50,] 0 0 6 0 . 2 0 0 0
## [51,] 0 0 11 5 . 3 3 1 0
## [52,] 3 4 9 2 . 4 3 4 0

GDSArraySeed

The GDSArraySeed class represents the ‘seed’ for the
GDSArray object. It is not exported from the GDSArray package.
Seed objects contain the GDS file path and the node name, and are
expected to satisfy the seed contract for implementing a
DelayedArray backend, i.e. to support dim() and dimnames().
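
Because the class is not exported, building a seed by hand needs the `:::` operator. The sketch below assumes that the internal constructor is named GDSArraySeed() and takes the file path and node name as its first two arguments; this signature is an assumption and may differ between package versions.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")

# Assumption: internal constructor with (file path, node name) arguments
sd <- GDSArray:::GDSArraySeed(file, "genotype/data")
sd

# The seed contract: dim() and dimnames() must work on the seed
dim(sd)
dimnames(sd)
```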
## GDSArraySeed
## GDS File path: /github/workspace/pkglib/SeqArray/extdata/CEU_Exon.gds
## Array node: genotype/data
## Dim: 2 x 90 x 1348
The seed can be used to construct a GDSArray
instance.
## <2 x 90 x 1348> GDSArray object of type "integer":
## ,,1
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 3 3 0 3 . 0 0 0 0
## [2,] 3 3 0 3 . 0 0 0 0
##
## ,,2
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 3 3 0 3 . 0 0 0 0
## [2,] 3 3 0 3 . 0 0 0 0
##
## ...
##
## ,,1347
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 0 0 0 0 . 0 0 0 0
## [2,] 0 0 0 0 . 0 0 0 0
##
## ,,1348
## [,1] [,2] [,3] [,4] ... [,87] [,88] [,89] [,90]
## [1,] 3 3 0 3 . 3 3 3 3
## [2,] 3 3 1 3 . 3 3 3 3
Calling the DelayedArray() constructor on a GDSArraySeed object
returns the same content as calling the GDSArray() constructor on
that GDSArraySeed.
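
This equivalence can be checked with a sketch like the following. To stay within the exported API, the seed is obtained with seed() from an existing GDSArray; the SeqArray example file is again an assumption.

```r
library(GDSArray)

# Assumption: SeqArray is installed and ships extdata/CEU_Exon.gds
file <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")

# Extract the seed from an existing GDSArray
sd <- seed(GDSArray(file, "genotype/data"))

# Both constructors accept the seed
from_gdsarray <- GDSArray(sd)
from_delayed <- DelayedArray(sd)

class(from_delayed)
```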
## [1] "GDSArray"
## attr(,"package")
## [1] "GDSArray"
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] GDSArray_1.30.0 DelayedArray_0.36.0 SparseArray_1.10.1
## [4] S4Arrays_1.10.0 abind_1.4-8 IRanges_2.44.0
## [7] S4Vectors_0.48.0 MatrixGenerics_1.22.0 matrixStats_1.5.0
## [10] Matrix_1.7-4 BiocGenerics_0.56.0 generics_0.1.4
## [13] gdsfmt_1.46.0 BiocStyle_2.38.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_2.0.0 crayon_1.5.3 compiler_4.5.2
## [4] BiocManager_1.30.26 Biostrings_2.78.0 GenomicRanges_1.62.0
## [7] parallel_4.5.2 jquerylib_0.1.4 Seqinfo_1.0.0
## [10] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-7
## [13] R6_2.6.1 XVector_0.50.0 knitr_1.50
## [16] maketools_1.3.2 bslib_0.9.0 rlang_1.1.6
## [19] cachem_1.1.0 xfun_0.54 sass_0.4.10
## [22] sys_3.4.3 cli_3.6.5 digest_0.6.37
## [25] grid_4.5.2 SeqArray_1.50.0 lifecycle_1.0.4
## [28] evaluate_1.0.5 buildtools_1.0.0 rmarkdown_2.30
## [31] tools_4.5.2 htmltools_0.5.8.1