glycoTraitR is designed to provide users a streamlined workflow for extracting, summarizing, and analyzing glycan structural traits from GPSM data generated by pGlyco3 or Glyco-Decipher. Building on the Bioconductor core data infrastructure by representing glycan trait data using the SummarizedExperiment class, enabling interoperability with downstream Bioconductor analysis tools, including packages such as limma and related differential analysis frameworks.
Rather than focusing on individual glycoforms, glycoTraitR converts glycan structures into biologically interpretable composition- and structure-based traits, and summarizes these traits at the glycopeptide (site) or protein level.
In this quick start guide, we demonstrate how to:
SummarizedExperiment)The glycoTraitR package can be installed using BiocManager.
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("matsui-lab/glycoTraitR")
After installation, we load the package into the current R session. All functions used in the following examples are provided by glycoTraitR.
library(glycoTraitR)
In this quick start tutorial, we walk through a typical glycoTraitR analysis workflow using example GPSM data generated by pGlyco3.
The goal is to demonstrate how glycan structures can be transformed into biologically interpretable traits, summarized at the protein/site level, and analyzed by statistical comparison between groups.
We begin by loading example GPSM data and accompanying sample metadata.
For reproducibility and convenience, this vignette uses a small toy dataset bundled with the package. The commented code above shows how users can instead download and analyze the full public dataset hosted on Zenodo.
# The following is an example file from Zenodo
# url <- "https://zenodo.org/records/17756830/files/pGlycoDB-GP-FDR-Pro-Quant-Site.txt?download=1"
# download.file(url, "pGlyco3_GPSM.txt", mode = "wb")
# gpsm <- read_pGlyco3_gpsm("pGlyco3_GPSM.txt")
# meta_url <- "https://zenodo.org/records/17759790/files/meta_mcp_2022_100433.rds?download=1"
# download.file(meta_url, "meta_example.rds", mode = "wb")
# meta <- readRDS("meta_example.rds")
# The following is a smaller toy example files from package
path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR")
gpsm_toyexample <- read_pGlyco3_gpsm(path)
data("meta_toyexample")
head(gpsm_toyexample)
## Protein Peptide GlycanStructure
## 1 CNTN1 GKAJSTGTLVITDPTR (N(F)(N(H(H(N))(H(N)))))
## 2 MOG JATGMEVGWYRPPFSR (N(F)(N(H(H(N))(H(N)))))
## 3 CNTN1 GKAJSTGTLVITDPTR (N(F)(N(H(H)(N)(H(N)))))
## 4 MOG JATGMEVGWYRPPFSR (N(F)(N(H(H)(N)(H(N)))))
## 5 CNTN1 GKAJSTGTLVITDPTR (N(F)(N(H(N)(H(N(H)))(H(N(F))))))
## 6 CNTN1 GKAJSTGTLVITDPTR (N(F)(N(H(N)(H(N))(H(N)))))
## File Count
## 1 KS_AD110_HILIC-elu_875_3594 1
## 2 KS_AD110_HILIC-elu_875_3594 3
## 3 KS_AD110_HILIC-elu_875_3594 1
## 4 KS_AD110_HILIC-elu_875_3594 2
## 5 KS_AD110_HILIC-elu_875_3594 1
## 6 KS_AD110_HILIC-elu_875_3594 3
head(meta_toyexample)
## # A tibble: 6 × 3
## Diagnosis `Sample number` file
## <fct> <dbl> <chr>
## 1 Asymptomatic 1 KS_AD1_HILIC-elu_870_3584
## 2 Normal 110 KS_AD110_HILIC-elu_875_3594
## 3 Normal 118 KS_AD118_HILIC-elu_851_3546
## 4 Normal 12 KS_AD12_HILIC-elu_867_3578
## 5 Normal 125 KS_AD125_HILIC-elu_845_3534
## 6 Normal 127 KS_AD127_HILIC-elu_873_3590
The gpsm_toyexample object contains glycopeptide-spectrum matches, including glycan structures encoded in pGlyco3 format.
The meta_toyexample object provides sample-level annotations that will later be used to define experimental groups.
Next, we convert GPSM-level glycan information into glycan trait matrices.
This step parses glycan structures, computes composition and structural traits, and summarizes the resulting values at the protein level using a SummarizedExperiment object.
trait_se <- build_trait_se(
gpsm_toyexample,
from = "pGlyco3",
motifs = NULL,
level = "protein",
meta = meta_toyexample
)
## adding traits to the gpsm matrix
## Warning: The `father` argument of `bfs()` is deprecated as of igraph 2.2.0.
## ℹ Please use the `parent` argument instead.
## ℹ The deprecated feature was likely used in the glycoTraitR package.
## Please report the issue at
## <https://github.com/matsui-lab/glycoTraitR/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## generating protein trait matrices
trait_se
## class: SummarizedExperiment
## dim: 6 1311
## metadata(0):
## assays(13): GlycanSize Hexose ... CoreFuc AntFuc
## rownames(6): CNTN1 MOG ... ITB8 PGCA
## rowData names(1): level
## colnames(1311): psm1 psm2 ... psm1310 psm1311
## colData names(4): Diagnosis Sample number file psm_id
The resulting trait_se object stores one assay matrix per glycan trait.
Rows correspond to proteins, columns correspond to GPSMs, and values represent trait-specific glycan abundances.
We now perform differential analysis to identify glycan traits that differ significantly between experimental groups.
As an example, we compare samples labeled as Normal and Symptomatic based on the Diagnosis column in the sample metadata.
changed_traits <- analyze_trait_changes(
trait_se = trait_se,
group_col = "Diagnosis",
group_levels = c("Normal", "Symptomatic"),
min_psm = 20
)
head(changed_traits)
## trait level l_pval f_val t_pval t_val
## t GlycanSize CNTN1 0.03309985 4.570347 0.03318245 2.136934
## t1 GlycanSize MOG 0.07219982 3.291038 0.02897848 2.210923
## t2 GlycanSize ITB8 0.01750819 5.994530 0.08711770 -1.793639
## t3 Hexose MOG 0.09384406 2.853005 0.01962574 2.365832
## t4 Hexose ITB8 0.01964245 5.770116 0.09411908 -1.749760
## t5 HexNAc CNTN1 0.04575081 4.014279 0.02068097 2.322456
The output table reports glycan traits and corresponding proteins for which either the Welch t-test or Levene’s variance test is significant (p < 0.05).
Each row represents a specific trait–protein combination showing evidence of differential behavior between the two groups.
We can visualize its distribution at the GPSM level.
Here, we select the Hexose trait for the protein MOG as an example.
p <- plot_trait_distribution(
trait_se = trait_se,
group_col = "Diagnosis",
group_levels = c("Normal", "Symptomatic"),
trait_name = "Hexose",
feature = "MOG"
)
p$p_hist
p$p_box
The histogram shows the distribution of GPSM-level trait abundances across groups, while the boxplot summarizes group-wise differences and variability.
In addition to pre-defined glycan traits, glycoTraitR allows users to define custom glycan motifs of biological interest. Next we demonstrate how to define the glycan motifs of users’ own interests. We show how glycoTraitR detects these motif occurrences within glycan structures.
We will:
library(igraph)
We first parse glycan structures into tree representations.
Internally, glycans are represented as graphs, where nodes correspond to monosaccharides and edges represent glycosidic linkages.
# Example: parsed glycan trees from existing GPSM data
path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR")
gpsm_toyexample <- read_pGlyco3_gpsm(path)
data("meta_toyexample")
glycans <- lapply(gpsm_toyexample$GlycanStructure, glycoTraitR:::pGlyco3_to_tree)
We use the function build_glycan_igraph() to visualize an example of glycan structure.
# visualize example structures
igraph_obj <- build_glycan_igraph(glycans[[1]])
plot_glycan_tree(igraph_obj)
This visualization highlights the branching structure and residue composition of a glycan, which is essential for understanding motif topology.
This example demonstrates how to define and detect a simple glycan motif consisting of three hexose (H) residues connected in series.
Here we define a linear trisaccharide motif:
"a-b", "b-c"#### Example 1 ####
# (1) Select a glycan from pGlyco3 data
tree_full <- glycans[[24]]
g_full <- build_glycan_igraph(tree_full)
# (2) Define motif manually
motif <- list(
node = c("H", "H", "H"),
edge = c("a-b", "b-c")
)
g_motif <- build_glycan_igraph(motif)
# (3) Perform subgraph matching
n_match <- count_subgraph_isomorphisms(
g_motif, g_full,
method = "vf2",
vertex.color1 = factor(V(g_full)$type),
vertex.color2 = factor(V(g_motif)$type)
)
# (4) Visualize
par(mfrow = c(1, 2))
plot_glycan_tree(g_motif)
plot_glycan_tree(g_full)
print(paste(n_match, "motif(s) found in the glycan"))
## [1] "0 motif(s) found in the glycan"
The reported match count indicates how many times the specified motif occurs within the full glycan structure.
More complex motifs can also be defined, including branched structures and specific monosaccharide types such as fucose (F).
This example defines a branched motif containing Fucose:
#### Example 2 ####
# (1) Select a glycan
tree_full <- glycans[[1]]
g_full <- glycoTraitR::build_glycan_igraph(tree_full)
# (2) Symbolic motif definition
motif <- list(
node = c("H", "N", "F"),
edge = c("a-b", "b-c")
)
g_motif <- glycoTraitR::build_glycan_igraph(motif)
# (3) Subgraph matching
n_match <- igraph::count_subgraph_isomorphisms(
g_motif, g_full,
method = "vf2",
vertex.color1 = factor(V(g_full)$type),
vertex.color2 = factor(V(g_motif)$type)
)
# (4) Visualize motif vs. full glycan
par(mfrow = c(1, 2))
plot_glycan_tree(g_motif)
plot_glycan_tree(g_full)
print(paste(n_match, "motif(s) found in the glycan"))
## [1] "0 motif(s) found in the glycan"
Multiple user-defined motifs can be supplied simultaneously.
Once defined, these motifs are treated as additional glycan traits and are incorporated into the trait computation workflow.
user_motifs <- list(
LinearH3 = list(
node = c("H", "H", "H"),
edge = c("a-b", "b-c")
),
FucBranch = list(
node = c("H", "N", "F"),
edge = c("a-b", "b-c")
)
)
Each user-defined motif appears as a separate assay in the SummarizedExperiment, allowing it to be analyzed together by build_trait_se() as other built-in glycan traits.
trait_se <- build_trait_se(
gpsm_toyexample,
from = "pGlyco3",
motifs = user_motifs,
level = "protein",
meta = meta_toyexample
)
## adding traits to the gpsm matrix
## generating protein trait matrices
SummarizedExperiment::assayNames(trait_se)
## [1] "GlycanSize" "Hexose" "HexNAc" "Neu5Ac" "Neu5Gc"
## [6] "Fucose" "Antennas" "Bisect" "Complex" "HighMan"
## [11] "Hybrid" "CoreFuc" "AntFuc" "LinearH3" "FucBranch"
Your motif counts will automatically appear as additional glycan traits in the trait matrices.
In this vignette, we provided a step-by-step tutorial demonstrating how glycoTraitR integrates glycoproteomics output into a unified trait-based analysis framework. You can now construct your own biologically meaningful glycan motifs and include them directly in glycoTraitR analyses.
## R Under development (unstable) (2026-01-15 r89304)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] igraph_2.2.1 glycoTraitR_0.99.2 BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.6 generics_0.1.4
## [3] SparseArray_1.11.10 lattice_0.22-9
## [5] digest_0.6.39 magrittr_2.0.4
## [7] evaluate_1.0.5 grid_4.6.0
## [9] RColorBrewer_1.1-3 fastmap_1.2.0
## [11] Matrix_1.7-4 Formula_1.2-5
## [13] BiocManager_1.30.27 scales_1.4.0
## [15] pbapply_1.7-4 abind_1.4-8
## [17] cli_3.6.5 rlang_1.1.7
## [19] XVector_0.51.0 Biobase_2.71.0
## [21] withr_3.0.2 DelayedArray_0.37.0
## [23] yaml_2.3.12 otel_0.2.0
## [25] S4Arrays_1.11.1 parallel_4.6.0
## [27] tools_4.6.0 dplyr_1.2.0
## [29] ggplot2_4.0.2 SummarizedExperiment_1.41.1
## [31] BiocGenerics_0.57.0 vctrs_0.7.1
## [33] R6_2.6.1 matrixStats_1.5.0
## [35] stats4_4.6.0 lifecycle_1.0.5
## [37] Seqinfo_1.1.0 S4Vectors_0.49.0
## [39] car_3.1-5 IRanges_2.45.0
## [41] pkgconfig_2.0.3 pillar_1.11.1
## [43] gtable_0.3.6 glue_1.8.0
## [45] xfun_0.56 tibble_3.3.1
## [47] GenomicRanges_1.63.1 tidyselect_1.2.1
## [49] MatrixGenerics_1.23.0 knitr_1.51
## [51] dichromat_2.0-0.1 farver_2.1.2
## [53] htmltools_0.5.9 labeling_0.4.3
## [55] rmarkdown_2.30 carData_3.0-6
## [57] compiler_4.6.0 S7_0.2.1