---
title: "Getting Started with BioMoR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with BioMoR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# BioMoR: Bioinformatics Modeling with Recursion and Autoencoder-Based Ensembles

BioMoR is an R package for bioinformatics modeling that integrates:

- Recursive Transformer architectures via Mixture-of-Recursions (MoR) (Bae et al. 2025, doi:10.48550/arXiv.2507.10524)
- Autoencoder-based representation learning (Hinton & Salakhutdinov 2006, doi:10.1126/science.1127647)
- Random Forests for robust tree-based modeling (Breiman 2001, doi:10.1023/A:1010933404324)
- XGBoost for efficient gradient boosting (Chen & Guestrin 2016, doi:10.1145/2939672.2939785)
- Stacked ensembles that combine diverse models for stronger predictive power

It is designed as a benchmarking framework for predictive workflows in bioinformatics, enabling consistent cross-validation, calibration, and threshold optimization.

## Motivation

Modern bioinformatics involves high-dimensional, noisy data from fields such as genomics, transcriptomics, and proteomics. BioMoR addresses these challenges by:

- Using Mixture-of-Recursions (MoR) for adaptive recursive depth and computational efficiency.
- Learning latent embeddings through autoencoders to improve classifier generalization.
- Leveraging ensemble methods (RF, XGB) for robustness.
- Providing a standardized benchmarking interface that evaluates models on ROC-AUC, PR-AUC, F1, Balanced Accuracy, and Brier score, with support for calibration and threshold optimization.
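To make a few of these metrics concrete, the following base R sketch computes the Brier score, Balanced Accuracy, and F1 on toy predictions, then sweeps the decision threshold to find the one that maximizes F1. The helpers `binary_metrics()` and `brier_score()` and the toy vectors are hypothetical illustrations. They are not part of BioMoR and may differ from how the package computes these quantities internally.

```{r}
# Hypothetical helper (not a BioMoR function): metrics at one threshold.
# `prob` holds predicted probabilities of the positive class,
# `truth` holds the true labels coded as 0/1.
binary_metrics <- function(prob, truth, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & truth == 1)
  tn <- sum(pred == 0 & truth == 0)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  sens <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  spec <- if (tn + fp > 0) tn / (tn + fp) else NA_real_
  f1 <- if (2 * tp + fp + fn > 0) 2 * tp / (2 * tp + fp + fn) else NA_real_
  c(threshold = threshold,
    sensitivity = sens,
    specificity = spec,
    balanced_accuracy = (sens + spec) / 2,
    f1 = f1)
}

# Hypothetical helper: Brier score, the mean squared difference
# between predicted probability and observed 0/1 outcome.
brier_score <- function(prob, truth) mean((prob - truth)^2)

# Toy data for illustration only
truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
prob  <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7)

brier <- brier_score(prob, truth)
brier

# Threshold optimization: evaluate a grid of cutoffs and keep the best F1
thresholds <- seq(0.05, 0.95, by = 0.1)
metrics_by_threshold <- t(sapply(
  thresholds,
  function(th) binary_metrics(prob, truth, th)
))
best <- metrics_by_threshold[which.max(metrics_by_threshold[, "f1"]), ]
best
```

Choosing the threshold on the same data used for evaluation gives optimistic estimates, so in practice the sweep would normally be run on held-out or cross-validated predictions.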
## Example Workflow

We illustrate with the classic iris dataset (binary recoding for simplicity):

```{r, message=FALSE}
library(BioMoR)

# Prepare dataset: recode labels to binary
data(iris)
iris$Label <- ifelse(iris$Species == "setosa", "Active", "Inactive")

# Cross-validation control
ctrl <- get_cv_control(cv = 3)

# Train a Random Forest
fit <- train_rf(iris, outcome_col = "Label", ctrl = ctrl)

# Benchmark the model
results <- biomor_benchmark(fit, iris, outcome_col = "Label")
results
```

You can further extend this workflow by:

- Replacing `train_rf()` with `train_xgb_caret()` for XGBoost.
- Incorporating autoencoder features via `train_autoencoder()` and `get_embeddings()`.
- Using `train_biomor()` to stack multiple models.
- Benchmarking across models to compare pipelines in one consistent framework.