--- title: "Modeling with Traditional Ecological Knowledge (TEK): Bayesian Networks and Monte Carlo" author: "Cory Whitney" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{TEK-based decision models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Overview This vignette demonstrates how Traditional Ecological Knowledge (TEK) can be incorporated into decision models using Bayesian Networks (BNs) and Monte Carlo sampling. It is designed as a short tutorial showing practical, reproducible code snippets and guidance on eliciting and encoding TEK. Note: this vignette is both pedagogical and practical. It gives worked examples you can run locally. The goal is to show how TEK — whether collected as counts (k of n informants), multinomial categories, or qualitative rankings (high/med/low) — can be translated into probability distributions that feed Bayesian Networks. Monte Carlo sampling is then used to propagate TEK-derived uncertainty through the BN and produce decision-relevant outcome distributions. The examples below include a small simulated TEK dataset embedded in the vignette so you can run everything end-to-end. For heavier examples you may conditionally skip chunks if packages are not installed (see notes in chunks). By default the main code chunks here are enabled so the vignette will run in an environment with the required packages installed. ## Required packages We will show examples that use the following packages (install if you want to run the code): - bnlearn - gRain - dplyr - ggplot2 - purrr You can install them with `install.packages(c("bnlearn","gRain","dplyr","ggplot2","purrr"))` --- ## 1. Short motivation and data We often collect TEK as counts (k informants out of n) or qualitative categories (high/medium/low). 
A convenient approach is to convert counts into Beta (binary) or Dirichlet (categorical) priors and then sample from these priors inside a Monte Carlo loop to create Conditional Probability Tables (CPTs) for a Bayesian Network (BN). Below we illustrate the workflow with a tiny, didactic model: a management `Decision` (Harvest vs Conserve), a latent `Abundance` node (High/Low), and an `Impact` node (Recovery/Decline). The goal is to compare management options under TEK uncertainty.

## 2. Example: encode TEK as Beta priors and sample CPTs

```{r set-up, eval=TRUE}
# Example: create a Beta prior from TEK counts
# (k out of n informants say 'harvest causes decline')
tek_k <- 7  # informants saying 'decline'
tek_n <- 10 # total informants
# Beta posterior under a uniform prior (Laplace smoothing): Beta(k + 1, n - k + 1)
alpha <- tek_k + 1
beta <- tek_n - tek_k + 1
# draw Monte Carlo samples for that probability
set.seed(123)
p_samples <- rbeta(1000, alpha, beta)
# Inspect the distribution
library(ggplot2)
ggplot(data.frame(p = p_samples), aes(x = p)) +
  geom_density() +
  ggtitle("TEK-derived prior for P(decline | harvest)")
```

### Simulated informant dataset (multivariate)

Below we create a small simulated TEK survey with 20 informants. Each informant provides:

- a binary response about whether harvesting causes decline (1 = yes, 0 = no),
- a categorical assessment of current abundance (High/Medium/Low), and
- a self-reported confidence score (1-5).

```{r sim-data, eval=TRUE}
set.seed(42)
n_inf <- 20
# create a data frame of informant responses
informants <- data.frame(
  id = 1:n_inf,
  # rbinom: draw n_inf binary responses, each equal to 1 with probability 0.6
  says_decline = rbinom(n = n_inf, size = 1, prob = 0.6),
  # sample an abundance category for each informant, with replacement, using the given probabilities
abundance = sample(x = c("High", "Medium", "Low"), size = n_inf, replace = TRUE,
                     prob = c(0.4, 0.35, 0.25)),
  # sample a self-reported confidence score for each informant
  confidence = sample(3:5, size = n_inf, replace = TRUE, prob = c(0.2, 0.6, 0.2))
)
knitr::kable(head(informants, 8))
```

We will use this toy dataset to illustrate three aggregation approaches below: simple counts converted to Beta/Dirichlet priors, pooling weighted by informant confidence, and mapping qualitative categories to numerical pseudo-counts.

## 3. Build a small BN and run Monte Carlo parameter sampling

High-level algorithm:

1. Define the BN structure (nodes and conditional relationships).
2. For each Monte Carlo iteration:
   - Sample each uncertain probability from its Beta/Dirichlet prior.
   - Populate the CPTs with the sampled probabilities.
   - Compile the BN and query target node probabilities or expected utilities.
3. Aggregate the distribution of outcomes across iterations and compare decisions.

```{r bn-mc-workflow, eval=requireNamespace("bnlearn", quietly = TRUE) && requireNamespace("gRain", quietly = TRUE) && requireNamespace("purrr", quietly = TRUE)}
# this chunk only runs when 'bnlearn', 'gRain' and 'purrr' are installed (see eval= above)
library(bnlearn)
library(gRain)
library(purrr)

# define simple TEK priors (repeated here so the chunk is self-contained)
tek_k <- 7   # number of informants saying 'decline'
tek_n <- 10  # total informants
alpha <- tek_k + 1
beta <- tek_n - tek_k + 1

# define the structure (the bnlearn object is useful for inspection and plotting;
# gRain rebuilds the same network from the CPTs below)
model_string <- "[Decision][Abundance|Decision][Impact|Abundance:Decision]"
net <- model2network(model_string)

# a single Monte Carlo iteration
sample_one <- function() {
  # sample P(Abundance = Low | Decision = Harvest) from the TEK-derived Beta
  p_decline_if_harvest <- rbeta(1, alpha, beta)

  # create CPTs; values must be in the order expected by gRain:
  # the child's levels vary fastest, then the first parent, then the second
  cpt_decision <- cptable(~Decision, values = c(0.5, 0.5),
                          levels = c("Harvest", "Conserve"))
  # Abundance|Decision: P(High|Harvest) = 1 - p_decline_if_harvest, P(High|Conserve) = 0.9
  cpt_abundance <- cptable(~Abundance | Decision,
                           values = c(1 - p_decline_if_harvest, p_decline_if_harvest,
                                      0.9, 0.1),
                           levels = c("High", "Low"))
  # Impact|Abundance:Decision needs 2 x 2 x 2 = 8 values (toy numbers), one
  # (Recovery, Decline) pair per parent combination, in the order:
  # (High,Harvest), (Low,Harvest), (High,Conserve), (Low,Conserve)
  cpt_impact <- cptable(~Impact | Abundance:Decision,
                        values = c(0.85, 0.15,
                                   0.15, 0.85,
                                   0.95, 0.05,
                                   0.40, 0.60),
                        levels = c("Recovery", "Decline"))

  plist <- compileCPT(list(cpt_decision, cpt_abundance, cpt_impact))
  grain_net <- grain(plist)

  # condition on each decision in turn and record P(Impact = Recovery)
  q_harvest <- querygrain(setEvidence(grain_net, nodes = "Decision", states = "Harvest"),
                          nodes = "Impact", type = "marginal")$Impact
  q_conserve <- querygrain(setEvidence(grain_net, nodes = "Decision", states = "Conserve"),
                           nodes = "Impact", type = "marginal")$Impact
  data.frame(p_recovery_harvest = q_harvest[["Recovery"]],
             p_recovery_conserve = q_conserve[["Recovery"]])
}

# run the Monte Carlo loop (here, 500 iterations)
results <- map_dfr(1:500, ~sample_one())

# summarize the outcome distribution for each decision
summary(results)
```

## 4. Decision rules and visualization

- Compute the expected utility per decision for each Monte Carlo iteration, then compare means, medians and quantiles.
- Use conservative decision rules if stakeholders prefer lower risk (e.g., maximize the 10th percentile of utility).

Visualization ideas:

- Density plots of expected utility for each option.
- Fan charts showing quantiles across iterations.

## 5. Practical notes on elicitation and aggregation

- For binary TEK outcomes, use Beta priors: Beta(k + 1, n - k + 1).
- For multinomial TEK responses, use a Dirichlet prior with counts + 1 as pseudo-counts.
- Weight informants by confidence or expertise when aggregating (weighted counts or hierarchical models).
- Record provenance, and run sensitivity analyses by omitting individual informants.

### 5.1 Realistic aggregation examples

We now show three practical aggregation methods using the simulated `informants` dataset above.
1) Simple pooling (counts -> Beta/Dirichlet)

```{r agg-simple, eval=TRUE}
# Binary outcome: counts for 'says_decline'
k <- sum(informants$says_decline)
n <- nrow(informants)
alpha <- k + 1; beta <- n - k + 1
cat(sprintf("P(decline) prior ~ Beta(%d, %d) -> mean = %.2f\n",
            alpha, beta, alpha / (alpha + beta)))
# Multinomial outcome: counts for abundance categories -> Dirichlet
ab_counts <- table(informants$abundance)
ab_dirichlet_alpha <- as.numeric(ab_counts) + 1
cat("Abundance counts:\n")
print(ab_counts)
cat("Dirichlet alpha (counts + 1):\n")
print(ab_dirichlet_alpha)
```

2) Weighted pooling by informant confidence

```{r agg-weighted, eval=TRUE}
# Use confidence as a weight (normalized so the weights sum to 1)
weights <- informants$confidence / sum(informants$confidence)
# Weighted count for the binary outcome, rescaled back to the sample size
weighted_k <- sum(informants$says_decline * weights) * nrow(informants)
# Round to an integer pseudo-count for the Beta prior ('n' comes from the previous chunk)
wk <- as.integer(round(weighted_k))
walpha <- wk + 1; wbeta <- n - wk + 1
cat(sprintf("Weighted Beta prior approx: Beta(%d, %d) -> mean = %.2f\n",
            walpha, wbeta, walpha / (walpha + wbeta)))
# Weighted multinomial: weighted frequencies per abundance category
weighted_ab <- tapply(weights, informants$abundance, sum)
weighted_ab_counts <- round(weighted_ab * nrow(informants))
cat("Weighted abundance pseudo-counts:\n")
print(weighted_ab_counts)
```

3) Map qualitative categories to pseudo-counts

```{r agg-qualitative, eval=TRUE}
# Map High/Medium/Low to pseudo-counts reflecting stronger beliefs
map_counts <- c(High = 5, Medium = 3, Low = 1)
qual_counts <- sapply(c("High", "Medium", "Low"),
                      function(lvl) sum(map_counts[lvl] * (informants$abundance == lvl)))
cat("Mapped pseudo-counts for abundance (qualitative -> counts):\n")
print(qual_counts)
# Convert to Dirichlet alpha
qual_alpha <- qual_counts + 1
cat("Dirichlet alpha (pseudo-counts + 1):\n")
print(qual_alpha)
```

All three approaches yield numeric pseudo-counts that can be used as Beta or Dirichlet parameters. Which one is appropriate depends on your elicitation protocol and on whether you want to weight informants by confidence or expertise.
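The Dirichlet parameters above can be sampled inside a Monte Carlo loop, just like the Beta draws in Section 2. Base R has no Dirichlet sampler, so here is a minimal sketch using the standard normalized-Gamma construction; the helper `rdirichlet_one` and the example pseudo-counts `ab_alpha` are our own illustrations, not part of any loaded package:

```r
set.seed(1)
# Dirichlet(alpha) sample via normalized Gamma draws:
# x_i ~ Gamma(alpha_i, 1), then p = x / sum(x)
rdirichlet_one <- function(alpha) {
  x <- rgamma(length(alpha), shape = alpha, rate = 1)
  x / sum(x)
}

# e.g. qualitative pseudo-counts for High/Medium/Low abundance -> Dirichlet(6, 4, 2)
ab_alpha <- c(High = 6, Medium = 4, Low = 2)
p_draws <- t(replicate(1000, rdirichlet_one(ab_alpha)))

# each row of p_draws is one simulated probability vector over the three
# categories; the column means approach alpha / sum(alpha) = (0.50, 0.33, 0.17)
colMeans(p_draws)
```

Each row of `p_draws` could then populate one column of a CPT in one iteration of the Section 3 workflow.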
--- # Appendix: TEK Elicitation Form and Simulated Dataset This appendix provides a practical TEK elicitation form template and a self-contained simulated dataset so you can reproduce all examples in this vignette without external data. ## A. TEK Elicitation Form Template Below is a template form that can be adapted for your specific research context. The form is designed to elicit three types of information: (1) binary management perception, (2) categorical abundance assessment, and (3) confidence rating. ### Informant Metadata ``` Informant ID: ___________ Date of interview: __________ Location / Community: ____________________ Years of experience with this resource: ___________ Primary use of resource: [ ] Subsistence [ ] Commercial [ ] Cultural [ ] Other: _______ Language of interview: [ ] English [ ] Other: ________________ Interviewer name: ____________________ ``` ### Question 1: Management Impact (Binary) **"In your experience, when people *harvest* [RESOURCE], does this cause the population to decline quickly?"** ``` [ ] Yes, harvesting causes the population to decline [ ] No, the population does not decline from harvesting [ ] Uncertain / Depends on context ``` ### Question 2: Current Resource Status (Categorical) **"How would you assess the current abundance of [RESOURCE] in this area compared to your childhood or earlier years?"** ``` [ ] High - abundance is good, similar to or better than before [ ] Medium - abundance has declined somewhat, but the resource is still available [ ] Low - abundance has declined significantly; the resource is now scarce or rare [ ] Not sure ``` ### Question 3: Management Recommendation (Categorical) **"What management approach do you think would be best for [RESOURCE]?"** ``` [ ] Harvest freely (no restrictions) [ ] Harvest with guidelines (seasonal or quantity limits) [ ] Conserve strictly (minimize or prohibit harvesting) [ ] Let it recover, then harvest sustainably [ ] Other: _______________________ ``` ### Question 4: 
Informant Confidence (Rating)

**"How confident are you in your answers to the above questions?"**

```
Confidence score (1 = not confident, 5 = very confident):
[ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
```

### Question 5: Contextual Notes

**"Are there any other factors, changes, or context we should know about?"**

```
___________________________________________________________
___________________________________________________________
___________________________________________________________
```

---

## B. Complete Simulated Dataset for Reproducibility

Below we generate a realistic simulated TEK dataset with 50 informants. You can use this dataset to run all the examples in the vignette and adapt them to your own data.

### Generate the simulated dataset

```{r generate-simulated-tek-data, eval=TRUE}
# Set random seed for reproducibility
set.seed(42)

# Set number of informants
n_informants <- 50

# Create a data frame with informant characteristics
tek_survey_data <- data.frame(
  # Informant IDs from 1 to n_informants
  informant_id = 1:n_informants,
  # Years of experience from a normal distribution (mean = 25, sd = 15)
  # rnorm: draw from a normal distribution; pmax: enforce a minimum of 5 years
  years_experience = pmax(5, round(rnorm(n_informants, mean = 25, sd = 15))),
  # Primary use category: subsistence, commercial, or cultural
  # sample: draw categories with replacement, using the given probabilities
  primary_use = sample(
    x = c("Subsistence", "Commercial", "Cultural"),
    size = n_informants,
    replace = TRUE,
    prob = c(0.5, 0.2, 0.3)
  ),
  # Initialize columns for responses (filled in the loops below)
  says_harvest_declines = NA,
  abundance_status = NA,
  management_recommendation = NA,
  confidence = NA
)

# Fill binary outcome: the probability of saying "harvesting causes decline"
# increases with experience
for (i in 1:n_informants) {
  # Calculate probability based on years of experience
(ranges from ~0.3 to 0.7)
  p_decline <- 0.3 + (tek_survey_data$years_experience[i] / 100) * 0.4
  # rbinom: draw one binary outcome (size = 1) with probability p_decline
  tek_survey_data$says_harvest_declines[i] <- rbinom(1, size = 1, prob = p_decline)
}

# Fill categorical abundance assessment: correlated with perception of harvest decline
# Matrix of probabilities for abundance (High, Medium, Low) given each response.
# Rows are indexed by says_harvest_declines + 1:
# row 1 = does not say decline, row 2 = says decline
abundance_probs <- matrix(
  c(0.5, 0.3, 0.2,   # does not say decline: P(High)=0.5, P(Medium)=0.3, P(Low)=0.2
    0.3, 0.4, 0.3),  # says decline: P(High)=0.3, P(Medium)=0.4, P(Low)=0.3
  nrow = 2, byrow = TRUE
)

# Loop through each informant to assign abundance status
for (i in 1:n_informants) {
  # Select probability row based on their answer about decline (add 1 to convert 0/1 to row 1/2)
  row_idx <- tek_survey_data$says_harvest_declines[i] + 1
  # sample: select one abundance category with probabilities from the appropriate row
  tek_survey_data$abundance_status[i] <- sample(
    x = c("High", "Medium", "Low"),
    size = 1,
    prob = abundance_probs[row_idx, ]
  )
}

# Fill management recommendation: correlated with perception of decline and current abundance
for (i in 1:n_informants) {
  # If they say decline AND perceive low abundance: recommend conservation
  if (tek_survey_data$says_harvest_declines[i] == 1 &&
      tek_survey_data$abundance_status[i] == "Low") {
    # sample: select one management option with the specified probabilities
    tek_survey_data$management_recommendation[i] <- sample(
      c("Conserve strictly", "Harvest with guidelines", "Let it recover, then harvest"),
      size = 1, prob = c(0.6, 0.3, 0.1)
    )
  # Else if they perceive high abundance: less restrictive recommendations
  } else if (tek_survey_data$abundance_status[i] == "High") {
    tek_survey_data$management_recommendation[i] <- sample(
      c("Harvest freely", "Harvest with
guidelines", "Conserve strictly"), size = 1, prob = c(0.4, 0.4, 0.2) ) # Otherwise: balanced distribution of recommendations } else { tek_survey_data$management_recommendation[i] <- sample( c("Harvest freely", "Harvest with guidelines", "Conserve strictly", "Let it recover, then harvest"), size = 1, prob = c(0.2, 0.4, 0.25, 0.15) ) } } # Fill confidence scores: higher for more experienced informants # Loop through each informant for (i in 1:n_informants) { # Calculate probability of high confidence based on years of experience p_high_conf <- 0.3 + (tek_survey_data$years_experience[i] / 100) * 0.3 # Assign confidence based on this probability # runif: draw from uniform distribution (0 to 1) if (runif(1) < p_high_conf) { # High confidence: choose between 4 or 5 tek_survey_data$confidence[i] <- sample(4:5, 1, prob = c(0.4, 0.6)) } else { # Lower confidence: choose between 2 or 3 tek_survey_data$confidence[i] <- sample(2:3, 1, prob = c(0.4, 0.6)) } } # Display first 10 rows of the dataset cat("TEK Survey Dataset - First 10 rows:\n\n") print(knitr::kable(head(tek_survey_data, 10))) # Print summary statistics cat("\n\nSummary of TEK Survey Data (n=", n_informants, ")\n") cat("================================\n") # mean: calculate average; round to 1 decimal place # min/max: find minimum and maximum values cat("Years of experience: Mean =", round(mean(tek_survey_data$years_experience), 1), ", Range =", min(tek_survey_data$years_experience), "-", max(tek_survey_data$years_experience), "\n") # table: count frequency of each category cat("Primary use breakdown:\n") print(table(tek_survey_data$primary_use)) cat("\n\nBinary outcome (says harvesting declines):\n") print(table(tek_survey_data$says_harvest_declines)) cat("\n\nAbundance status:\n") print(table(tek_survey_data$abundance_status)) cat("\n\nManagement recommendations:\n") print(table(tek_survey_data$management_recommendation)) cat("\n\nConfidence scores:\n") print(table(tek_survey_data$confidence)) ``` ### Using the 
simulated dataset with the aggregation methods You can now apply the three aggregation approaches from Section 5.1 to this complete dataset: ```{r use-simulated-for-aggregation, eval=TRUE} # === Simple pooling: binary outcome === # sum: count the total number of informants who said harvesting causes decline (1 = yes, 0 = no) k <- sum(tek_survey_data$says_harvest_declines) # nrow: count the total number of rows (informants) n <- nrow(tek_survey_data) # Beta distribution parameters with Laplace smoothing (add 1 to each) alpha_simple <- k + 1 beta_simple <- n - k + 1 cat("Binary outcome aggregation (Simple pooling):\n") # sprintf: format and print the Beta distribution parameters cat(sprintf(" P(harvesting causes decline) ~ Beta(%d, %d)\n", alpha_simple, beta_simple)) # Calculate mean probability of the Beta distribution: alpha / (alpha + beta) cat(sprintf(" Mean probability: %.3f\n", alpha_simple / (alpha_simple + beta_simple))) # === Simple pooling: categorical outcome (abundance) === # table: count frequency of each abundance category abundance_counts <- table(tek_survey_data$abundance_status) # Convert counts to Dirichlet parameters by adding 1 to each count abundance_alpha <- as.numeric(abundance_counts) + 1 cat("\nAbundance status aggregation (Simple pooling):\n") cat(" Category counts:\n") print(abundance_counts) cat(" Dirichlet alpha (for Dirichlet prior):\n") print(abundance_alpha) # === Weighted pooling: by informant confidence === # Divide each confidence score by the sum to create normalized weights (sum to 1) weights <- tek_survey_data$confidence / sum(tek_survey_data$confidence) # Multiply weighted responses by informant count and sum to get weighted pseudo-count # sum: add up the products of (informant response) * (normalized weight) weighted_k_decline <- sum(tek_survey_data$says_harvest_declines * weights) * nrow(tek_survey_data) # round: convert to integer for use as Beta parameter wk_decline <- round(weighted_k_decline) # Calculate Beta parameters 
with Laplace smoothing walpha_decline <- wk_decline + 1 wbeta_decline <- n - wk_decline + 1 cat("\nBinary outcome aggregation (Confidence-weighted):\n") cat(sprintf(" P(harvesting causes decline) ~ Beta(%d, %d)\n", walpha_decline, wbeta_decline)) # Calculate mean probability of the weighted Beta distribution cat(sprintf(" Mean probability: %.3f\n", walpha_decline / (walpha_decline + wbeta_decline))) ``` --- ## C. Guidance for Adapting Form and Data - **Customize question text**: Replace `[RESOURCE]` with the actual resource you are studying (e.g., "yam," "mahogany," "mangrove"). - **Add or remove questions**: You may want to add questions about specific threats, seasonal patterns, or restoration practices. - **Language and terminology**: Translate and adapt categories (High/Medium/Low) to match local terminology and concepts. - **Informant selection**: Consider stratifying by age, gender, duration of residence, or primary use to detect variation in TEK. - **Sample size**: The simulated dataset uses 50 informants; adjust based on your resources and the diversity of your study population. - **Data quality checks**: Record provenance (who, when, where) and verify consistency (e.g., informants who say harvesting declines should more often report low abundance). --- ## D. 
Example: Loading Your Own Data If you collect data using a form similar to the template in Section A, you can load and process it similarly to the simulated dataset: ```{r example-load-own-data, eval=FALSE} # Load your survey data (replace 'your_data.csv' with your file) # read.csv: read a comma-separated values file and create a data frame my_tek_data <- read.csv("your_data.csv") # Check structure of your data # str: display internal structure of an R object, showing column names, types, and sample values str(my_tek_data) # Summarize binary outcome cat("Binary outcome:\n") # table: create a frequency table of the responses print(table(my_tek_data$says_harvest_declines)) # Aggregate binary outcome using the methods from Section 5.1 # sum: count the number of "yes" responses (1 = yes) k <- sum(my_tek_data$says_harvest_declines) # nrow: count the total number of rows (informants) n <- nrow(my_tek_data) # Calculate Beta distribution parameters with Laplace smoothing alpha <- k + 1; beta <- n - k + 1 # sprintf: format the prior as a string and print it cat(sprintf("Prior: Beta(%d, %d)\n", alpha, beta)) ```
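Once your own data have been aggregated into priors and propagated through a BN, the last step is a decision rule. Here is a minimal sketch of the two rules mentioned in Section 4; `utility_harvest` and `utility_conserve` are invented stand-ins for the per-iteration expected utilities that a full BN plus utility model would produce:

```r
set.seed(7)
# Hypothetical Monte Carlo utility draws for two management options
utility_harvest  <- rnorm(500, mean = 10, sd = 6)  # higher mean, higher spread
utility_conserve <- rnorm(500, mean = 8,  sd = 2)  # lower mean, lower spread

# Risk-neutral rule: compare mean utilities
means <- c(Harvest = mean(utility_harvest), Conserve = mean(utility_conserve))
print(means)

# Conservative rule (Section 4): compare the 10th percentile of utility
q10 <- c(Harvest  = unname(quantile(utility_harvest, 0.10)),
         Conserve = unname(quantile(utility_conserve, 0.10)))
print(q10)
names(which.max(q10))  # the risk-averse choice
```

With these toy numbers the risk-neutral rule tends to favor the higher-mean option while the 10th-percentile rule favors the lower-variance option; the point is that the two rules can disagree, which is exactly the trade-off to discuss with stakeholders.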