--- title: "Modeling with Traditional Ecological Knowledge (TEK): Bayesian Networks and Monte Carlo" author: "Cory Whitney" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{TEK-based decision models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Overview This vignette demonstrates how Traditional Ecological Knowledge (TEK) can be incorporated into decision models using Bayesian Networks (BNs) and Monte Carlo sampling. It is designed as a short tutorial showing practical, reproducible code snippets and guidance on eliciting and encoding TEK. Note: this vignette is both pedagogical and practical. It gives worked examples you can run locally. The goal is to show how TEK — whether collected as counts (k of n informants), multinomial categories, or qualitative rankings (high/med/low) — can be translated into probability distributions that feed Bayesian Networks. Monte Carlo sampling is then used to propagate TEK-derived uncertainty through the BN and produce decision-relevant outcome distributions. The examples below include a small simulated TEK dataset embedded in the vignette so you can run everything end-to-end. For heavier examples you may conditionally skip chunks if packages are not installed (see notes in chunks). By default the main code chunks here are enabled so the vignette will run in an environment with the required packages installed. ## Required packages We will show examples that use the following packages (install if you want to run the code): - bnlearn - gRain - dplyr - ggplot2 - purrr You can install them with `install.packages(c("bnlearn","gRain","dplyr","ggplot2","purrr"))` --- ## 1. Short motivation and data We often collect TEK as counts (k informants out of n) or qualitative categories (high/medium/low). 
A convenient approach is to convert counts into Beta (binary) or Dirichlet (categorical) priors and then sample from these priors inside a Monte Carlo loop to create Conditional Probability Tables (CPTs) for a Bayesian Network (BN). Below we illustrate the workflow with a tiny, didactic model: a management `Decision` (Harvest vs Conserve), a latent `Abundance` node (High/Low), and an `Impact` node (Recovery/Decline). The goal is to compare management options under TEK uncertainty.

## 2. Example: encode TEK as Beta priors and sample CPTs

```{r set-up, eval=TRUE}
# Example: create a Beta prior from TEK counts
# (k out of n informants say 'harvest causes decline')
tek_k <- 7  # informants saying 'decline'
tek_n <- 10 # total informants
# Beta posterior under a uniform prior (Laplace smoothing): Beta(k + 1, n - k + 1)
alpha <- tek_k + 1
beta <- tek_n - tek_k + 1
# draw Monte Carlo samples for that probability
set.seed(123)
p_samples <- rbeta(1000, alpha, beta)
# Inspect the distribution
library(ggplot2)
ggplot(data.frame(p = p_samples), aes(x = p)) +
  geom_density() +
  ggtitle("TEK-derived prior for P(decline | harvest)")
```

### Simulated informant dataset (multivariate)

Below we create a small simulated TEK survey with 20 informants. Each informant provides:

- a binary response about whether harvesting causes decline (1 = yes, 0 = no),
- a categorical assessment of current abundance (High/Medium/Low), and
- a self-reported confidence score (1-5).

```{r sim-data, eval=TRUE}
set.seed(42)
n_inf <- 20
# create a data frame of informant responses
informants <- data.frame(
  id = 1:n_inf,
  # rbinom: draw n_inf binary responses, each equal to 1 with probability 0.6
  says_decline = rbinom(n = n_inf, size = 1, prob = 0.6),
  # sample an abundance category for each informant, with replacement, using the given probabilities
abundance = sample(x = c("High", "Medium", "Low"), size = n_inf, replace = TRUE,
                     prob = c(0.4, 0.35, 0.25)),
  # sample a self-reported confidence score for each informant
  confidence = sample(3:5, size = n_inf, replace = TRUE, prob = c(0.2, 0.6, 0.2))
)
knitr::kable(head(informants, 8))
```

We will use this toy dataset to illustrate three aggregation approaches below: simple counts converted to Beta/Dirichlet priors, pooling weighted by informant confidence, and mapping qualitative categories to numerical pseudo-counts.

## 3. Build a small BN and run Monte Carlo parameter sampling

High-level algorithm:

1. Define the BN structure (nodes and conditional relationships).
2. For each Monte Carlo iteration:
   - Sample each uncertain probability from its Beta/Dirichlet prior.
   - Populate the CPTs with the sampled probabilities.
   - Compile the BN and query target node probabilities or expected utilities.
3. Aggregate the distribution of outcomes across iterations and compare decisions.

```{r bn-mc-workflow, eval=requireNamespace("bnlearn", quietly = TRUE) && requireNamespace("gRain", quietly = TRUE) && requireNamespace("purrr", quietly = TRUE)}
# this chunk only runs when 'bnlearn', 'gRain' and 'purrr' are installed (see eval= above)
library(bnlearn)
library(gRain)
library(purrr)

# define simple TEK priors (repeated here so the chunk is self-contained)
tek_k <- 7   # number of informants saying 'decline'
tek_n <- 10  # total informants
alpha <- tek_k + 1
beta <- tek_n - tek_k + 1

# define the structure (the bnlearn object is useful for inspection and plotting;
# gRain rebuilds the same network from the CPTs below)
model_string <- "[Decision][Abundance|Decision][Impact|Abundance:Decision]"
net <- model2network(model_string)

# a single Monte Carlo iteration
sample_one <- function() {
  # sample P(Abundance = Low | Decision = Harvest) from the TEK-derived Beta
  p_decline_if_harvest <- rbeta(1, alpha, beta)

  # create CPTs; values must be in the order expected by gRain:
  # the child's levels vary fastest, then the first parent, then the second
  cpt_decision <- cptable(~Decision, values = c(0.5, 0.5),
                          levels = c("Harvest", "Conserve"))
  # Abundance|Decision: P(High|Harvest) = 1 - p_decline_if_harvest, P(High|Conserve) = 0.9
  cpt_abundance <- cptable(~Abundance | Decision,
                           values = c(1 - p_decline_if_harvest, p_decline_if_harvest,
                                      0.9, 0.1),
                           levels = c("High", "Low"))
  # Impact|Abundance:Decision needs 2 x 2 x 2 = 8 values (toy numbers), one
  # (Recovery, Decline) pair per parent combination, in the order:
  # (High,Harvest), (Low,Harvest), (High,Conserve), (Low,Conserve)
  cpt_impact <- cptable(~Impact | Abundance:Decision,
                        values = c(0.85, 0.15,
                                   0.15, 0.85,
                                   0.95, 0.05,
                                   0.40, 0.60),
                        levels = c("Recovery", "Decline"))

  plist <- compileCPT(list(cpt_decision, cpt_abundance, cpt_impact))
  grain_net <- grain(plist)

  # condition on each decision in turn and record P(Impact = Recovery)
  q_harvest <- querygrain(setEvidence(grain_net, nodes = "Decision", states = "Harvest"),
                          nodes = "Impact", type = "marginal")$Impact
  q_conserve <- querygrain(setEvidence(grain_net, nodes = "Decision", states = "Conserve"),
                           nodes = "Impact", type = "marginal")$Impact
  data.frame(p_recovery_harvest = q_harvest[["Recovery"]],
             p_recovery_conserve = q_conserve[["Recovery"]])
}

# run the Monte Carlo loop (here, 500 iterations)
results <- map_dfr(1:500, ~sample_one())

# summarize the outcome distribution for each decision
summary(results)
```

## 4. Decision rules and visualization

- Compute the expected utility per decision for each Monte Carlo iteration, then compare means, medians and quantiles.
- Use conservative decision rules if stakeholders prefer lower risk (e.g., maximize the 10th percentile of utility).

Visualization ideas:

- Density plots of expected utility for each option.
- Fan charts showing quantiles across iterations.

## 5. Practical notes on elicitation and aggregation

- For binary TEK outcomes, use Beta priors: Beta(k + 1, n - k + 1).
- For multinomial TEK responses, use a Dirichlet prior with counts + 1 as pseudo-counts.
- Weight informants by confidence or expertise when aggregating (weighted counts or hierarchical models).
- Record provenance, and run sensitivity analyses by omitting individual informants.

### 5.1 Realistic aggregation examples

We now show three practical aggregation methods using the simulated `informants` dataset above.
1) Simple pooling (counts -> Beta/Dirichlet)

```{r agg-simple, eval=TRUE}
# Binary outcome: counts for 'says_decline'
k <- sum(informants$says_decline)
n <- nrow(informants)
alpha <- k + 1; beta <- n - k + 1
cat(sprintf("P(decline) prior ~ Beta(%d, %d) -> mean = %.2f\n",
            alpha, beta, alpha / (alpha + beta)))
# Multinomial outcome: counts for abundance categories -> Dirichlet
ab_counts <- table(informants$abundance)
ab_dirichlet_alpha <- as.numeric(ab_counts) + 1
cat("Abundance counts:\n")
print(ab_counts)
cat("Dirichlet alpha (counts + 1):\n")
print(ab_dirichlet_alpha)
```

2) Weighted pooling by informant confidence

```{r agg-weighted, eval=TRUE}
# Use confidence as a weight (normalized so the weights sum to 1)
weights <- informants$confidence / sum(informants$confidence)
# Weighted count for the binary outcome, rescaled back to the sample size
weighted_k <- sum(informants$says_decline * weights) * nrow(informants)
# Round to an integer pseudo-count for the Beta prior ('n' comes from the previous chunk)
wk <- as.integer(round(weighted_k))
walpha <- wk + 1; wbeta <- n - wk + 1
cat(sprintf("Weighted Beta prior approx: Beta(%d, %d) -> mean = %.2f\n",
            walpha, wbeta, walpha / (walpha + wbeta)))
# Weighted multinomial: weighted frequencies per abundance category
weighted_ab <- tapply(weights, informants$abundance, sum)
weighted_ab_counts <- round(weighted_ab * nrow(informants))
cat("Weighted abundance pseudo-counts:\n")
print(weighted_ab_counts)
```

3) Map qualitative categories to pseudo-counts

```{r agg-qualitative, eval=TRUE}
# Map High/Medium/Low to pseudo-counts reflecting stronger beliefs
map_counts <- c(High = 5, Medium = 3, Low = 1)
qual_counts <- sapply(c("High", "Medium", "Low"),
                      function(lvl) sum(map_counts[lvl] * (informants$abundance == lvl)))
cat("Mapped pseudo-counts for abundance (qualitative -> counts):\n")
print(qual_counts)
# Convert to Dirichlet alpha
qual_alpha <- qual_counts + 1
cat("Dirichlet alpha (pseudo-counts + 1):\n")
print(qual_alpha)
```

All three approaches yield numeric pseudo-counts that can be used as Beta or Dirichlet parameters. Which one is appropriate depends on your elicitation protocol and on whether you want to weight informants by confidence or expertise.
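The Dirichlet parameters above can be sampled inside a Monte Carlo loop, just like the Beta draws in Section 2. Base R has no Dirichlet sampler, so here is a minimal sketch using the standard normalized-Gamma construction; the helper `rdirichlet_one` and the example pseudo-counts `ab_alpha` are our own illustrations, not part of any loaded package:

```r
set.seed(1)
# Dirichlet(alpha) sample via normalized Gamma draws:
# x_i ~ Gamma(alpha_i, 1), then p = x / sum(x)
rdirichlet_one <- function(alpha) {
  x <- rgamma(length(alpha), shape = alpha, rate = 1)
  x / sum(x)
}

# e.g. qualitative pseudo-counts for High/Medium/Low abundance -> Dirichlet(6, 4, 2)
ab_alpha <- c(High = 6, Medium = 4, Low = 2)
p_draws <- t(replicate(1000, rdirichlet_one(ab_alpha)))

# each row of p_draws is one simulated probability vector over the three
# categories; the column means approach alpha / sum(alpha) = (0.50, 0.33, 0.17)
colMeans(p_draws)
```

Each row of `p_draws` could then populate one column of a CPT in one iteration of the Section 3 workflow.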
--- # Appendix: TEK Elicitation Form and Simulated Dataset This appendix provides a practical TEK elicitation form template and a self-contained simulated dataset so you can reproduce all examples in this vignette without external data. ## A. TEK Elicitation Form Template Below is a template form that can be adapted for your specific research context. The form is designed to elicit three types of information: (1) binary management perception, (2) categorical abundance assessment, and (3) confidence rating. ### Informant Metadata ``` Informant ID: ___________ Date of interview: __________ Location / Community: ____________________ Years of experience with this resource: ___________ Primary use of resource: [ ] Subsistence [ ] Commercial [ ] Cultural [ ] Other: _______ Language of interview: [ ] English [ ] Other: ________________ Interviewer name: ____________________ ``` ### Question 1: Management Impact (Binary) **"In your experience, when people *harvest* [RESOURCE], does this cause the population to decline quickly?"** ``` [ ] Yes, harvesting causes the population to decline [ ] No, the population does not decline from harvesting [ ] Uncertain / Depends on context ``` ### Question 2: Current Resource Status (Categorical) **"How would you assess the current abundance of [RESOURCE] in this area compared to your childhood or earlier years?"** ``` [ ] High - abundance is good, similar to or better than before [ ] Medium - abundance has declined somewhat, but the resource is still available [ ] Low - abundance has declined significantly; the resource is now scarce or rare [ ] Not sure ``` ### Question 3: Management Recommendation (Categorical) **"What management approach do you think would be best for [RESOURCE]?"** ``` [ ] Harvest freely (no restrictions) [ ] Harvest with guidelines (seasonal or quantity limits) [ ] Conserve strictly (minimize or prohibit harvesting) [ ] Let it recover, then harvest sustainably [ ] Other: _______________________ ``` ### Question 4: 
Informant Confidence (Rating)

**"How confident are you in your answers to the above questions?"**

```
Confidence score (1 = not confident, 5 = very confident):
[ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
```

### Question 5: Contextual Notes

**"Are there any other factors, changes, or context we should know about?"**

```
___________________________________________________________
___________________________________________________________
___________________________________________________________
```

---

## B. Complete Simulated Dataset for Reproducibility

Below we generate a realistic simulated TEK dataset with 50 informants. You can use this dataset to run all the examples in the vignette and adapt them to your own data.

### Generate the simulated dataset

```{r generate-simulated-tek-data, eval=TRUE}
# Set random seed for reproducibility
set.seed(42)

# Set number of informants
n_informants <- 50

# Create a data frame with informant characteristics
tek_survey_data <- data.frame(
  # Informant IDs from 1 to n_informants
  informant_id = 1:n_informants,
  # Years of experience from a normal distribution (mean = 25, sd = 15)
  # rnorm: draw from a normal distribution; pmax: enforce a minimum of 5 years
  years_experience = pmax(5, round(rnorm(n_informants, mean = 25, sd = 15))),
  # Primary use category: subsistence, commercial, or cultural
  # sample: draw categories with replacement, using the given probabilities
  primary_use = sample(
    x = c("Subsistence", "Commercial", "Cultural"),
    size = n_informants,
    replace = TRUE,
    prob = c(0.5, 0.2, 0.3)
  ),
  # Initialize columns for responses (filled in the loops below)
  says_harvest_declines = NA,
  abundance_status = NA,
  management_recommendation = NA,
  confidence = NA
)

# Fill binary outcome: the probability of saying "harvesting causes decline"
# increases with experience
for (i in 1:n_informants) {
  # Calculate probability based on years of experience
(ranges from ~0.3 to 0.7)
  p_decline <- 0.3 + (tek_survey_data$years_experience[i] / 100) * 0.4
  # rbinom: draw one binary outcome (size = 1) with probability p_decline
  tek_survey_data$says_harvest_declines[i] <- rbinom(1, size = 1, prob = p_decline)
}

# Fill categorical abundance assessment: correlated with perception of harvest decline
# Matrix of probabilities for abundance (High, Medium, Low) given each response.
# Rows are indexed by says_harvest_declines + 1:
# row 1 = does not say decline, row 2 = says decline
abundance_probs <- matrix(
  c(0.5, 0.3, 0.2,   # does not say decline: P(High)=0.5, P(Medium)=0.3, P(Low)=0.2
    0.3, 0.4, 0.3),  # says decline: P(High)=0.3, P(Medium)=0.4, P(Low)=0.3
  nrow = 2, byrow = TRUE
)

# Loop through each informant to assign abundance status
for (i in 1:n_informants) {
  # Select probability row based on their answer about decline (add 1 to convert 0/1 to row 1/2)
  row_idx <- tek_survey_data$says_harvest_declines[i] + 1
  # sample: select one abundance category with probabilities from the appropriate row
  tek_survey_data$abundance_status[i] <- sample(
    x = c("High", "Medium", "Low"),
    size = 1,
    prob = abundance_probs[row_idx, ]
  )
}

# Fill management recommendation: correlated with perception of decline and current abundance
for (i in 1:n_informants) {
  # If they say decline AND perceive low abundance: recommend conservation
  if (tek_survey_data$says_harvest_declines[i] == 1 &&
      tek_survey_data$abundance_status[i] == "Low") {
    # sample: select one management option with the specified probabilities
    tek_survey_data$management_recommendation[i] <- sample(
      c("Conserve strictly", "Harvest with guidelines", "Let it recover, then harvest"),
      size = 1, prob = c(0.6, 0.3, 0.1)
    )
  # Else if they perceive high abundance: less restrictive recommendations
  } else if (tek_survey_data$abundance_status[i] == "High") {
    tek_survey_data$management_recommendation[i] <- sample(
      c("Harvest freely", "Harvest with
guidelines", "Conserve strictly"), size = 1, prob = c(0.4, 0.4, 0.2) ) # Otherwise: balanced distribution of recommendations } else { tek_survey_data$management_recommendation[i] <- sample( c("Harvest freely", "Harvest with guidelines", "Conserve strictly", "Let it recover, then harvest"), size = 1, prob = c(0.2, 0.4, 0.25, 0.15) ) } } # Fill confidence scores: higher for more experienced informants # Loop through each informant for (i in 1:n_informants) { # Calculate probability of high confidence based on years of experience p_high_conf <- 0.3 + (tek_survey_data$years_experience[i] / 100) * 0.3 # Assign confidence based on this probability # runif: draw from uniform distribution (0 to 1) if (runif(1) < p_high_conf) { # High confidence: choose between 4 or 5 tek_survey_data$confidence[i] <- sample(4:5, 1, prob = c(0.4, 0.6)) } else { # Lower confidence: choose between 2 or 3 tek_survey_data$confidence[i] <- sample(2:3, 1, prob = c(0.4, 0.6)) } } # Display first 10 rows of the dataset cat("TEK Survey Dataset - First 10 rows:\n\n") print(knitr::kable(head(tek_survey_data, 10))) # Print summary statistics cat("\n\nSummary of TEK Survey Data (n=", n_informants, ")\n") cat("================================\n") # mean: calculate average; round to 1 decimal place # min/max: find minimum and maximum values cat("Years of experience: Mean =", round(mean(tek_survey_data$years_experience), 1), ", Range =", min(tek_survey_data$years_experience), "-", max(tek_survey_data$years_experience), "\n") # table: count frequency of each category cat("Primary use breakdown:\n") print(table(tek_survey_data$primary_use)) cat("\n\nBinary outcome (says harvesting declines):\n") print(table(tek_survey_data$says_harvest_declines)) cat("\n\nAbundance status:\n") print(table(tek_survey_data$abundance_status)) cat("\n\nManagement recommendations:\n") print(table(tek_survey_data$management_recommendation)) cat("\n\nConfidence scores:\n") print(table(tek_survey_data$confidence)) ``` ### Using the 
simulated dataset with the aggregation methods You can now apply the three aggregation approaches from Section 5.1 to this complete dataset: ```{r use-simulated-for-aggregation, eval=TRUE} # === Simple pooling: binary outcome === # sum: count the total number of informants who said harvesting causes decline (1 = yes, 0 = no) k <- sum(tek_survey_data$says_harvest_declines) # nrow: count the total number of rows (informants) n <- nrow(tek_survey_data) # Beta distribution parameters with Laplace smoothing (add 1 to each) alpha_simple <- k + 1 beta_simple <- n - k + 1 cat("Binary outcome aggregation (Simple pooling):\n") # sprintf: format and print the Beta distribution parameters cat(sprintf(" P(harvesting causes decline) ~ Beta(%d, %d)\n", alpha_simple, beta_simple)) # Calculate mean probability of the Beta distribution: alpha / (alpha + beta) cat(sprintf(" Mean probability: %.3f\n", alpha_simple / (alpha_simple + beta_simple))) # === Simple pooling: categorical outcome (abundance) === # table: count frequency of each abundance category abundance_counts <- table(tek_survey_data$abundance_status) # Convert counts to Dirichlet parameters by adding 1 to each count abundance_alpha <- as.numeric(abundance_counts) + 1 cat("\nAbundance status aggregation (Simple pooling):\n") cat(" Category counts:\n") print(abundance_counts) cat(" Dirichlet alpha (for Dirichlet prior):\n") print(abundance_alpha) # === Weighted pooling: by informant confidence === # Divide each confidence score by the sum to create normalized weights (sum to 1) weights <- tek_survey_data$confidence / sum(tek_survey_data$confidence) # Multiply weighted responses by informant count and sum to get weighted pseudo-count # sum: add up the products of (informant response) * (normalized weight) weighted_k_decline <- sum(tek_survey_data$says_harvest_declines * weights) * nrow(tek_survey_data) # round: convert to integer for use as Beta parameter wk_decline <- round(weighted_k_decline) # Calculate Beta parameters 
with Laplace smoothing walpha_decline <- wk_decline + 1 wbeta_decline <- n - wk_decline + 1 cat("\nBinary outcome aggregation (Confidence-weighted):\n") cat(sprintf(" P(harvesting causes decline) ~ Beta(%d, %d)\n", walpha_decline, wbeta_decline)) # Calculate mean probability of the weighted Beta distribution cat(sprintf(" Mean probability: %.3f\n", walpha_decline / (walpha_decline + wbeta_decline))) ``` --- ## C. Guidance for Adapting Form and Data - **Customize question text**: Replace `[RESOURCE]` with the actual resource you are studying (e.g., "yam," "mahogany," "mangrove"). - **Add or remove questions**: You may want to add questions about specific threats, seasonal patterns, or restoration practices. - **Language and terminology**: Translate and adapt categories (High/Medium/Low) to match local terminology and concepts. - **Informant selection**: Consider stratifying by age, gender, duration of residence, or primary use to detect variation in TEK. - **Sample size**: The simulated dataset uses 50 informants; adjust based on your resources and the diversity of your study population. - **Data quality checks**: Record provenance (who, when, where) and verify consistency (e.g., informants who say harvesting declines should more often report low abundance). --- ## D. 
Example: Loading Your Own Data If you collect data using a form similar to the template in Section A, you can load and process it similarly to the simulated dataset: ```{r example-load-own-data, eval=FALSE} # Load your survey data (replace 'your_data.csv' with your file) # read.csv: read a comma-separated values file and create a data frame my_tek_data <- read.csv("your_data.csv") # Check structure of your data # str: display internal structure of an R object, showing column names, types, and sample values str(my_tek_data) # Summarize binary outcome cat("Binary outcome:\n") # table: create a frequency table of the responses print(table(my_tek_data$says_harvest_declines)) # Aggregate binary outcome using the methods from Section 5.1 # sum: count the number of "yes" responses (1 = yes) k <- sum(my_tek_data$says_harvest_declines) # nrow: count the total number of rows (informants) n <- nrow(my_tek_data) # Calculate Beta distribution parameters with Laplace smoothing alpha <- k + 1; beta <- n - k + 1 # sprintf: format the prior as a string and print it cat(sprintf("Prior: Beta(%d, %d)\n", alpha, beta)) ```
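Once your own data have been aggregated into priors and propagated through a BN, the last step is a decision rule. Here is a minimal sketch of the two rules mentioned in Section 4; `utility_harvest` and `utility_conserve` are invented stand-ins for the per-iteration expected utilities that a full BN plus utility model would produce:

```r
set.seed(7)
# Hypothetical Monte Carlo utility draws for two management options
utility_harvest  <- rnorm(500, mean = 10, sd = 6)  # higher mean, higher spread
utility_conserve <- rnorm(500, mean = 8,  sd = 2)  # lower mean, lower spread

# Risk-neutral rule: compare mean utilities
means <- c(Harvest = mean(utility_harvest), Conserve = mean(utility_conserve))
print(means)

# Conservative rule (Section 4): compare the 10th percentile of utility
q10 <- c(Harvest  = unname(quantile(utility_harvest, 0.10)),
         Conserve = unname(quantile(utility_conserve, 0.10)))
print(q10)
names(which.max(q10))  # the risk-averse choice
```

With these toy numbers the risk-neutral rule tends to favor the higher-mean option while the 10th-percentile rule favors the lower-variance option; the point is that the two rules can disagree, which is exactly the trade-off to discuss with stakeholders.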