--- title: "Higher-Order Derivatives" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Higher-Order Derivatives} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` ```{r setup} library(nabla) ``` ## The `D` operator The `D` operator composes to arbitrary order. `D(f, x)` gives the first derivative; `D(f, x, order = 2)` gives the second; `D(f, x, order = 3)` gives the third-order tensor; and so on: ```{r} f <- function(x) x[1]^3 D(f, 2) # f'(2) = 3*4 = 12 D(f, 2, order = 2) # f''(2) = 6*2 = 12 D(f, 2, order = 3) # f'''(2) = 6 ``` For multi-parameter functions, each application of `D` appends one dimension to the output. A scalar function $f: \mathbb{R}^n \to \mathbb{R}$ produces a gradient vector ($n$), then a Hessian matrix ($n \times n$), then a third-order tensor ($n \times n \times n$): ```{r} # Gradient of a 2-parameter function f <- function(x) x[1]^2 * x[2] D(f, c(3, 4)) # gradient: c(24, 9) # Hessian via D^2 D(f, c(3, 4), order = 2) # Composition works identically Df <- D(f) DDf <- D(Df) DDf(c(3, 4)) ``` For vector-valued functions, `D` produces the Jacobian: ```{r} g <- function(x) list(x[1] * x[2], x[1]^2 + x[2]) D(g, c(3, 4)) # 2x2 Jacobian: [[4, 3], [6, 1]] ``` The convenience functions `gradient()`, `hessian()`, and `jacobian()` are thin wrappers around `D`: ```{r} gradient(f, c(3, 4)) # == D(f, c(3, 4)) hessian(f, c(3, 4)) # == D(f, c(3, 4), order = 2) jacobian(g, c(3, 4)) # == D(g, c(3, 4)) ``` ## Curvature analysis The curvature of a curve $y = f(x)$ is: $$\kappa = \frac{|f''(x)|}{(1 + f'(x)^2)^{3/2}}$$ With `D()`, we can compute this directly: ```{r} curvature <- function(f, x) { d1 <- D(f, x) d2 <- D(f, x, order = 2) abs(d2) / (1 + d1^2)^(3/2) } # Curvature of sin(x) at various points # Wrap sin for D()'s x[1] convention sin_f <- function(x) sin(x[1]) xs <- seq(0, 2 * pi, length.out = 7) kappas <- sapply(xs, function(x) curvature(sin_f, x)) data.frame(x = round(xs, 3), curvature = round(kappas, 6)) ``` The table reveals the geometry of $\sin(x)$: curvature is zero at inflection points (integer multiples of $\pi$, where $\sin''(x) = 0$) and peaks at $\pi/2$ and $3\pi/2$ where $\sin$ reaches its extrema. The maximum curvature of 1.0 reflects the unit amplitude — for $A\sin(x)$ the maximum curvature would be $A$. At $x = \pi/2$, $\sin$ has maximum curvature because $|\sin''(\pi/2)| = 1$ and $\sin'(\pi/2) = 0$: ```{r} curvature(sin_f, pi / 2) # should be 1.0 ``` ### Visualizing curvature along sin(x) Curvature peaks where the second derivative is largest in magnitude and the first derivative is small. 
### Visualizing curvature along sin(x)

Curvature peaks where the second derivative is largest in magnitude and the first derivative is small. For $\sin(x)$, this occurs at $\pi/2$ and $3\pi/2$:

```{r fig-sin-curvature, fig.width=6, fig.height=4}
xs_curve <- seq(0, 2 * pi, length.out = 200)
sin_vals <- sin(xs_curve)
kappa_vals <- sapply(xs_curve, function(x) curvature(sin_f, x))

oldpar <- par(mar = c(4, 4.5, 2, 4.5))
plot(xs_curve, sin_vals, type = "l", col = "steelblue", lwd = 2,
     xlab = "x", ylab = "sin(x)", main = "sin(x) and its curvature")
par(new = TRUE)
plot(xs_curve, kappa_vals, type = "l", col = "firebrick", lwd = 2, lty = 2,
     axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = "firebrick")
mtext(expression(kappa(x)), side = 4, line = 2.5, col = "firebrick")
abline(v = c(pi / 2, 3 * pi / 2), col = "grey60", lty = 3)
legend("bottom", legend = c("sin(x)", expression(kappa(x))),
       col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2,
       bty = "n", horiz = TRUE)
par(oldpar)
```

The curvature reaches its maximum of 1.0 at $x = \pi/2$ (where $\sin'(x) = 0$ and $|\sin''(x)| = 1$) and returns to zero at integer multiples of $\pi$ (where $\sin''(x) = 0$).

## Taylor expansion

A second-order Taylor approximation of $f$ around $x_0$:

$$f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2$$

`D()` supplies the derivatives directly:

```{r}
taylor2 <- function(f, x0, x) {
  f0 <- f(x0)
  f1 <- D(f, x0)
  f2 <- D(f, x0, order = 2)
  f0 + f1 * (x - x0) + 0.5 * f2 * (x - x0)^2
}

# Approximate exp(x) near x = 0
f_exp <- function(x) exp(x[1])
xs <- c(-0.1, -0.01, 0, 0.01, 0.1)
data.frame(
  x = xs,
  exact = exp(xs),
  taylor2 = sapply(xs, function(x) taylor2(f_exp, 0, x)),
  error = exp(xs) - sapply(xs, function(x) taylor2(f_exp, 0, x))
)
```

Near the expansion point, the approximation is accurate. The error grows as $O(|x - x_0|^3)$ because the Taylor remainder is $\frac{f'''(\xi)}{6}(x - x_0)^3$. For $\exp(x)$ around $x_0 = 0$, $f'''(\xi) = \exp(\xi) \approx 1$, so the error at $x = 0.1$ is approximately $0.1^3/6 \approx 1.7 \times 10^{-4}$, and at $x = 0.01$ it drops to $\approx 1.7 \times 10^{-7}$ — a factor-of-1000 reduction for a factor-of-10 decrease in distance, confirming cubic convergence.
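This scaling is easy to verify numerically with the `taylor2` helper above: halving the distance to the expansion point should shrink the error by roughly a factor of $2^3 = 8$ (the names `hs` and `errs` below are illustrative, not package API):

```{r}
# Successive halvings of the distance from x0 = 0
hs <- c(0.2, 0.1, 0.05)
errs <- sapply(hs, function(h) exp(h) - taylor2(f_exp, 0, h))
errs[-1] / errs[-length(errs)]  # each ratio should be close to 1/8
```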
Plotting the approximation against the exact function shows both the local accuracy and the eventual divergence:

```{r fig-taylor-vs-exact, fig.width=6, fig.height=4}
xs_plot <- seq(-2, 3, length.out = 300)
exact_vals <- exp(xs_plot)
taylor_vals <- sapply(xs_plot, function(x) taylor2(f_exp, 0, x))

oldpar <- par(mar = c(4, 4, 2, 1))
plot(xs_plot, exact_vals, type = "l", col = "steelblue", lwd = 2,
     xlab = "x", ylab = "y",
     main = expression("exp(x) vs 2nd-order Taylor at " * x[0] == 0),
     ylim = c(-1, 12))
lines(xs_plot, taylor_vals, col = "firebrick", lwd = 2, lty = 2)
abline(v = 0, col = "grey60", lty = 3)
points(0, 1, pch = 19, col = "grey40", cex = 1.2)
legend("topleft", legend = c("exp(x)", expression("Taylor " * T[2](x))),
       col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2, bty = "n")
par(oldpar)
```

The Taylor polynomial matches `exp(x)` closely near $x_0 = 0$ but diverges at larger distances, illustrating the local nature of polynomial approximation.

## Connection to MLE: exact Hessians for standard errors

`hessian()` computes the exact second-derivative matrix at any point. For maximum likelihood estimation, this gives the **observed information matrix** $J(\hat\theta) = -H(\hat\theta)$, whose inverse is the asymptotic variance-covariance matrix. Exact Hessians yield exact standard errors and confidence intervals — the primary reason to use AD in statistical inference.

Consider a Poisson log-likelihood for $\lambda$:

```{r}
set.seed(123)
data_pois <- rpois(50, lambda = 3)
n <- length(data_pois)
sum_x <- sum(data_pois)
sum_lfact <- sum(lfactorial(data_pois))

ll_poisson <- function(theta) {
  lambda <- theta[1]
  sum_x * log(lambda) - n * lambda - sum_lfact
}

lambda0 <- 2.5
```

**Primary approach**: using `hessian()`:

```{r}
hess_helper <- hessian(ll_poisson, lambda0)
hess_helper
```

The observed information is simply `-hess_helper`, and the standard error is `1 / sqrt(-hess_helper)`:

```{r}
se <- 1 / sqrt(-hess_helper[1, 1])
cat("SE(lambda) at lambda =", lambda0, ":", se, "\n")
```

**Verification**: manual construction of a second-order dual number — the same arithmetic that `hessian()` performs internally:

```{r}
# Build a dual_variable_n wrapped in a dual_vector
manual_theta <- dual_vector(list(dual_variable_n(lambda0, 2)))
result_manual <- ll_poisson(manual_theta)
manual_hess <- deriv(deriv(result_manual))
manual_hess
```

Both approaches yield the same result:

```{r}
hess_helper[1, 1] - manual_hess  # ~0
```
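To close the loop on the confidence intervals promised above, here is a minimal sketch of a Wald-type 95% interval, assuming nothing beyond `hessian()` as used in this section. For the Poisson model the MLE is the sample mean, and we evaluate the observed information there (`lambda_hat` and `se_hat` are illustrative names):

```{r}
# Wald 95% CI at the MLE (for the Poisson model, lambda_hat = mean(x))
lambda_hat <- mean(data_pois)
se_hat <- 1 / sqrt(-hessian(ll_poisson, lambda_hat)[1, 1])
lambda_hat + c(-1, 1) * qnorm(0.975) * se_hat
```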
## How it works: nested duals

Under the hood, `D(f, x, order = 2)` uses **nested duals** — a dual number whose components are themselves dual numbers. The structure `dual(dual(x, 1), dual(1, 0))` simultaneously tracks:

- $f(x)$ — the value carried by the inner dual
- $f'(x)$ — the derivative carried by the inner dual
- $f''(x)$ — the derivative of the outer dual's derivative component

Nested duals can be constructed directly with `dual_variable_n()`:

```{r}
x <- dual_variable_n(2, order = 2)

# Evaluate x^3
result <- x^3

# Extract all three quantities
deriv_n(result, 0)  # f(2) = 8
deriv_n(result, 1)  # f'(2) = 3*4 = 12
deriv_n(result, 2)  # f''(2) = 6*2 = 12
```

For constants in higher-order computations, use `dual_constant_n()`:

```{r}
k <- dual_constant_n(5, order = 2)
deriv_n(k, 0)  # 5
deriv_n(k, 1)  # 0
deriv_n(k, 2)  # 0
```

The `differentiate_n()` helper wraps the construction and extraction:

```{r}
# Differentiate sin(x) at x = pi/4
result <- differentiate_n(sin, pi / 4, order = 2)
result$value  # sin(pi/4)
result$d1     # cos(pi/4) = f'
result$d2     # -sin(pi/4) = f''
```

Verify against known values:

```{r}
result$value - sin(pi / 4)  # ~0
result$d1 - cos(pi / 4)     # ~0
result$d2 - (-sin(pi / 4))  # ~0
```

More complex functions work too:

```{r}
f <- function(x) x * exp(-x^2)
d2 <- differentiate_n(f, 1, order = 2)

# Analytical: f'(x)  = exp(-x^2)(1 - 2x^2)
#             f''(x) = exp(-x^2)(-6x + 4x^3)
analytical_f1 <- exp(-1) * (1 - 2)
analytical_f2 <- exp(-1) * (-6 + 4)
d2$d1 - analytical_f1  # ~0
d2$d2 - analytical_f2  # ~0
```

These low-level functions are useful for understanding how `nabla` works internally, but for most applications, `D()`, `gradient()`, and `hessian()` are the recommended interface.

## Summary

The `D` operator provides exact higher-order derivatives through a single composable interface — no symbolic differentiation, no finite-difference grid, and no loss of precision:

- **Exact curvature and Taylor coefficients.** `D(f, x, order = 2)` gives the exact second derivative needed for curvature analysis, Taylor expansion, and Newton-Raphson optimization.
- **Exact standard errors for MLE.** The Hessian from `hessian()` directly yields the observed information matrix. Its inverse gives the asymptotic variance-covariance matrix — the foundation of confidence intervals and hypothesis tests in likelihood-based inference.
- **Arbitrary order.** `D(f, x, order = 3)` yields third-order tensors for asymptotic skewness analysis (see `vignette("mle-skewness")`); higher orders work the same way.
- **Nested duals under the hood.** Internally, `D` constructs nested dual numbers at each order level. The same arithmetic that propagates first derivatives recursively propagates second, third, and higher derivatives; the closing check below shows the recursion at work.
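As a closing illustration of that recursion, a third derivative computed through three nested dual levels, checked against the closed form $\frac{d^3}{dx^3}\log x = 2/x^3$ (the wrapper `f_log` is ours, added for `D()`'s `x[1]` convention):

```{r}
f_log <- function(x) log(x[1])
D(f_log, 2, order = 3)  # exact: 2 / 2^3 = 0.25
```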