CoSMoS was conceived back in 2009 (see note in section 5) and was officially released on CRAN in April 2019. Since then CoSMoS has become one of the leading and most widely downloaded R packages for stochastic simulation of non-Gaussian time series. Designed with the end user in mind, CoSMoS makes univariate, multivariate, or random field simulations easy. You can reliably generate time series or fields of rainfall, streamflow, wind speed, relative humidity, or any other environmental variable in seconds. Just choose the probability distribution and correlation properties of the time series you want to generate, and it will do the rest.


~Unified Theory of Stochastic Modelling | Papalexiou (2018)

“Hydroclimatic processes come in all “shapes and sizes”. They are characterized by different spatiotemporal correlation structures and probability distributions that can be continuous, mixed-type, discrete or even binary. Simulating such processes by reproducing precisely their marginal distribution and linear correlation structure, including features like intermittency, can greatly improve hydrological analysis and design. … Here, a single framework is proposed that unifies, extends, and improves a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific “parent”” Gaussian process.”

~Precise Temporal Disaggregation | Papalexiou et al. (2018)

“Hydroclimatic variables such as precipitation and temperature are often measured or simulated by climate models at coarser spatiotemporal scales than those needed for operational purposes. … Here we introduce a novel disaggregation method, named Disaggregation Preserving Marginals and Correlations (DiPMaC), that is able to disaggregate a coarse-scale time series to any finer scale, while reproducing the probability distribution and the linear correlation structure of the fine-scale process. DiPMaC is also generalized for arbitrary nonstationary scenarios to reproduce time varying marginals.”

~Random Fields Simplified | Papalexiou and Serinaldi (2020)

“Nature manifests itself in space and time. The spatiotemporal complexity of processes such as precipitation, temperature, and wind, does not allow purely deterministic modeling. Spatiotemporal random fields have a long history in modeling such processes, and yet a single unified framework offering the flexibility to simulate processes that may differ profoundly does not exist. Here we introduce a blueprint to efficiently simulate spatiotemporal random fields that preserve any marginal distribution, any valid spatiotemporal correlation structure, and intermittency.”

~Advancing Space-Time Simulation | Papalexiou, Serinaldi and Porcu (2021)

“…we advance random field simulation by introducing the concepts of general velocity fields and general anisotropy transformations. To illustrate the potential of CoSMoS, we simulate random fields with complex patterns and motion mimicking rainfall storms moving across an area, spiraling fields resembling weather cyclones, fields converging to (or diverging from) a point, and colliding air masses.”


1 Step by step guide to timeseries simulation

In most cases we can accurately simulate a process by reproducing its marginal distribution and its autocorrelation structure. The first is typically represented by a parametric probability distribution; the second either by a parametric correlation structure, or by specific correlation coefficient values. CoSMoS is the first software that enables the generation of time series while preserving any marginal distribution, correlation structure, and intermittency. Intermittency can be defined by the probability of zero values, and is essential in the simulation of precipitation at fine scales (e.g., daily, hourly) but also for any other intermittent process, such as flows of ephemeral streams, wind speed, etc.

Accurate generation of time series requires knowledge of the statistical properties of the process under study. Processes such as precipitation, streamflow, temperature, wind, or evaporation may have very different properties; that is, they can be described by different probability distributions, correlation structures, and may or may not be intermittent.

There are four main steps required to generate a time series: 1. Choosing a probability distribution 2. Choosing a correlation structure, 3. Adding intermittency, and 4. Validating the results.

The four steps are explained below.

1.1 Choosing a Probability Distribution

The choice of a probability distribution to describe the variable you wish to simulate is crucial. The probability distribution expresses how frequently specific magnitudes of the variable occur in the data set. Air temperature is typically described by bell-shaped distributions like the Normal. Relative humidity needs distributions defined in the interval (0,1) such as the Beta or the Kumaraswamy. Streamflow typically has heavy tails and power-type distributions might be needed to reproduce the behavior of the extremes. Wind speed typically can be modelled by positively skewed distributions such as the Weibull. Precipitation at fine scales has to be modelled by mixed-type (or else zero inflated) distributions, that is, having a probability value to describe the occurrence of zeros, and a continuous distribution, such the Generalized Gamma or the Burr type XII, to describe the frequency of nonzero values.

CoSMoS supports all the standard distribution functions of R and from other packages (tens of distributions) but also comes with some built-in distributions such as Generalized Gamma, the Pareto type II, and the Burr type III and XII distributions. More details on the CoSMoS distributions and their parameters can be found in Section 4.

Distributions may have differing numbers and types of parameters (e.g. location, scale, and shape parameters). For example, the Generalized Gamma distribution has one scale and two shape parameters.

CoSMoS has built-in functions to fit distributions to data and assess their goodness-of-fit based on graphs; but you can also use any other package for distribution fitting.

1.2 Choosing a Correlation Structure

The correlation structure expresses how much the random variables depend upon each other. Processes at fine temporal scales are typically more correlated than those at coarse scales. CoSMoS offers two options to introduce correlations: (1) by defining specific lag autocorrelations as a list starting from lag 0 up to a desired limit, or (2) by using a pre-defined parametric autocorrelation structure.

For example, you can generate a time series with lag-1 autocorrelation \(\rho_1 = 0.8\) and Generalized Gamma marginal distribution by:

## (i) specifying the sample size
no <- 1000
## (ii) defining the type of marginal distribution and its parameters
marginaldist <- "ggamma"
param <- list(scale = 1, shape1 = .8, shape2 = .8)
## (iii) defining the desired autocorrelation
acf.my <- c(1, 0.8)
## (iv) simulating
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf.my)
## and (v) visually checking the generated time series
quickTSPlot(ggamma_sim[[1]])

You can specify also as many autocorrelation values as you wish. In this example, autocorrelation values are specified up to lag 4:

acf <- c(1, 0.6, 0.5, 0.4, 0.3) # up to lag-4
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf)
quickTSPlot(ggamma_sim[[1]])

A better approach is to use an autocorrelation structure expressed by a parametric function. CoSMoS has built in autocorrelation structures than can be used by the acs() function. You can use the following ACS structures:

  • Fractional Gaussian Noise; this is the well-known one-parameter fGn ACS.
  • Weibull; a flexible two parameter of exponential form.
  • Pareto II: a two parameter power-type ACS.
  • Burr XII: a three parameter power-type ACS.

For most practical applications a two-parameter ACS is sufficient; we suggest either the Weibull or the Pareto II.

The acs() function produces autocorrelation values based on any of the four structures and any desired lag. Thus, instead of setting each autocorrelation coefficient explicitly (which might be problematic from a technical viewpoint), we can generate series preserving any one of the four ACSs. For example, you can easily define an ACS and visualize its values up to any lag. Here, all four ACSs are defined and visualized up to lag 10. You can see how changing the ACS function changes how the correlation coefficients decay to zero.

## specify lag
lags <- 0:10

## get the ACS
f <- acs(id = "fgn", t = lags, H = .75)
b <- acs(id = "burrXII", t = lags, scale = 1, shape1 = .6, shape2 = .4)
w <- acs(id = "weibull", t = lags, scale = 2, shape = 0.8)
p <- acs(id = "paretoII", t = lags, scale = 3, shape = 0.3)

## visualize the ACS
dta <- data.table(lags, f, b, w, p)
m.dta <- melt(data = dta, id.vars = "lags")

ggplot(data = m.dta, mapping = aes(x = lags, y = value, group = variable, colour = variable)) +
  geom_point(size = 2.5) +
  geom_line(lwd = 1) +
  scale_color_manual(
    values = c("steelblue4", "red4", "green4", "darkorange"),
    labels = c("FGN", "Burr XII", "Weibull", "Pareto II"), name = ""
  ) +
  labs(x = bquote(lag ~ tau), y = "ACS") +
  scale_x_continuous(breaks = lags)

We can re-generate the previously generated series which used explicitly defined correlations up to lag 4, now with a two parameter Pareto II ACF up to lag 30, which improves the modelling parsimony. More details about the parametric autocorrelation structures can be found in section 3.2 in Papalexiou (2018).

acf <- acs(id = "paretoII", t = 0:30, scale = 1, shape = .75)
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf)
dta <- data.frame(time = 1:no, value = ggamma_sim[[1]])

quickTSPlot(dta$value)

Apart from the four autocorrelation functions provided by acs(), we can also create our own. Note that it is important to ensure that the function is positive definite, otherwise the autoregressive model cannot be fitted. This example shows the generation of a time series with the same Generalized Gamma marginal distribution as above, but with a user-defined exponential ACS function up to lag 30. Note that although the time series density is very similar to that in the previous plot, the texture of the time series is very different, due to the changed ACS function.

my_acf <- exp(seq(0, -2, -0.1))
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = my_acf)
quickTSPlot(ggamma_sim[[1]])

1.3 Adding Intermittency (if required)

CoSMoS easily simulates intermittent processes such as precipitation. The only extra step needed is to define the probability of zero events. For example, say you wish to generate 5 mutually independent timeseries having the previously defined Generalized Gamma distribution with the Pareto II ACS and probability zero p0 = 0.9. The generated data thus will have 90% of zero values (i.e. dry days).

prob_zero <- .9
## the argument `TSn = 5` enables the simulation of 5 timeseries.
ggamma_sim <- generateTS(
  n = no, margdist = marginaldist, margarg = param, acsvalue = acf,
  p0 = prob_zero, TSn = 5
)
plot(x = ggamma_sim, main = "") 
Five simulated intermittent Generalized Gamma time series (p0 = 0.9).
Five simulated intermittent Generalized Gamma time series (p0 = 0.9).

1.4 Validating the Results

You can readily check some basic statistics of the generated time series using the checkTS() function, which return the first three moments, probability zero and the first three autocorrelation coefficients.

checkTS(ggamma_sim)
##             mean   sd skew   p0 acf_t1 acf_t2 acf_t3
## expected    0.11 0.57 8.57 0.90   0.47   0.29   0.21
## simulation1 0.10 0.49 7.18 0.91   0.42   0.22   0.17
## simulation2 0.08 0.36 6.89 0.90   0.20   0.10   0.10
## simulation3 0.14 0.61 6.39 0.87   0.43   0.23   0.12
## simulation4 0.21 0.85 5.73 0.86   0.49   0.29   0.29
## simulation5 0.08 0.37 7.20 0.91   0.33   0.07   0.08
## attr(,"class")
## [1] "checkTS" "matrix" 
## attr(,"margdist")
## [1] "ggamma"
## attr(,"margarg")
## attr(,"margarg")$scale
## [1] 1
## 
## attr(,"margarg")$shape1
## [1] 0.8
## 
## attr(,"margarg")$shape2
## [1] 0.8
## 
## attr(,"p0")
## [1] 0.9

2 Multivariate and Random Fields Simulation

The above methods can readily be extended to multidimensional cases, thus enabling the simulation of spatiotemporally correlated random vectors (i.e. correlated timeseries at multiple locations) and random fields (i.e. gridded data), as discussed in detail by Papalexiou and Serinaldi (2020). This requires the definition of suitable spatiotemporal correlation structures (see Porcu et al. (2020) for a thorough review of this topic).

2.1 Spatiotemporal Correlation Structures

CoSMoS allows the definition of spatiotemporal correlation functions using the function stcs(), which the spatiotemporal counterpart of the purely temporal acs(). Two classes of spatiotemporal correlation functions are provided:

The stcs() function can produce values of the linear spatiotemporal correlation for any desired time lag and spatial distance using these two correlation function classes, which comprise a variety of structures covering most cases of practical interest. This example shows the Clayton-Weibull spatiotemporal correlation structure:

d <- 51
st <- expand.grid(0:(d - 1), 0:(d - 1))

## get the STCS
wc <- stcfclayton(
  t = st[, 1], s = st[, 2], scfid = "weibull", tcfid = "weibull", copulaarg = 2,
  scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 5.1, shape = 0.8)
)

## visualize the STCS
wc.m <- matrix(data = wc, nrow = d)
j <- tail(which(wc.m[1, ] > 0.15), 1)
i <- tail(which(wc.m[, 1] > 0.15), 1)
wc.m <- wc.m[1:i, 1:j]

persp3D(
  z = wc.m, x = 1:nrow(wc.m), y = 1:ncol(wc.m),
  expand = 1, main = "", scale = TRUE, facets = TRUE,
  xlab = "Time lag", ylab = "Distance", zlab = "STCF", colkey = list(side = 4, length = 0.5),
  phi = 20, theta = 120, resfac = 10, col = gg2.col(100)
)
Clayton-Weibull spatiotemporal correlation structure.
Clayton-Weibull spatiotemporal correlation structure.

2.2 Multivariate Time Series Simulation

Similar to the one-dimensional case (i.e. using the function generateTS()), CoSMoS provides the functions generateMTS() and generateMTSFast() to simulate multiple timeseries with specified (identical) marginals and spatiotemporal correlation structures (STCs). Gaussian Vector AutoRegressive (VAR) models are used to reproduce the parent-Gaussian STFC (see Papalexiou (2018) and Papalexiou and Serinaldi (2020)). The VAR parameters see e.g. (Biller and Nelson 2003) corresponding to the desired spatiotemporal correlation function of the target process, are produced by the function fitVAR().

In this example, a set of five random locations is defined that could represent precipitation in five different places. A VAR is then fitted using

  • a 4th order process (the spatiotemporal correlations will be preserved up to temporal lag 4),
  • the Burr XII marginal distribution,
  • a probability zero p0 = 0.8,
  • a Clayton-Weibull spatiotemporal correlation structure (see Papalexiou and Serinaldi (2020)), using Weibull functions for both the temporal and spatial correlations.

From the five locations, and the VAR, a set of five timeseries is generated. When the series are plotted, the spatio-temporal correlations among the series can be seen.

## set a sequence of hypothetical coordinates
d <- 5
coord <- cbind(runif(d) * 30, runif(d) * 30)

## compute VAR model parameters
fit <- fitVAR(
  spacepoints = coord,
  p = 4,
  margdist = "burrXII",
  margarg = list(scale = 3, shape1 = .9, shape2 = .2),
  p0 = 0.8,
  stcsid = "clayton",
  stcsarg = list(
    scfid = "weibull", tcfid = "weibull", copulaarg = 2,
    scfarg = list(scale = 25, shape = 0.7),
    tcfarg = list(scale = 3.1, shape = 0.8)
  )
)

## generate correlated timeseries
sim <- generateMTS(n = 500, STmodel = fit)

## visualize simulated timeseries
dta <- melt(data = data.table(time = 1:nrow(sim), sim[, 1:d]), id.vars = "time")

ggplot(data = dta, mapping = aes(x = time, y = value)) +
  geom_line() +
  facet_grid(rows = vars(variable), scales = "free_y")
Five spatiotemporally correlated time series simulated with generateMTS().
Five spatiotemporally correlated time series simulated with generateMTS().

The function generateMTSFast() generates multiple time series with approximately separable STCS (Serinaldi and Kilsby 2018), using an algorithm which is computationally less expensive than that in generateMTS(). This allows the simulation of a larger number of cross-correlated and serially dependent timeseries, at the cost of using separable STCSs and accepting some lack of accuracy in the exact reproduction of some terms of the required STCS (Serinaldi and Kilsby 2018).

This example uses generateMTSFast() with similar parameters to the previous example using generateMTS().

## set a sequence of hypothetical coordinates
d <- 5
coord <- cbind(runif(d) * 30, runif(d) * 30)

## fit and generate correlated timeseries
sim <- generateMTSFast(
  n = 500,
  spacepoints = coord,
  p0 = 0.7,
  margdist = "burrXII",
  margarg = list(scale = 3, shape1 = .9, shape2 = .2),
  stcsarg = list(
    scfid = "weibull", tcfid = "weibull",
    scfarg = list(scale = 25, shape = 0.7),
    tcfarg = list(scale = 3.1, shape = 0.8)
  )
)

## visualize simulated timeseries
dta <- melt(data = data.table(time = 1:nrow(sim), sim[, 1:d]), id.vars = "time")

ggplot(data = dta, mapping = aes(x = time, y = value)) +
  geom_line() +
  facet_grid(rows = vars(variable), scales = "free_y")
Five spatiotemporally correlated time series simulated with generateMTSFast().
Five spatiotemporally correlated time series simulated with generateMTSFast().

2.3 Random Fields Simulation

2.3.1 Isotropic Random Fields

CoSMoS simulates spatially and temporally correlated isotropic random fields with the functions generateRF() and generateRFFast(), which are analogs of generateMTS() and generateMTSFast(), with the same syntax. The only difference is the specification of the spatial points, which is an integer denoting the side length of a square grid . As was mentioned in the previous section, the algorithm in generateRFFast() is computationally less expensive than that of generateRF(), enabling the simulation of random fields over a greater number of grid points (see Papalexiou (2018), Papalexiou and Serinaldi (2020), and Serinaldi and Kilsby (2018) for more details).

The example below shows the use of both generateRF and generateRFFast. On the test machine, generateRF took approximately 16 times longer than generateRFFast for this configuration (fitVAR requires roughly 15 s; the subsequent checkRF diagnostics roughly 20 s).

## fit VAR model parameters for a 20x20 grid
fit <- fitVAR(spacepoints = 20, p = 3, margdist = "burrXII",
              margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.8, stcsid = "clayton",
              stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2,
              scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 1.1, shape = 0.8)))

## generate isotropic random fields with nonseparable correlation structure
sim1 <- generateRF(n = 1000, STmodel = fit)

## fast simulation of isotropic random fields with separable correlation structure
sim2 <- generateRFFast(n = 1000, spacepoints = 20, p0 = 0.7, margdist = "paretoII",
                       margarg = list(scale = 1, shape = .3),
                       stcsarg = list(scfid = "weibull", tcfid = "weibull",
                       scfarg = list(scale = 20, shape = 0.7),
                       tcfarg = list(scale = 1.1, shape = 0.8)))

2.3.2 Validating the Simulated Random Fields

The properties of the generated random fields can be checked with checkRF(). If the parameter method = "stat" (the default), the function returns the first three moments, and the probability zero at 20 grid points. If the parameter method = "statplot", checkRF() returns a multi-panel plot comparing the theoretical and empirical spatial correlation functions at 0, 1, and 2 time lags, the contour plot of the target STCS, and the empirical and target CDFs. If the parameter method = "fields", the function returns level plots of the selected number of simulated random fields (nfields) to visualize the temporal persistence of spatial patterns. If the parameter method = "movie", the fields are written to an animated GIF called “movieRF.gif” in the current working directory.

This example returns the diagnostic plots and fields for the first simulation, which used generateRF.

## check random fields
checkRF(RF = sim1, nfields = 9*9, method = "stat")
checkRF(RF = sim1, nfields = 9*9, method = "statplot")
checkRF(RF = sim1, nfields = 9*9, method = "field")
Validation of simulated random fields: empirical moments (top), correlation functions and CDF (middle), spatial snapshots (bottom).
Validation of simulated random fields: empirical moments (top), correlation functions and CDF (middle), spatial snapshots (bottom).

2.3.3 Anisotropic Random Fields

The functions generateRF() and generateRFFast() allows for the simulation of spatially and temporally correlated anisotropic random fields by specifying the arguments anisotropyid and anisotropyarg when calculating the model parameters via fitVAR(). Three types of anisotropic effects are supported, namely: affine (rotation and stretching), and swirl-like, and “wavy” deformation (see the documentation of the function anisotropyT, Papalexiou, Serinaldi and Porcu (2021), and references therein for more details).

The example below shows the simulation of swirl-like random fields mimicking the shape of atmospheric cyclones. First, the transformed coordinate system is visualised:

## specify a grid of coordinates
m <- 30
aux <- seq(0, m - 1, length = m)
coord <- expand.grid(aux, aux)

## transform the coordinate system
at <- anisotropyTswirl(
  spacepoints = coord, x0 = floor(m / 2), y0 = floor(m / 2),
  b = 10, alpha = 1.8 * pi
)

## visualize transformed coordinate system
aux <- data.frame(lon = at[, 1], lat = at[, 2], id1 = rep(1:m, each = m), id2 = rep(1:m, m))
ggplot(aux, aes(x = lon, y = lat)) +
  geom_path(aes(group = id1)) +
  geom_path(aes(group = id2)) +
  geom_point(col = 2)
Swirl-transformed coordinate system (b = 10, α = 1.8π).
Swirl-transformed coordinate system (b = 10, α = 1.8π).

Then, a set of swirl-like random fields with the desired spatiotemporal correlation structure is generated and visualised:

## compute VAR model parameters
fit <- fitVAR(
  spacepoints = m, p = 1, margdist = "burrXII",
  margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.1, stcsid = "clayton",
  stcsarg = list(
    scfid = "weibull", tcfid = "weibull", copulaarg = 2,
    scfarg = list(scale = 2, shape = 0.7), tcfarg = list(scale = 20.1, shape = 0.8)
  ),
  anisotropyid = "swirl",
  anisotropyarg = list(x0 = floor(m / 2), y0 = floor(m / 2), b = 10, alpha = 1.8 * pi)
)

## generate
set.seed(9)
sim3 <- generateRF(n = 25, STmodel = fit)

## check
checkRF(RF = sim3, nfields = 5 * 5, method = "field")
Simulated swirl-like (cyclone-shaped) random fields over a 30×30 grid.
Simulated swirl-like (cyclone-shaped) random fields over a 30×30 grid.

2.3.4 Lagrangian Random Fields

CoSMoS implements advances in space-time simulation presented in Papalexiou, Serinaldi and Porcu (2021) that enable the simulation of Lagrangian random fields, i.e. random fields characterized by mass transport (advection). The motion is described by velocity vectors with specified orthogonal components u and v at each grid point. Motion vectors can be specified by setting the arguments advectionid and advectionarg of fitVAR function. Several types of advection fields are supported, but users can create their own following the structure of the existing functions (see the documentation of the function advectionF for more details).

The example below refers to the generation of random fields with affine anisotropy and uniform advection, mimicking rainfall storms moving north-easterly.

## compute VAR model parameters
fit <- fitVAR(
  spacepoints = 30, p = 1, margdist = "burrXII",
  margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.8, stcsid = "clayton",
  stcsarg = list(
    scfid = "weibull", tcfid = "weibull", copulaarg = 2,
    scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 30.1, shape = 0.8)
  ),
  anisotropyid = "affine",
  anisotropyarg = list(phi1 = 2., phi2 = 0.5, phi12 = 0, theta = -pi / 3),
  advectionid = "uniform", advectionarg = list(u = 2.5, v = 1.5)
)

## generate
set.seed(10) # 1 # 5
sim3 <- generateRF(n = 81, STmodel = fit)

## check
checkRF(RF = sim3, nfields = 9 * 9, method = "field")
Simulated Lagrangian random fields with affine anisotropy and uniform north-easterly advection.
Simulated Lagrangian random fields with affine anisotropy and uniform north-easterly advection.

2.3.5 Random Fields with non-Gaussian Dependence Structures

CoSMoS also enables the simulation of (Lagrangian anisotropic) random fields with dependence structures (copulas) different from Gaussian. Namely, we consider the Student copula (Demarta and McNeil, 2007) and Bárdossy copula (Bárdossy, 2006). The former is characterized by upper and lower tail dependence stronger than those of a Gaussian dependence structure, while the latter allows for asymmetric tail dependence, in particular, upper or lower tail dependence weaker than that of a Gaussian copula. Both models are parametrized by dependence parameters that are identical or directly related to the correlation values characterizing a parent multidimensional Gaussian distribution. Therefore, these models preserve pairwise correlations at multiple locations, and their simulation still relies on the algorithms available for Gaussian processes. We report below an example of correlated samples at two hypothetical locations characterized by standard Gaussian marginal distributions, identical linear correlation, but different copulas (Gaussian, Student, and Bárdossy).

## set the hypothetical coordinates of two locations
coord <- matrix(rep(c(0, 1.65), times = 2), ncol = 2)

## compute VAR model parameters
## Gaussian dependence structure
fit1 <- fitVAR(
  spacepoints = coord, p = 1, margdist = "norm",
  margarg = list(mean = 0, sd = 1), p0 = 0.0, stcsid = "clayton",
  stcsarg = list(
    scfid = "weibull", tcfid = "weibull", copulaarg = 2,
    scfarg = list(scale = 20, shape = 0.7),
    tcfarg = list(scale = 30.1, shape = 0.8)
  )
)
## Student dependence structure
fit2 <- fitVAR(
  spacepoints = coord, p = 1, margdist = "norm",
  margarg = list(mean = 0, sd = 1), p0 = 0.0, stcsid = "clayton",
  stcsarg = list(
    scfid = "weibull", tcfid = "weibull", copulaarg = 2,
    scfarg = list(scale = 20, shape = 0.7),
    tcfarg = list(scale = 30.1, shape = 0.8)
  ),
  dsid = "student", dsarg = 4
)
## Bardossy dependence structure
fit3 <- fitVAR(
  spacepoints = coord, p = 1, margdist = "norm",
  margarg = list(mean = 0, sd = 1), p0 = 0.0, stcsid = "clayton",
  stcsarg = list(
    scfid = "weibull", tcfid = "weibull", copulaarg = 2,
    scfarg = list(scale = 20, shape = 0.7),
    tcfarg = list(scale = 30.1, shape = 0.8)
  ),
  dsid = "bardossyF", dsarg = 0.5
)
## generate
set.seed(10)
sim1 <- data.frame(generateRF(n = 5000, STmodel = fit1))
sim2 <- data.frame(generateRF(n = 5000, STmodel = fit2))
sim3 <- data.frame(generateRF(n = 5000, STmodel = fit3))

## visual check
sim1 <- cbind(sim1, id = "1 - Gauss copula")
sim2 <- cbind(sim2, id = "2 - Student copula")
sim3 <- cbind(sim3, id = "3 - flipped Bárdossy copula")
dta <- rbind(sim1, sim2, sim3)

ggplot(data = dta) +
  geom_point(mapping = aes(x = X1, y = X2), alpha = 0.07) +
  facet_wrap(facets = ~id, nrow = 1)
Bivariate samples (n = 5000) from Gaussian, Student (df = 4), and flipped Bárdossy copulas with identical linear correlation.
Bivariate samples (n = 5000) from Gaussian, Student (df = 4), and flipped Bárdossy copulas with identical linear correlation.
## linear correlations
pc <- round(c(
  cor.G = cor(sim1[, 1:2])[1, 2], cor.S = cor(sim2[, 1:2])[1, 2],
  cor.B = cor(sim3[, 1:2])[1, 2]
), 3)
pc
## cor.G cor.S cor.B 
## 0.805 0.799 0.776

3 Stochastic Simulation of Real-World Processes

In practice, it is often desired to simulate a natural process based on the properties of an observed time series. This section shows how this can easily be done using CoSMoS functions.

3.1 Hourly Precipitation

Precipitation can be easily simulated in CoSMoS from observed data. This is an example of a time series of observed hourly precipitation (units are hundredths of an inch).

data("precip")
quickTSPlot(precip$value)
Observed hourly precipitation record.
Observed hourly precipitation record.

To generate a synthetic time series with similar characteristics to the observed precipitation, you have to determine (1) the marginal distribution, (2) the autocorrelation structure, and (3) the probability zero for each individual month. This can be done by trying to fit various distributions and autocorrelation structures with analyzeTS() and then checking the goodness-of-fit with reportTS(), which can return the plots of the distribution (method = "dist"), ACS (method = "acs") or basic statistics (method = "stat"). The analyzeTS() call is computationally intensive (approximately 75 s on a modern laptop); results below are loaded from pre-computed objects.

precip_ggamma <- analyzeTS(TS = precip, season = "month", dist = "ggamma",
                           acsID = "weibull", lag.max = 12)
reportTS(aTS = precip_ggamma, method = "dist") 
Fitted Generalized Gamma distribution (top) and Weibull ACS (bottom) against monthly precipitation data.
Fitted Generalized Gamma distribution (top) and Weibull ACS (bottom) against monthly precipitation data.
reportTS(aTS = precip_ggamma, method = "acs") 
Fitted Generalized Gamma distribution (top) and Weibull ACS (bottom) against monthly precipitation data.
Fitted Generalized Gamma distribution (top) and Weibull ACS (bottom) against monthly precipitation data.

In this example, the Generalized Gamma distribution and the Weibull autocorrelation structure describe well the observed time series. Other combinations might not give such good results. For example, the Pareto II distribution and the Fractional Gaussian Noise ACS do not give good monthly fits:

precip_pareto <- analyzeTS(TS = precip, season = "month", dist = "paretoII", acsID = "fgn", lag.max = 12)
reportTS(aTS = precip_pareto, method = "dist") 
Fitted Pareto II distribution (top) and FGN ACS (bottom) against monthly precipitation data: note the poor fit relative to the Generalized Gamma / Weibull combination.
Fitted Pareto II distribution (top) and FGN ACS (bottom) against monthly precipitation data: note the poor fit relative to the Generalized Gamma / Weibull combination.
reportTS(aTS = precip_pareto, method = "acs") 
Fitted Pareto II distribution (top) and FGN ACS (bottom) against monthly precipitation data: note the poor fit relative to the Generalized Gamma / Weibull combination.
Fitted Pareto II distribution (top) and FGN ACS (bottom) against monthly precipitation data: note the poor fit relative to the Generalized Gamma / Weibull combination.

Once you choose the marginal distribution and the autocorrelation structure you can produce as many and as long synthetic series you wish. The generated series will reproduce the distribution, correlation structure and probability zero (intermittency), using the simulateTS() function. This example uses the Generalized Gamma distribution and the Weibull autocorrelation structure, which were fitted above.

sim_precip <- simulateTS(
  aTS = precip_ggamma, from = as.POSIXct(x = "1978-12-01 00:00:00"),
  to = as.POSIXct(x = "2008-12-01 00:00:00")
)
dta <- precip
dta[, id := "observed"]
sim_precip[, id := "simulated"]
dta <- rbind(dta, sim_precip)

ggplot(data = dta) +
  geom_line(mapping = aes(x = date, y = value)) +
  facet_wrap(facets = ~id, ncol = 1)
Observed (top) and simulated (bottom) hourly precipitation.
Observed (top) and simulated (bottom) hourly precipitation.

3.2 Daily streamflow

We can also apply the same methods to daily streamflows. For this example we have used the streamflow records of the Nassawango Creek near Snow Hill (Maryland, US; USGS Id 01485500) from the US Geological Survey (USGS) repository (https://waterdata.usgs.gov/). In this case, the fitted distribution was log-normal, and the ACS was Pareto II. The analyzeTS() call requires approximately 240 s on a modern laptop; results are loaded from pre-computed objects.

data("disch")
str <- analyzeTS(TS = disch, dist = "lnorm", norm = "N2", acsID = "paretoII",
                 lag.max = 20, constrain = TRUE, season = "month")
reportTS(aTS = str) 
Fitted log-normal distribution and Pareto II ACS against monthly streamflow data for Nassawango Creek.
Fitted log-normal distribution and Pareto II ACS against monthly streamflow data for Nassawango Creek.

Given the fitted values, streamflows can be simulated as:

sim_str <- simulateTS(aTS = str)

dta <- disch
dta[, id := "observed"]
sim_str[, id := "simulated"]
dta <- rbind(dta, sim_str)

ggplot(data = dta) +
  geom_line(mapping = aes(x = date, y = value)) +
  facet_wrap(facets = ~id, ncol = 1)
Observed (top) and simulated (bottom) daily streamflow for Nassawango Creek.
Observed (top) and simulated (bottom) daily streamflow for Nassawango Creek.

4 Recommended and Supported Distributions in CoSMoS

4.2 Supported Distribution in CoSMoS

CoSMoS supports most distributions available in R with defined quantile, distribution, and probability density functions. Additionally, we provide four flexible distributions that we highly recommend.

Name Argument Parameters
Burr type III burrIII list(scale, shape1, shape2)
Burr type XII burrXII list(scale, shape1, shape2)
Generalized Gamma ggamma list(scale, shape1, shape2)
Pareto II (Lomax) paretoII list(scale, shape)

Full PDF, CDF, quantile, and raw moment formulae for each distribution are given below.

Burr type III (burrIII, parameters: scale, shape1, shape2)

\[f_{\text{BrIII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}-1} \left(\gamma_1^{-1} \left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}+1\right) ^{-\gamma_1 \gamma_2-1}}{\beta } \\ F_{\text{BrIII}}(x)= \left(\gamma_1^{-1} \left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}+1\right) ^{-\gamma _1 \gamma _2} \\ Q_{\text{BrIII}}(u)=\beta \left(\gamma _1 \left(u^{-\frac{1}{\gamma _1 \gamma _2}}-1\right)\right)^{-\gamma _2} \\ m_{\text{BrIII}}(q)=\frac{\beta ^q \gamma _1^{\gamma _2 (-q)} \Gamma \left(\left(q+\gamma _1\right) \gamma _2\right) \Gamma \left(1-q \gamma _2\right)}{\Gamma \left(\gamma _1 \gamma _2\right)}\]

Burr type XII (burrXII, parameters: scale, shape1, shape2)

\[f_{\text{BrXII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{\gamma _1-1} \left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right)^{-\frac{1}{\gamma _1 \gamma _2}-1}}{\beta } \\ F_{\text{BrXII}}(x)=1-\left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right)^{-\frac{1}{\gamma _1 \gamma _2}} \\ Q_{\text{BrXII}}(u)=\beta \left(-\frac{1-(1-u)^{-\gamma _1 \gamma _2}}{\gamma _2}\right)^{\frac{1}{\gamma _1}} \\ m_{\text{BrXII}}(q)=\frac{\beta ^q \gamma _2^{-\frac{q}{\gamma _1}-1} B\left(\frac{q+\gamma _1}{\gamma _1},\frac{1-q \gamma _2}{\gamma _1 \gamma _2}\right)}{\gamma _1}\]

Generalized Gamma (ggamma, parameters: scale, shape1, shape2)

\[f_{\mathcal{G}\mathcal{G}}(x)=\frac{\gamma _2 e^{-\left(\frac{x}{\beta }\right)^{\gamma _2}} \left(\frac{x}{\beta }\right)^{\gamma _1-1}}{\beta \Gamma \left(\frac{\gamma _1}{\gamma _2}\right)} \\ F_{\mathcal{G}\mathcal{G}}(x)=Q\left(\frac{\gamma _1}{\gamma _2},0,\left(\frac{x}{\beta }\right)^{\gamma _2}\right) \\ Q_{\mathcal{G}\mathcal{G}}(u)=\beta Q^{-1}\left(\frac{\gamma _1}{\gamma _2},0,u\right)^{\frac{1}{\gamma _2}} \\ m_{\mathcal{G}\mathcal{G}}(q)=\frac{\beta ^q \Gamma \left(\frac{q}{\gamma _2}+\frac{\gamma _1}{\gamma _2}\right)}{\Gamma \left(\frac{\gamma _1}{\gamma _2}\right)}\]

Pareto II (paretoII, also known as Lomax, parameters: scale, shape)

\[f_{\text{PII}}(x)=\frac{\left(\frac{\gamma x}{\beta }+1\right)^{-\frac{1}{\gamma }-1}}{\beta } \\ F_{\text{PII}}(x)=1-\left(\frac{\gamma x}{\beta }+1\right)^{-1/\gamma } \\ Q_{\text{PII}}(u)=\frac{\beta \left((1-u)^{-\gamma }-1\right)}{\gamma } \\ m_{\text{PII}}(q)=\frac{\Gamma (q+1) \left(\frac{\beta }{\gamma }\right)^q \Gamma \left(\frac{1}{\gamma }-q\right)}{\Gamma \left(\frac{1}{\gamma }\right)}\]

5 Historical Note

The genesis of CoSMoS dates from 2009 when Simon Michael Papalexiou developed the core of its methods while at National Technical University of Athens (NTUA), Greece, and applied them initially for multivariate rainfall modeling for the needs of an internal project at NTUA. Simon soon invited a few “colleagues” to build a start-up company, named Oktana, to develop stochastic simulation software. The details were described in a legal document (Papalexiou, 2010) as a first step to create proprietary software, although Oktana was never created.

In 2015, Simon with Yannis Markonis reconsidered the idea of a start-up company participating in the Climate-KIC competition. In 2017 Simon moved to California and Yannis to Prague, and they started discussing the release of CoSMoS as open-source software. The methods were published in arXiv on July 21, 2017 (Papalexiou, 2017) and few months later in Advances in Water Resources (Papalexiou, 2018), followed by a publication on disaggregation (DiPMaC) in WRR (Papalexiou et al., 2018). In the meanwhile, Filip Strnad joined the team in 2018 and coded the first version of CoSMoS in R, as the original code was developed by Simon in Mathematica.

Francesco Serinaldi joined the team in 2019 initially to co-author a paper on random fields with Simon which was published in WRR in January 2020 (Papalexiou and Serinaldi, 2020). Francesco developed the R code of CoSMoS version 2, collaborating with Filip and Kevin, and extending its functionalities to include the methods of the previously mentioned paper, that is, enabling the simulation of random fields and multivariate time series. The last addition to our team is Kevin Shook who has become the package maintainer and offers advice on R.

The latest version of CoSMoS implements major advances in space time modeling introduced in Papalexiou, Serinaldi and Porcu (2021) that allow realistic simulations of hydro-environmental fluxes, including rainfall storms, fields resembling weather cyclones, fields converging to (or diverging from) a point, colliding air masses, and more.

We are glad to see this ten-year Odyssey finally back on track to Ithaca. Leaving the burdens of the past behind, empowered by a great team and standing for dignity and academic integrity, we will keep striving to offer free, innovative, and reliable software for stochastic modelling.

6 References

  1. Papalexiou, S.M. (2010). Stochastic modelling demystified. https://doi.org/10.13140/RG.2.2.34889.60008
  2. Papalexiou, S.M. (2017). A unified theory for exact stochastic modelling of univariate and multivariate processes with continuous, mixed type, or discrete marginal distributions and any correlation structure. arXiv:1707.06842. https://arxiv.org/abs/1707.06842
  3. Papalexiou, S.M. (2018). Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency. Advances in Water Resources, 115, 234–252. https://doi.org/10.1016/j.advwatres.2018.02.013
  4. Papalexiou, S.M., Markonis, Y., Lombardo, F., AghaKouchak, A., Foufoula-Georgiou, E. (2018). Precise Temporal Disaggregation Preserving Marginals and Correlations (DiPMaC) for Stationary and Nonstationary Processes. Water Resources Research, 54(10), 7435–7458. https://doi.org/10.1029/2018WR022726
  5. Papalexiou, S.M., Serinaldi, F. (2020). Random Fields Simplified: Preserving Marginal Distributions, Correlations, and Intermittency, With Applications From Rainfall to Humidity. Water Resources Research, 56(2), e2019WR026331. https://doi.org/10.1029/2019WR026331
  6. Papalexiou, S.M., Serinaldi, F., Porcu, E. (2021). Advancing Space-Time Simulation of Random Fields: From Storms to Cyclones and Beyond. Water Resources Research, 57, e2020WR029466. https://doi.org/10.1029/2020WR029466
  7. Serinaldi, F., Kilsby, C.G. (2018). Unsurprising Surprises: The Frequency of Record-breaking and Overthreshold Hydrological Extremes Under Spatial and Temporal Dependence. Water Resources Research, 54(9), 6460–6487. https://doi.org/10.1029/2018WR023055