CausalMixGPD has two orthogonal dials you turn when building models:
Backend (how the mixture weights / clustering are represented)
CRP: Chinese Restaurant Process representation.
SB: stick-breaking truncation with a fixed number of components.
Kernel / family (what distribution models the bulk of the data)
Examples: normal, lognormal, gamma, inverse-Gaussian, Laplace, Cauchy, Amoroso, etc.
Optionally, you can also set:
GPD = TRUE/FALSE to control whether a Generalized Pareto (GPD) tail is spliced onto the bulk beyond a threshold.
For the math behind the bulk–tail “spliced” construction (threshold exceedances, DPM bulk, and spliced quantiles), see Theory: GPD tails + DPM bulk + splicing.
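As a rough illustration of the spliced construction, the sketch below is not the package's implementation. It uses a hand-picked two-component normal mixture as the bulk and a GPD with a fixed scale sigma and shape xi above a threshold u. Below u, the quantile comes from the bulk. Above u, the probability level p is rescaled to the conditional tail probability q = (p - F_bulk(u)) / (1 - F_bulk(u)) and mapped through the GPD quantile. All function names, weights, and parameter values here are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical bulk: a 2-component normal mixture (values chosen for illustration only).
WEIGHTS = [0.7, 0.3]
COMPONENTS = [NormalDist(0.0, 1.0), NormalDist(2.0, 0.5)]

def bulk_cdf(x):
    """CDF of the mixture bulk: sum_k w_k * Phi((x - mu_k) / sd_k)."""
    return sum(w * c.cdf(x) for w, c in zip(WEIGHTS, COMPONENTS))

def bulk_quantile(p, lo=-50.0, hi=50.0, tol=1e-10):
    """Invert the mixture CDF by bisection (mixture quantiles have no closed form)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bulk_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gpd_quantile(q, sigma, xi):
    """Quantile of the GPD exceedance (x - u) at level q in [0, 1)."""
    if abs(xi) < 1e-12:
        return -sigma * math.log1p(-q)  # exponential limit as xi -> 0
    return sigma / xi * ((1.0 - q) ** (-xi) - 1.0)

def spliced_quantile(p, u, sigma, xi):
    """Bulk quantile at or below the threshold u, GPD tail above it."""
    p_u = bulk_cdf(u)  # probability mass the bulk assigns below u
    if p <= p_u:
        return bulk_quantile(p)
    q = (p - p_u) / (1.0 - p_u)  # conditional probability within the tail
    return u + gpd_quantile(q, sigma, xi)

u = 2.5
for p in (0.5, 0.99, 0.999):
    print(p, round(spliced_quantile(p, u, sigma=1.0, xi=0.2), 3))
```

The two pieces meet continuously at p = F_bulk(u), where the bulk quantile equals u and the GPD exceedance is zero. With xi > 0, the upper quantiles grow much faster than a normal bulk alone would allow, which is the point of adding the tail.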
What changes between CRP and SB?
Both backends target the same posterior over densities. The difference is representation:
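CRP works with cluster assignments directly. The number of occupied clusters is random and learned from the data, bounded by the finite components cap.
SB truncates at the same finite components cap and learns explicit stick-breaking weights for every component.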
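To make the CRP side concrete, here is a minimal prior-only sketch of Chinese Restaurant Process seating, not the package's sampler. Item i joins an existing cluster k with probability n_k / (i + alpha), or opens a new cluster with probability alpha / (i + alpha). The concentration parameter alpha, the function name, and the cap value are assumptions for illustration. The sketch does not enforce the components cap. It only shows that the number of occupied clusters is itself random, and this page does not say how the package enforces the cap.

```python
import random

def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n items from a CRP prior with concentration alpha.

    Item i joins existing cluster k with probability n_k / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, n_k in enumerate(counts):
            acc += n_k
            if r < acc:  # landed in existing cluster k
                counts[k] += 1
                labels.append(k)
                break
        else:  # no existing cluster chosen: open a new one
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels, counts

rng = random.Random(1)
cap = 5  # hypothetical stand-in for the `components` cap
labels, counts = crp_assignments(50, alpha=1.0, rng=rng)
print(f"occupied clusters: {len(counts)} (cap = {cap}), sizes: {counts}")
```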
Practical rule of thumb:
CRP is convenient when you want adaptive complexity while still using a finite components cap.
SB is convenient when you want predictable memory/time and easy vectorization.
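The SB side can be sketched with truncated stick-breaking weights. This is a minimal illustration that assumes the standard construction: v_k ~ Beta(1, alpha) for k < K, v_K = 1, and w_k = v_k * prod_{j<k} (1 - v_j). The concentration alpha, the function name, and K (standing in for the components cap) are assumptions. The output always has exactly K entries, which is where the predictable memory and time come from. In array code the running product is a single cumulative-product operation, which is what makes SB easy to vectorize.

```python
import operator
import random
from itertools import accumulate

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights with exactly K entries that sum to 1."""
    # Break proportions: Beta(1, alpha) for the first K-1 sticks; the last takes the rest.
    v = [rng.betavariate(1.0, alpha) for _ in range(K - 1)] + [1.0]
    # Stick length left before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = [1.0] + list(accumulate((1.0 - vk for vk in v[:-1]), operator.mul))
    return [vk * rk for vk, rk in zip(v, remaining)]

rng = random.Random(42)
w = stick_breaking_weights(alpha=1.0, K=5, rng=rng)
print([round(x, 3) for x in w], "sum =", round(sum(w), 12))
```

Unlike the CRP draw, the size of this object never depends on the data. Components with negligible weight are still carried along, which is the price SB pays for its fixed footprint.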
What does the workflow look like?
CausalMixGPD uses a consistent build → run → summarize loop:
Build a bundle using bundle() (or the causal builders if you are doing treatment-effect (TE) work).
Fit directly using dpmix() (or dpmgpd() when GPD = TRUE).
Inspect and summarize using print(), summary(), plot().
Predict using predict(). For conditional (covariate) models, fitted() is also available.
Printing a small example fit (CRP backend, normal kernel, no GPD tail) gives:
MixGPD fit | backend: Chinese Restaurant Process | kernel: Normal Distribution | GPD tail: FALSE
n = 50 | components = 5 | epsilon = 0.025
MCMC: niter=400, nburnin=100, thin=2, nchains=1
Fit
Use summary() for posterior summaries; plot() for diagnostics; predict() for predictions.
Most users should stick to dpmix() / dpmgpd() (and their causal / clustering counterparts). For debugging or reproducibility workflows, the package also exports:
run_mcmc_bundle_manual() / run_mcmc_causal() for an explicit build → compile → run pipeline (useful when you want to inspect intermediate NIMBLE objects).
sim_bulk_tail(), sim_causal_qte(), and sim_survival_tail() for generating toy datasets that match the package’s bulk/tail and causal examples.
Where to go next
Available distributions: see the Kernels hub and kernel catalog pages.