CausalMixGPD
  • Home
  • Roadmaps
    • Website roadmap
    • Package roadmap
  • Start
    • Start Hub
    • Roadmap
    • Usage Diagrams
    • Start Here
    • Basic Compile and Run
    • Backends and Workflow
    • Troubleshooting
  • Tracks
    • Quickstart
    • Modeling (1-arm)
    • Causal
    • Clustering
    • Kernels & tails
    • Customization
  • Examples
  • Kernels
  • Advanced
  • Developers
  • Reference
    • Reference hub
    • Function reference by job
  • News
  • Cite
  • Coverage
  • API Reference

ex02. Unconditional DPM (SB Backend)

Website workflow note. This page reflects the current exported API and recommended wrapper-first usage. Last updated: 2026-02-19.

For the full package narrative, see the main package vignettes (basic, unconditional, conditional, and causal).

Unconditional DPmix: Stick-Breaking (SB) Backend

Purpose: Showcase the stick-breaking backend that uses a fixed number of mixture components (components = J) and contrast it with the CRP backend from start/backends-and-workflow. In this vignette we fit two bulk kernels (Gamma and Cauchy) on the same dataset to highlight how heavier tails in the bulk can change the fit even without a GPD tail.

What you’ll learn

  • How the stick-breaking (SB) backend differs from CRP in practice (fixed truncation via components = J).
  • How to use bundle() + dpmix() + predict()/plot() for an unconditional bulk-only DP mixture.
  • How bulk kernel choice (e.g., Gamma vs Cauchy) can change tail behavior even when GPD = FALSE.

When to use this template

  • You want a predictable runtime mixture workflow with a fixed upper bound on components.
  • You mainly care about bulk fit, density estimation, and posterior predictive summaries.

Next steps

  • If extremes matter, extend to dpmgpd() with GPD = TRUE (see ex03/ex04).

Data Setup

Code
# Generate a reproducible mock dataset instead of relying on saved cache files.
set.seed(10102)
n <- 200
weights <- c(0.4, 0.35, 0.25)
params <- list(
  list(shape = 2.0, rate = 1.5),
  list(shape = 4.0, rate = 1.0),
  list(shape = 7.0, rate = 0.7)
)
component <- sample(seq_along(weights), size = n, replace = TRUE, prob = weights)
y_mixed <- numeric(n)
for (k in seq_along(params)) {
  idx <- which(component == k)
  if (length(idx) > 0) {
    y_mixed[idx] <- rgamma(length(idx), shape = params[[k]]$shape, rate = params[[k]]$rate)
  }
}

df_data <- data.frame(y = y_mixed)

summary_tbl <- tibble(
  statistic = c("N", "Mean", "SD", "Min", "Max"),
  value = c(length(y_mixed), mean(y_mixed), sd(y_mixed), min(y_mixed), max(y_mixed))
)

p_raw <- ggplot(df_data, aes(x = y)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "darkorange", alpha = 0.7, color = "black") +
  geom_density(color = "darkred", linewidth = 1.2) +
  labs(title = "Raw Data: Mixed Gamma Distribution", x = "y", y = "Density") +
  theme_minimal()

print(p_raw)

# A tibble: 5 × 2
  statistic    value
  <chr>        <dbl>
1 N         200     
2 Mean        4.21  
3 SD          4.11  
4 Min         0.0403
5 Max        19.6   

Model Specification & Bundle

We build the stick-breaking mixture explicitly with components = 5 so that there is room for weight decay while keeping the MCMC runtime manageable for a vignette. We then refit the same SB model with a Cauchy kernel for comparison.

Code
# --- Gamma kernel ---
bundle_sb_gamma <- bundle(
  y = y_mixed,
  kernel = "gamma",
  backend = "sb",
  components = 5,            # fixed truncation level for SB
  GPD = FALSE,               # bulk-only scenario
  mcmc = mcmc
)

# --- Cauchy kernel ---
bundle_sb_cauchy <- bundle(
  y = y_mixed,
  kernel = "cauchy",
  backend = "sb",
  components = 5,
  GPD = FALSE,
  mcmc = mcmc
)

Running MCMC & Summary

Code
fit_sb_gamma <- load_or_fit("ex02-unconditional-dpm-sb-fit_sb_gamma", dpmix(bundle_sb_gamma))
fit_sb_cauchy <- load_or_fit("ex02-unconditional-dpm-sb-fit_sb_cauchy", dpmix(bundle_sb_cauchy))
Code
summary(fit_sb_gamma)
MixGPD summary | backend: Stick-Breaking Process | kernel: Gamma Distribution | GPD tail: FALSE | epsilon: 0.025
n = 200 | components = 5
Summary
Initial components: 5 | Components after truncation: 3

Summary table
  parameter  mean    sd q0.025 q0.500 q0.975    ess
 weights[1] 0.395 0.042  0.334  0.387  0.485  3.212
 weights[2] 0.321 0.035  0.242  0.323  0.393 17.083
 weights[3] 0.195  0.05  0.108  0.202   0.28  7.824
      alpha 1.186 0.609  0.305  1.083  2.407 44.491
   shape[1] 2.237 1.283  1.025   1.55  4.779 10.926
   shape[2] 3.462  1.43  1.138  3.483  5.654  7.937
   shape[3] 2.602 1.228  1.226  2.204  5.166  8.559
   scale[1] 0.575 0.194   0.32  0.571  0.896 28.382
   scale[2] 0.773 0.555  0.293  0.571  2.199  4.083
   scale[3] 1.224 0.589  0.399   1.12  2.446  7.623
Code
summary(fit_sb_cauchy)
MixGPD summary | backend: Stick-Breaking Process | kernel: Cauchy Distribution | GPD tail: FALSE | epsilon: 0.025
n = 200 | components = 5
Summary
Initial components: 5 | Components after truncation: 3

Summary table
   parameter  mean    sd q0.025 q0.500 q0.975    ess
  weights[1] 0.502 0.081  0.378  0.481  0.621  3.202
  weights[2] 0.232 0.035  0.169  0.232  0.315 19.143
  weights[3] 0.147 0.044  0.064   0.16  0.212  5.279
       alpha  1.32 0.645  0.321  1.272  2.743 52.126
 location[1]  1.25 0.182  0.995    1.2  1.569  5.851
 location[2] 5.649 2.556  2.646  5.997  9.852  8.551
 location[3]   5.2 2.037  2.534  5.446  9.832 34.878
    scale[1] 0.637 0.139  0.431  0.617  0.912 11.565
    scale[2] 1.312  0.72  0.473   1.07  2.923 14.081
    scale[3] 1.058  0.52  0.367  1.003  2.341 76.149
Code
params_gamma <- params(fit_sb_gamma)
params_gamma
Posterior mean parameters

$alpha
[1] 1.186

$w
[1] 0.3947 0.3213 0.1948

$shape
[1] 2.237 3.462 2.602

$scale
[1] 0.5752 0.7728 1.2240

MCMC Diagnostics (S3 Plot Methods)

Code
# Default diagnostics for each fit
plot(fit_sb_gamma, family = c("traceplot", "autocorrelation", "running"))

=== traceplot ===


=== autocorrelation ===


=== running ===

Code
plot(fit_sb_cauchy, family = c("density", "geweke", "caterpillar"))

=== density ===


=== geweke ===


=== caterpillar ===

Use summary(fit_sb_gamma) and summary(fit_sb_cauchy) to inspect effective sample size, R-hat, and other convergence diagnostics; the plot() calls above show the trace/density pairs for both global and stick-breaking weight parameters.


Stick-Breaking Weights & Component Activity

The stick-breaking weights are exposed via the plot() diagnostics above, and you can refer to the raw fit_sb_gamma / fit_sb_cauchy objects (or their summary() output) for posterior summaries of each component weight.


Posterior Predictions

Code
y_grid <- seq(min(y_mixed), max(y_mixed) * 1.3, length.out = 250)

pred_density_gamma <- predict(fit_sb_gamma, y = y_grid, type = "density")
pred_density_cauchy <- predict(fit_sb_cauchy, y = y_grid, type = "density")

plot(pred_density_gamma)

Code
plot(pred_density_cauchy)

Code
quant_probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)

pred_q_gamma <- predict(fit_sb_gamma, type = "quantile", p = quant_probs, interval = "credible")
pred_q_cauchy <- predict(fit_sb_cauchy, type = "quantile", p = quant_probs, interval = "credible")

plot(pred_q_gamma)

Code
plot(pred_q_cauchy)

Code
pred_mean_gamma <- predict(fit_sb_gamma, type = "mean")
pred_mean_cauchy <- predict(fit_sb_cauchy, type = "mean")

plot(pred_mean_gamma)

Code
plot(pred_mean_cauchy)


Bulk-only vs CRP Backends

Code
bundle_crp_small <- bundle(
  y = y_mixed,
  kernel = "gamma",
  backend = "crp",
  components = 5,
  GPD = FALSE,
  mcmc = mcmc
)
fit_crp_small <- load_or_fit("ex02-unconditional-dpm-sb-fit_crp_small", dpmix(bundle_crp_small))

crp_quant <- predict(fit_crp_small, type = "quantile", p = quant_probs, interval = "credible")
sb_quant <- predict(fit_sb_gamma, type = "quantile", p = quant_probs, interval = "credible")

bind_rows(
  crp_quant$fit %>% mutate(model = "CRP"),
  sb_quant$fit %>% mutate(model = "SB")
) %>%
  select(any_of(c("model", "index", "estimate", "lower", "upper")))
   model index  estimate      lower     upper
1    CRP  0.05 0.1067670 0.04176633  0.213226
2    CRP  0.25 0.3266968 0.16551984  0.568718
3    CRP  0.50 0.6314040 0.34692305  1.082989
4    CRP  0.75 1.3282693 0.62991581  2.862594
5    CRP  0.95 4.7332026 1.30743509 17.604264
6     SB  0.05 0.2686400 0.04578688  1.181490
7     SB  0.25 0.8059203 0.24201029  2.351381
8     SB  0.50 1.6058160 0.59798532  3.955477
9     SB  0.75 2.9409217 1.13342608  7.342066
10    SB  0.95 6.3086670 2.08820530 14.132008
Code
plot(crp_quant)

Code
plot(sb_quant)


For unconditional models, use predict() only; fitted() and residuals() are not supported.


Takeaways

  • SB Backend: Fixed components keeps label-switching manageable, and the diagnostic plots via the S3 plot() method show weight dynamics.
  • Kernels matter: Gamma vs Cauchy can behave very differently in the shoulders/tails, even without a GPD tail.
  • Predictions: Posterior density, posterior-mean quantiles, and mean (via predictive sampling) are accessible via predict() with pointwise 95% credible intervals.
  • Backend comparison: The CRP fit (start/backends-and-workflow) and the SB-gamma fit deliver similar central quantiles while SB offers more control over truncation.
  • Next: Explore tail augmentation (GPD = TRUE) with the CRP backend in ex03 or SB backend in ex04.

Workflow Navigation

  • Previous: ex01-unconditional-dpm-crp
  • Next: ex03-unconditional-dpmgpd-crp
  • Workflow index: Roadmap
  • Practical entry: Examples

Prereqs

  • Required packages and data for this page are listed in the setup chunks above.

Outputs

  • This page renders model fits, diagnostics, and summary artifacts generated by package APIs.

Interpretation

  • Canonical concept page: Model Umbrella
  • Treat this page as an application/example view and use the canonical page for core definitions.

Next

  • Continue to the linked canonical concept page, then return for implementation-specific details.
(c) CausalMixGPD - Bayesian semiparametric modeling for heavy-tailed data
- - Cite - API - GitHub