Fit a clustering-only bulk-tail model

Variant of dpmix.cluster() that augments the cluster kernel with a generalized Pareto tail. This is the clustering analogue of the spliced bulk-tail workflow used by dpmgpd().

Usage

dpmgpd.cluster(
  formula,
  data,
  type = c("weights", "param", "both"),
  default = "weights",
  mcmc = list(),
  ...
)

Arguments

formula

Model formula. The response must be present in data.

data

Data frame containing the response and optional predictors.

type

Clustering mode:

"weights": links mixture weights to predictors
"param": links kernel parameters to predictors
"both": links both weights and kernel parameters to predictors

default

Default mode used when type is omitted.

mcmc

MCMC control list passed into the cluster bundle. The example below sets niter, nburnin, thin, nchains, and seed.

...

Additional arguments passed to build_cluster_bundle(), including kernel settings, prior overrides, component counts, and monitoring controls.

Value

Object of class dpmixgpd_cluster_fit.

Details

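For observations above a component-specific threshold $u$, the component density is spliced as $$ f(y) = (1 - F_{bulk}(u)) g_{GPD}(y \mid u, \sigma_u, \xi_u), \qquad y \ge u, $$ where $F_{bulk}$ is the bulk kernel's distribution function and $g_{GPD}$ is the generalized Pareto density with threshold $u$, scale $\sigma_u$, and shape $\xi_u$. The factor $1 - F_{bulk}(u)$ rescales the tail so that it carries exactly the probability mass the bulk places above $u$. Cluster assignment can therefore be informed by both central behavior and tail behavior.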
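The splice can be illustrated in base R. This is a minimal sketch, not the package's implementation: it assumes a gamma bulk kernel, assumes the density below the threshold is the unmodified bulk density, and uses the hypothetical helpers dgpd_sketch() and dspliced_sketch(), which are not exported by the package. The parameter values are arbitrary.

```r
# Generalized Pareto density with threshold u, scale sigma, shape xi.
# Hypothetical helper for illustration only.
dgpd_sketch <- function(y, u, sigma, xi) {
  z <- (y - u) / sigma
  if (abs(xi) < 1e-8) {
    # Exponential limit as the shape tends to zero
    dens <- exp(-z) / sigma
  } else {
    dens <- pmax(1 + xi * z, 0)^(-1 / xi - 1) / sigma
  }
  dens[y < u] <- 0  # no GPD mass below the threshold
  dens
}

# Spliced component density: gamma bulk below u (an assumption),
# GPD tail above u scaled by the bulk mass above u.
dspliced_sketch <- function(y, u, shape, rate, sigma, xi) {
  tail_mass <- 1 - pgamma(u, shape = shape, rate = rate)
  ifelse(
    y < u,
    dgamma(y, shape = shape, rate = rate),
    tail_mass * dgpd_sketch(y, u, sigma, xi)
  )
}

# Arbitrary illustrative parameter values
u <- 5; shape <- 2; rate <- 0.5; sigma <- 1.5; xi <- 0.2

# The spliced density should integrate to (approximately) one
bulk_part <- integrate(dspliced_sketch, lower = 0, upper = u, u = u,
                       shape = shape, rate = rate,
                       sigma = sigma, xi = xi)$value
tail_part <- integrate(dspliced_sketch, lower = u, upper = Inf, u = u,
                       shape = shape, rate = rate,
                       sigma = sigma, xi = xi)$value
total <- bulk_part + tail_part
```

Because the tail is weighted by the bulk mass above the threshold, the two pieces together form a proper density. Two components with similar bulk parameters but different tail shapes produce different densities only above u, which is where the tail parameters can separate clusters.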

This interface is preferable when cluster separation is driven by upper-tail differences rather than bulk-only shape or location differences.

Examples

# \donttest{
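# Load the bundled example data and keep the first 20 rows so the run stays short
data("nc_posX100_p3_k2", package = "CausalMixGPD")
dat <- data.frame(y = nc_posX100_p3_k2$y[1:20],
                  nc_posX100_p3_k2$X[1:20, , drop = FALSE])
# Gamma bulk kernel with a GPD tail; predictors are linked to kernel parameters.
# The MCMC settings are deliberately small and only suitable for illustration.
fit <- dpmgpd.cluster(
  y ~ x1 + x2 + x3,
  data = dat,
  kernel = "gamma",
  type = "param",
  components = 3,
  mcmc = list(niter = 60, nburnin = 30, thin = 1, nchains = 1, seed = 1)
)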
#> [cluster] Validating configuration
#> [cluster] Checking build/compile cache
#> [cluster] Building model and MCMC configuration
#> [cluster] Compiling NIMBLE model
#> [cluster] Initializing chains
#> [cluster] Running MCMC
#> [cluster] Finalizing WAIC and diagnostics
#> [cluster] Assembling fit object
cluster_profiles(fit)
#>   cluster  n y_mean y_sd x1_mean x1_sd x2_mean x2_sd x3_mean x3_sd
#> 1      C1 20  3.334 2.25   0.211 0.918  -0.096 0.563  -0.489 0.859
#>   certainty_mean certainty_sd
#> 1              1            0
# }
