Fit a clustering-only bulk model

Build and fit a Dirichlet-process mixture for clustering without causal estimands or posterior prediction for a response surface. This interface focuses on latent partition recovery from a formula specification and returns a cluster-fit object that can be summarized, plotted, or converted into labels and posterior similarity matrices with predict.dpmixgpd_cluster_fit().

Usage

dpmix.cluster(
  formula,
  data,
  type = c("weights", "param", "both"),
  default = "weights",
  mcmc = list(),
  ...
)

Arguments

formula

Model formula. The response must be present in data.

data

Data frame containing the response and optional predictors.

type

Clustering mode:

"weights": links mixture weights to predictors
"param": links kernel parameters to predictors
"both": links both weights and kernel parameters to predictors

default

Default mode used when type is omitted.

mcmc

MCMC control list passed into the cluster bundle.

...

Additional arguments passed to build_cluster_bundle(), including kernel settings, prior overrides, component counts, and monitoring controls.

Value

Object of class dpmixgpd_cluster_fit.

Details

The fitted model targets a latent partition $z_1, \dots, z_n$ with component-specific kernel parameters. Depending on type, predictors can enter through the gating probabilities

$$ \Pr(z_i = k \mid x_i) = \pi_k(x_i) $$

or through linked kernel parameters for each component. The returned fit stores posterior draws of the latent cluster labels and associated parameters; the representative clustering is extracted later by predict.dpmixgpd_cluster_fit() using Dahl's least-squares rule.
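
To make the least-squares rule concrete, the following is a minimal sketch of Dahl's criterion on a toy matrix of label draws. It is not the package's implementation: the object names (z_draws, coclust, psm, loss, z_hat) and the label values are made up for illustration, and only base R is used.

```r
# Toy label draws (illustrative values only, not package output):
# rows are MCMC iterations, columns are observations.
z_draws <- rbind(
  c(1, 1, 2, 2),
  c(1, 1, 2, 2),
  c(1, 2, 2, 2),
  c(2, 2, 1, 1)
)

# Co-clustering indicator matrix for one draw: entry (i, j) is 1 when
# observations i and j share a label, 0 otherwise.
coclust <- function(z) outer(z, z, "==") * 1

# Posterior similarity matrix: average of the co-clustering matrices.
psm <- Reduce(`+`, lapply(seq_len(nrow(z_draws)),
                          function(s) coclust(z_draws[s, ]))) / nrow(z_draws)

# Dahl's rule: keep the draw whose co-clustering matrix has the smallest
# squared distance to the posterior similarity matrix.
loss <- apply(z_draws, 1, function(z) sum((coclust(z) - psm)^2))
best <- which.min(loss)
z_hat <- z_draws[best, ]
```

Because the criterion compares co-clustering matrices rather than raw labels, it is invariant to label switching: the first and fourth draws above describe the same partition and receive the same loss.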

Use type = "weights" or type = "both" only when the formula includes predictors and an explicit number of components is supplied. If either condition is not met, the builder stops before fitting.
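
One way to check the predictor condition before calling the fitter is sketched below. The helper has_predictors() is hypothetical and not part of the package; it only inspects the right-hand side of the formula with stats::terms() and does not reproduce the builder's own validation.

```r
# Hypothetical helper (not part of the package): TRUE when the formula
# has at least one term on its right-hand side.
has_predictors <- function(formula) {
  length(attr(stats::terms(formula), "term.labels")) > 0
}

has_predictors(y ~ x1 + x2)  # TRUE
has_predictors(y ~ 1)        # FALSE
```

A check like this only covers the predictor half of the requirement; whether a component count was supplied still depends on the arguments passed through `...`.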

Examples

# \donttest{
data("nc_realX100_p3_k2", package = "CausalMixGPD")
dat <- data.frame(y = nc_realX100_p3_k2$y[1:20],
                  nc_realX100_p3_k2$X[1:20, , drop = FALSE])
fit <- dpmix.cluster(
  y ~ x1 + x2 + x3,
  data = dat,
  kernel = "normal",
  type = "param",
  components = 3,
  mcmc = list(niter = 60, nburnin = 30, thin = 1, nchains = 1, seed = 1)
)
#> [cluster] Validating configuration
#> [cluster] Checking build/compile cache
#> [cluster] Building model and MCMC configuration
#> [cluster] Compiling NIMBLE model
#> [cluster] Initializing chains
#> [cluster] Running MCMC
#> [cluster] Finalizing WAIC and diagnostics
#> [cluster] Assembling fit object
summary(fit)
#> $K_star
#> [1] 1
#> 
#> $cluster_sizes
#> 
#>  1 
#> 20 
#> 
#> $cluster_profiles
#>   cluster  n y_mean  y_sd x1_mean x1_sd x2_mean x2_sd x3_mean x3_sd
#> 1      C1 20 -0.178 1.313   0.144 0.999  -0.104 0.543   0.386 1.007
#>   certainty_mean certainty_sd
#> 1              1            0
#> 
#> $certainty
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       1       1       1       1       1       1 
#> 
#> $source
#> [1] "train"
#> 
#> $burnin
#> [1] 0
#> 
#> $thin
#> [1] 1
#> 
#> attr(,"class")
#> [1] "summary.dpmixgpd_cluster_fit" "list"                        
# }
