Fit a clustering-only bulk model
Build and fit a Dirichlet-process mixture for clustering without causal estimands or posterior
prediction for a response surface. This interface focuses on latent partition recovery from a
formula specification and returns a cluster-fit object that can be summarized, plotted, or
converted into labels and posterior similarity matrices with predict.dpmixgpd_cluster_fit().
Arguments
- formula
Model formula. The response must be present in data.
- data
Data frame containing the response and optional predictors.
- type
Clustering mode:
  - "weights": links mixture weights to predictors
  - "param": links kernel parameters to predictors
  - "both": links both weights and kernel parameters to predictors
- default
Default mode used when type is omitted.
- mcmc
MCMC control list passed into the cluster bundle.
- ...
Additional arguments passed to build_cluster_bundle(), including kernel settings, prior overrides, component counts, and monitoring controls.
Details
The fitted model targets a latent partition \(z_1, \dots, z_n\) with component-specific kernel
parameters. Depending on type, predictors can enter through the gating probabilities
$$
\Pr(z_i = k \mid x_i) = \pi_k(x_i)
$$
or through linked kernel parameters for each component. The returned fit stores posterior draws
of the latent cluster labels and associated parameters; the representative clustering is extracted
later by predict.dpmixgpd_cluster_fit() using Dahl's least-squares rule.
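To make the least-squares rule concrete, here is a minimal base-R sketch of how a representative clustering can be picked from stored label draws. It is an illustration under stated assumptions, not the package's internal code: the matrix z_draws and the helper coclust() are hypothetical, and predict.dpmixgpd_cluster_fit() may differ in details such as tie-breaking or how draws are stored.

```r
# Hypothetical label draws (made up for illustration): rows are MCMC
# iterations, columns are observations.
z_draws <- rbind(
  c(1, 1, 2, 2, 3),
  c(1, 1, 2, 2, 2),
  c(2, 2, 1, 1, 1),  # same partition as the previous draw, labels swapped
  c(1, 2, 2, 2, 2)
)
n_draws <- nrow(z_draws)

# Hypothetical helper: 0/1 co-clustering matrix of a single label vector.
coclust <- function(z) outer(z, z, "==") * 1

# Posterior similarity matrix: average co-clustering indicator over draws.
psm <- Reduce(`+`, lapply(seq_len(n_draws), function(s) coclust(z_draws[s, ]))) / n_draws

# Least-squares criterion: squared distance between each draw's
# co-clustering matrix and the posterior similarity matrix.
loss <- vapply(
  seq_len(n_draws),
  function(s) sum((coclust(z_draws[s, ]) - psm)^2),
  numeric(1)
)

# The representative clustering is the draw with the smallest loss
# (which.min() keeps the first draw in case of ties).
best <- which.min(loss)
z_hat <- z_draws[best, ]
```

Because the criterion works on co-clustering matrices rather than raw labels, it is unaffected by label switching: the second and third draws above describe the same partition and receive the same loss.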
Use type = "weights" or type = "both" only when the formula includes predictors and when an
explicit number of components is supplied. Otherwise the builder stops before fitting.
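As an illustration of predictor-dependent gating probabilities \(\pi_k(x_i)\) with a fixed number of components, the sketch below uses a multinomial-logit (softmax) form. This functional form is an assumption chosen for illustration; the source does not state which link the package uses, and the objects X, B, and pi_x are hypothetical.

```r
# Hypothetical design matrix: intercept plus two predictors, 4 observations.
X <- cbind(1, c(-1, 0, 1, 2), c(0.5, 0.5, -0.5, -0.5))

# Hypothetical coefficients: one column per component (K = 3).
# The first column is fixed at zero as the reference component.
B <- cbind(c(0, 0, 0), c(0.2, 1.0, -0.5), c(-0.3, -1.0, 0.8))

# Linear predictors, one row per observation and one column per component.
eta <- X %*% B

# Row-wise softmax; subtracting each row's maximum avoids overflow in exp().
eta <- eta - apply(eta, 1, max)
pi_x <- exp(eta) / rowSums(exp(eta))
```

Each row of pi_x holds the assumed membership probabilities for one observation and sums to one, which also shows why an explicit component count is needed: the number of columns of B must be known before the model is built.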
See also
dpmgpd.cluster(), predict.dpmixgpd_cluster_fit(),
summary.dpmixgpd_cluster_fit(), plot.dpmixgpd_cluster_fit(),
build_nimble_bundle(), dpmix().
Other cluster workflow:
cluster_profiles(),
dpmgpd.cluster(),
plot.dpmixgpd_cluster_bundle(),
predict.dpmixgpd_cluster_fit(),
print.dpmixgpd_cluster_bundle(),
print.dpmixgpd_cluster_fit(),
print.dpmixgpd_cluster_labels(),
print.dpmixgpd_cluster_psm(),
summary.dpmixgpd_cluster_bundle(),
summary.dpmixgpd_cluster_fit(),
summary.dpmixgpd_cluster_labels(),
summary.dpmixgpd_cluster_psm()
Examples
# \donttest{
data("nc_realX100_p3_k2", package = "CausalMixGPD")
dat <- data.frame(y = nc_realX100_p3_k2$y[1:20],
                  nc_realX100_p3_k2$X[1:20, , drop = FALSE])
fit <- dpmix.cluster(
  y ~ x1 + x2 + x3,
  data = dat,
  kernel = "normal",
  type = "param",
  components = 3,
  mcmc = list(niter = 60, nburnin = 30, thin = 1, nchains = 1, seed = 1)
)
#> [cluster] Validating configuration
#> [cluster] Checking build/compile cache
#> [cluster] Building model and MCMC configuration
#> [cluster] Compiling NIMBLE model
#> [cluster] Initializing chains
#> [cluster] Running MCMC
#> [cluster] Finalizing WAIC and diagnostics
#> [cluster] Assembling fit object
summary(fit)
#> $K_star
#> [1] 1
#>
#> $cluster_sizes
#>
#> 1
#> 20
#>
#> $cluster_profiles
#> cluster n y_mean y_sd x1_mean x1_sd x2_mean x2_sd x3_mean x3_sd
#> 1 C1 20 -0.178 1.313 0.144 0.999 -0.104 0.543 0.386 1.007
#> certainty_mean certainty_sd
#> 1 1 0
#>
#> $certainty
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1 1 1 1 1 1
#>
#> $source
#> [1] "train"
#>
#> $burnin
#> [1] 0
#>
#> $thin
#> [1] 1
#>
#> attr(,"class")
#> [1] "summary.dpmixgpd_cluster_fit" "list"
# }