Predict labels or similarity matrices from a cluster fit
Convert posterior draws from a dpmixgpd_cluster_fit object into either a representative
clustering or a posterior similarity matrix (PSM). This is the main post-processing step for
the cluster workflow after dpmix.cluster() or dpmgpd.cluster().
Arguments
- object
A fitted cluster object.
- newdata
Optional new data containing the response and predictors required by the original formula. New-data prediction is available only for type = "label".
- type
Prediction target:
  - "label": representative partition via Dahl's least-squares rule
  - "psm": posterior similarity matrix on the training sample
- burnin
Number of initial posterior draws to discard.
- thin
Keep every thin-th posterior draw.
- return_scores
Logical; if TRUE and type = "label", include the matrix of Dahl-cluster assignment scores.
- psm_max_n
Maximum training sample size allowed for type = "psm".
- ...
Unused.
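As a rough illustration of how burnin and thin interact, the retained draws can be thought of as an index sequence. The helper below, keep_draws(), is hypothetical and not part of the package, and it assumes that thinning starts at the first draw after burn-in; the package may index differently.

```r
# Hypothetical helper (not part of the package): indices of retained draws.
# Assumes the first draw after burn-in is kept, then every thin-th draw after it.
keep_draws <- function(n_draws, burnin = 0, thin = 1) {
  stopifnot(burnin >= 0, burnin < n_draws, thin >= 1)
  seq(from = burnin + 1, to = n_draws, by = thin)
}

keep_draws(10, burnin = 2, thin = 3)  # 3 6 9
```

Under this assumption, 10 stored draws with burnin = 2 and thin = 3 leave three draws for the label or PSM computation.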
Value
A dpmixgpd_cluster_labels object when type = "label" or a
dpmixgpd_cluster_psm object when type = "psm".
Details
Let \(z_i^{(s)}\) denote the latent cluster label for observation \(i\) at posterior draw \(s\), for \(s = 1, \dots, S\). The posterior similarity matrix is
$$ \mathrm{PSM}_{ij} = \Pr(z_i = z_j \mid y) \approx \frac{1}{S} \sum_{s=1}^S I(z_i^{(s)} = z_j^{(s)}). $$
The returned label solution is the Dahl representative partition: the draw whose adjacency matrix, with entries \(I(z_i^{(s)} = z_j^{(s)})\), is closest to the PSM in squared error,
$$ s^{*} = \arg\min_{s} \sum_{i,j} \left( I(z_i^{(s)} = z_j^{(s)}) - \mathrm{PSM}_{ij} \right)^2. $$
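The PSM and the Dahl selection can be sketched in base R on a toy matrix of label draws. This is only an illustration of the formulas above, not the package's internal code; the object names (z, adjacency, psm, dahl_loss) are made up for the example.

```r
# Toy label draws: S = 4 posterior draws (rows) for n = 5 observations (columns).
# Draw 4 is draw 2 with the label names swapped, so both encode the same partition.
z <- rbind(
  c(1, 1, 2, 2, 3),
  c(1, 1, 2, 2, 2),
  c(1, 1, 1, 2, 2),
  c(2, 2, 1, 1, 1)
)

# Adjacency (co-clustering) matrix for one draw: entry (i, j) is 1 when z_i == z_j.
adjacency <- function(labels) outer(labels, labels, "==") * 1

# PSM: average of the per-draw adjacency matrices.
adj_list <- lapply(seq_len(nrow(z)), function(s) adjacency(z[s, ]))
psm <- Reduce(`+`, adj_list) / nrow(z)

# Dahl's rule: pick the draw whose adjacency matrix has the smallest
# squared-error distance to the PSM (ties go to the earliest draw).
dahl_loss <- vapply(adj_list, function(a) sum((a - psm)^2), numeric(1))
best_draw <- which.min(dahl_loss)
dahl_labels <- z[best_draw, ]
```

Because both quantities depend only on whether two observations share a label, they are unaffected by label switching across draws: draws 2 and 4 in the toy data have identical adjacency matrices and identical losses.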
For newdata, the function combines draw-specific component weights and component densities to
produce posterior assignment scores relative to the representative training clusters. Returned
newdata label objects also carry the training labels and response data needed for comparative
plot(..., type = "summary") displays. A PSM is not defined for newdata, so type = "psm"
is restricted to the training sample.
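The idea of combining weights and densities into assignment scores can be sketched as follows. Everything here is an assumption made for illustration: the components are taken to be univariate normal, the draw-specific components are assumed to be already aligned with the representative training clusters, and scores are averaged across draws. The package's actual kernels, alignment step, and aggregation may differ.

```r
# Hypothetical inputs: S = 2 draws (rows), K = 2 clusters (columns),
# assumed to be aligned with the representative training partition.
weights <- rbind(c(0.6, 0.4), c(0.5, 0.5))   # draw-specific component weights
means   <- rbind(c(0.0, 5.0), c(0.2, 4.8))   # draw-specific component means
sds     <- rbind(c(1.0, 1.0), c(1.0, 1.2))   # draw-specific component sds

# Assignment scores for one new response value:
# within each draw, weight * density normalized over clusters, then averaged over draws.
score_new <- function(y_new) {
  per_draw <- t(vapply(seq_len(nrow(weights)), function(s) {
    unnorm <- weights[s, ] * dnorm(y_new, means[s, ], sds[s, ])
    unnorm / sum(unnorm)
  }, numeric(ncol(weights))))
  colMeans(per_draw)
}

scores <- score_new(0.3)
label_new <- which.max(scores)   # predicted cluster for the new observation
```

In this sketch the scores sum to one across clusters, and the predicted label is the cluster with the largest score.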
Computing the PSM is \(O(n^2)\) in the training sample size, so psm_max_n guards against
accidental large matrix allocations.
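A quick back-of-the-envelope calculation shows why such a guard is useful. The functions below are hypothetical, and the limit of 5000 is an illustrative value only, not the package default.

```r
# Approximate size in bytes of a dense n x n matrix of doubles (8 bytes per entry).
psm_bytes <- function(n) 8 * as.numeric(n)^2

# Hypothetical guard in the spirit of psm_max_n; the default here is illustrative only.
check_psm_size <- function(n, psm_max_n = 5000) {
  if (n > psm_max_n) {
    stop("n = ", n, " exceeds psm_max_n = ", psm_max_n)
  }
  invisible(TRUE)
}

psm_bytes(5000) / 1024^2   # about 191 MiB
```

Doubling the training sample size quadruples the storage, so a dense matrix for a few tens of thousands of observations already runs to several gigabytes.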
See also
dpmix.cluster(), dpmgpd.cluster(), summary.dpmixgpd_cluster_fit(),
plot.dpmixgpd_cluster_fit(), summary.dpmixgpd_cluster_labels(),
summary.dpmixgpd_cluster_psm().
Other cluster workflow:
dpmgpd.cluster(),
dpmix.cluster(),
plot.dpmixgpd_cluster_bundle(),
plot.dpmixgpd_cluster_fit(),
plot.dpmixgpd_cluster_labels(),
plot.dpmixgpd_cluster_psm(),
print.dpmixgpd_cluster_bundle(),
print.dpmixgpd_cluster_fit(),
print.dpmixgpd_cluster_labels(),
print.dpmixgpd_cluster_psm(),
summary.dpmixgpd_cluster_bundle(),
summary.dpmixgpd_cluster_fit(),
summary.dpmixgpd_cluster_labels(),
summary.dpmixgpd_cluster_psm()