Plot a cluster bundle
Produce a compact graphical summary of the cluster bundle metadata.
Visualize either the posterior similarity matrix, the posterior number of occupied clusters, the size distribution of the representative clusters, or cluster-specific response summaries.
Visualize representative cluster sizes, assignment certainty, or cluster-specific response
summaries. For type = "summary", the response view is shown as boxplots ordered by
cluster size or label. When x comes from predict(..., newdata = ...), only clusters
represented in the new sample are displayed.
Heatmap of pairwise posterior co-clustering probabilities.
Usage
# S3 method for class 'dpmixgpd_cluster_bundle'
plot(x, plotly = getOption("CausalMixGPD.plotly", FALSE), ...)
# S3 method for class 'dpmixgpd_cluster_fit'
plot(
x,
which = c("psm", "k", "sizes", "summary"),
burnin = NULL,
thin = NULL,
psm_max_n = 2000L,
top_n = 5L,
order_by = c("size", "label"),
plotly = getOption("CausalMixGPD.plotly", FALSE),
...
)
# S3 method for class 'dpmixgpd_cluster_labels'
plot(
x,
type = c("sizes", "certainty", "summary"),
top_n = 5L,
order_by = c("size", "label"),
plotly = getOption("CausalMixGPD.plotly", FALSE),
...
)
# S3 method for class 'dpmixgpd_cluster_psm'
plot(
x,
psm_max_n = x$psm_max_n %||% 2000L,
order_by = c("label", "hclust", "input"),
plotly = getOption("CausalMixGPD.plotly", FALSE),
...
)
Arguments
- x
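A cluster object of class dpmixgpd_cluster_bundle, dpmixgpd_cluster_fit, dpmixgpd_cluster_labels, or dpmixgpd_cluster_psm, matching the method being called.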
- plotly
Logical; if TRUE, convert the ggplot2 output to a plotly/htmlwidget representation via .wrap_plotly(). Defaults to getOption("CausalMixGPD.plotly", FALSE).
- ...
Unused.
- which
Plot type:
  - "psm": posterior similarity matrix heatmap
  - "k": posterior number of occupied clusters
  - "sizes": bar chart of representative cluster sizes
  - "summary": cluster-specific response summaries
- burnin
Number of initial posterior draws to discard.
- thin
Keep every thin-th posterior draw.
- psm_max_n
Maximum number of observations n for which the n × n posterior similarity matrix is plotted. This guards against quadratic memory use.
- top_n
Number of populated representative clusters to display for type = "sizes" or type = "summary" (which = "sizes" or which = "summary" for the dpmixgpd_cluster_fit method). Use NULL to display all populated clusters.
- order_by
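Ordering rule. For the dpmixgpd_cluster_fit and dpmixgpd_cluster_labels methods, "size" or "label" orders the displayed clusters by cluster size or by label. For the dpmixgpd_cluster_psm method, it sets the order of the heatmap rows and columns:
  - "label": order by representative cluster labels when available
  - "hclust": order by hierarchical clustering of 1 - PSM
  - "input": preserve input order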
- type
Plot type:
  - "sizes": bar chart of representative cluster sizes
  - "certainty": assignment certainty distribution
  - "summary": cluster-specific response boxplots
Value
A ggplot2 object or a plotly/htmlwidget object when plotly = TRUE.
Details
The bundle plot is a metadata display rather than an inferential graphic. It
mirrors the structural fields reported by print() and summary() in a
single panel so the pre-MCMC clustering specification can be reviewed in a
figure-oriented workflow or notebook.
Because the object has not been sampled yet, no representative partition or
posterior uncertainty is shown here. Use plot.dpmixgpd_cluster_fit(),
plot.dpmixgpd_cluster_labels(), or plot.dpmixgpd_cluster_psm() after
fitting when you need substantive clustering output.
For dpmixgpd_cluster_fit objects, the plot method exposes the main posterior diagnostics for clustering. The
which = "k" view tracks the number of occupied clusters across retained
draws, which = "psm" visualizes pairwise co-clustering probabilities,
which = "sizes" displays the size profile of the representative partition,
and which = "summary" shows response summaries conditional on the selected
representative labels.
The representative partition is obtained from
predict.dpmixgpd_cluster_fit() using Dahl's least-squares rule. As a
result, the sizes and summary views describe that representative
clustering rather than the full posterior distribution over partitions.
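The least-squares rule can be sketched as follows. The sketch assumes the allocation draws are stored in an S × n matrix z. Each sampled partition becomes an n × n co-clustering matrix. The draw whose matrix lies closest to the PSM in squared distance is returned. The helper names are hypothetical, and details such as tie-breaking inside predict.dpmixgpd_cluster_fit() may differ.

```r
# Co-clustering matrix of one partition: entry (i, j) is 1 if i and j share a label.
coclust <- function(labels) 1 * outer(labels, labels, "==")

# Hypothetical helper implementing Dahl's least-squares rule on S x n draws `z`.
dahl_ls <- function(z) {
  S <- nrow(z)
  # Posterior similarity matrix: average co-clustering over the draws.
  psm <- Reduce(`+`, lapply(seq_len(S), function(s) coclust(z[s, ]))) / S
  # Squared distance between each sampled partition and the PSM.
  loss <- vapply(seq_len(S),
                 function(s) sum((coclust(z[s, ]) - psm)^2),
                 numeric(1))
  best <- which.min(loss)  # first minimizer if several draws tie
  list(draw = best, labels = z[best, ], loss = loss)
}

z <- rbind(c(1, 1, 2, 2),
           c(1, 1, 2, 2),
           c(1, 2, 2, 2))
res <- dahl_ls(z)
res$labels  # partition shared by draws 1 and 2: 1 1 2 2
```

The estimate is always one of the sampled partitions. It is never a new partition built from the PSM.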
This method visualizes the representative partition stored in a
dpmixgpd_cluster_labels object. The sizes view emphasizes the empirical
distribution of the selected clusters, the certainty view summarizes the
assignment scores \(\max_k p_{ik}\), and the summary view compares the
attached response data across representative clusters.
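The example below illustrates the certainty score. It assumes the assignment probabilities are held in an n × K matrix p whose rows sum to one. The values are made up for illustration and are not package output.

```r
# Illustrative assignment probabilities p[i, k] for n = 4 observations and
# K = 3 representative clusters (made-up values; each row sums to 1).
p <- matrix(c(0.90, 0.05, 0.05,
              0.40, 0.35, 0.25,
              0.10, 0.80, 0.10,
              0.34, 0.33, 0.33),
            nrow = 4, byrow = TRUE)

certainty <- apply(p, 1L, max)                 # max_k p_ik per observation
assigned <- max.col(p, ties.method = "first")  # cluster attaining the maximum

certainty  # 0.90 0.40 0.80 0.34
assigned   # 1 1 2 1
summary(certainty)
```

Because each row sums to one, the score is at least 1/K. Values close to 1/K, like the last row here, flag observations whose representative assignment is ambiguous.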
For new-data prediction, the plots are always interpreted relative to the representative training clusters. That is why only clusters observed in the predicted sample are shown even though the training partition may contain additional occupied groups.
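The display rule described above can be sketched with a hypothetical helper, clusters_to_show(), which is not part of the package. The helper takes four steps:

1. Tabulate the predicted labels against the representative training clusters.
2. Drop clusters that are absent from the new sample.
3. Order the remaining clusters by size or by label.
4. Truncate the result to top_n clusters.

```r
# Hypothetical helper (not part of the package) mirroring the documented rule:
# count predicted labels per representative training cluster, drop clusters
# absent from the new sample, order by size or label, and keep top_n.
clusters_to_show <- function(pred_labels, train_labels,
                             order_by = c("size", "label"), top_n = 5L) {
  order_by <- match.arg(order_by)
  tab <- table(factor(pred_labels, levels = sort(unique(train_labels))))
  counts <- setNames(as.integer(tab), names(tab))
  counts <- counts[counts > 0]            # only clusters seen in the new sample
  if (order_by == "size") {
    counts <- counts[order(-counts)]      # largest first; ties keep label order
  }
  if (!is.null(top_n)) counts <- head(counts, top_n)
  counts
}

train_labels <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)
pred_labels  <- c(3, 3, 1, 3, 1, 4)       # cluster 2 is absent from the new sample
clusters_to_show(pred_labels, train_labels, order_by = "size", top_n = 2L)
clusters_to_show(pred_labels, train_labels, order_by = "label", top_n = NULL)
```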
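For dpmixgpd_cluster_psm objects, the heatmap visualizes the matrix
$$ \mathrm{PSM}_{ij} \approx \frac{1}{S} \sum_{s=1}^S I(z_i^{(s)} = z_j^{(s)}), $$
where \(S\) is the number of retained posterior draws, \(z_i^{(s)}\) is the cluster label of observation \(i\) in draw \(s\), and \(I(\cdot)\) is the indicator function. Larger values indicate pairs of observations that are stably allocated to the same cluster across the retained draws.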
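The sketch below computes a direct base-R estimate of this matrix, assuming the retained allocation draws form an S × n matrix z. It then derives one possible ordering in the spirit of order_by = "hclust" from 1 - PSM. Average linkage is an assumption of this sketch; the linkage the method actually uses is not documented here.

```r
# Estimate PSM[i, j] as the fraction of retained draws in which observations
# i and j share a label. `z` is assumed to be an S x n allocation matrix.
psm_estimate <- function(z) {
  n <- ncol(z)
  psm <- matrix(0, n, n)
  for (s in seq_len(nrow(z))) {
    psm <- psm + outer(z[s, ], z[s, ], "==")  # adds 1 where labels agree
  }
  psm / nrow(z)
}

z <- rbind(c(1, 1, 2, 2, 2),
           c(1, 1, 1, 2, 2),
           c(2, 2, 1, 1, 1),
           c(1, 1, 2, 2, 3))
psm <- psm_estimate(z)
round(psm, 2)

# One possible "hclust"-style ordering: cluster the dissimilarity 1 - PSM.
# Average linkage is an assumption made for this sketch.
ord <- hclust(as.dist(1 - psm), method = "average")$order
ord
# image(psm[ord, ord]) draws a basic heatmap of the reordered matrix
```

Observations 1 and 2 share a label in every draw, so their PSM entry is 1 and they appear next to each other in the ordering.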
Because the PSM is an \(n \times n\) object, plotting and even storing it
becomes expensive for large n. The psm_max_n argument is therefore a
deliberate guard against accidental quadratic memory use.
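As a rough guide to the quadratic growth, a dense double-precision n × n matrix needs 8n² bytes, ignoring R's small per-object overhead. The helper below is illustrative only.

```r
# Rough memory footprint, in megabytes (1e6 bytes), of a dense n x n matrix
# of doubles at 8 bytes per entry; R's fixed per-object overhead is ignored.
psm_megabytes <- function(n) n^2 * 8 / 1e6

psm_megabytes(c(500, 2000, 20000))          # 2, 32 and 3200 MB
psm_megabytes(20000) / psm_megabytes(2000)  # 10x the observations, 100x the memory
```

At the default psm_max_n = 2000L, the PSM alone occupies about 32 MB. At n = 20000 it would need about 3.2 GB.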
See also
summary.dpmixgpd_cluster_bundle(), dpmix.cluster(), dpmgpd.cluster(),
predict.dpmixgpd_cluster_fit(), summary.dpmixgpd_cluster_fit(),
summary.dpmixgpd_cluster_labels(), summary.dpmixgpd_cluster_psm(),
plot.dpmixgpd_cluster_fit(), plot.dpmixgpd_cluster_labels(), plot.dpmixgpd_cluster_psm().
Other cluster workflow:
cluster_profiles(),
dpmgpd.cluster(),
dpmix.cluster(),
predict.dpmixgpd_cluster_fit(),
print.dpmixgpd_cluster_bundle(),
print.dpmixgpd_cluster_fit(),
print.dpmixgpd_cluster_labels(),
print.dpmixgpd_cluster_psm(),
summary.dpmixgpd_cluster_bundle(),
summary.dpmixgpd_cluster_fit(),
summary.dpmixgpd_cluster_labels(),
summary.dpmixgpd_cluster_psm()