Plot a cluster bundle

Usage

# S3 method for class 'dpmixgpd_cluster_bundle'
plot(x, plotly = getOption("CausalMixGPD.plotly", FALSE), ...)

# S3 method for class 'dpmixgpd_cluster_fit'
plot(
  x,
  which = c("psm", "k", "sizes", "summary"),
  burnin = NULL,
  thin = NULL,
  psm_max_n = 2000L,
  top_n = 5L,
  order_by = c("size", "label"),
  plotly = getOption("CausalMixGPD.plotly", FALSE),
  ...
)

# S3 method for class 'dpmixgpd_cluster_labels'
plot(
  x,
  type = c("sizes", "certainty", "summary"),
  top_n = 5L,
  order_by = c("size", "label"),
  plotly = getOption("CausalMixGPD.plotly", FALSE),
  ...
)

# S3 method for class 'dpmixgpd_cluster_psm'
plot(
  x,
  psm_max_n = x$psm_max_n %||% 2000L,
  order_by = c("label", "hclust", "input"),
  plotly = getOption("CausalMixGPD.plotly", FALSE),
  ...
)

Arguments

x

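Object to plot. Depending on the method, a dpmixgpd_cluster_bundle, dpmixgpd_cluster_fit, dpmixgpd_cluster_labels, or dpmixgpd_cluster_psm object.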

plotly

Logical; if TRUE, convert the ggplot2 output to a plotly / htmlwidget representation via .wrap_plotly(). Defaults to getOption("CausalMixGPD.plotly", FALSE).

...

Unused.

which

Plot type:

"psm": posterior similarity matrix heatmap
"k": posterior number of occupied clusters
"sizes": bar chart of representative cluster sizes
"summary": cluster-specific response summaries

burnin

Number of initial posterior draws to discard.

thin

Keep every thin-th posterior draw.

psm_max_n

Maximum PSM dimension (number of observations n) allowed for plotting. This guards against the quadratic cost of the n x n matrix.

top_n

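Number of populated representative clusters to display in the "sizes" and "summary" views (selected with which in the dpmixgpd_cluster_fit method and with type in the dpmixgpd_cluster_labels method). Use NULL to display all populated clusters.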

order_by

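Ordering rule. The dpmixgpd_cluster_fit and dpmixgpd_cluster_labels methods accept "size" or "label" (see Usage). For the dpmixgpd_cluster_psm method, rows and columns are ordered by one of: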

"label": order by representative cluster labels when available
"hclust": order by hierarchical clustering of 1 - PSM
"input": preserve input order

type

Plot type:

"sizes": bar chart of representative cluster sizes
"certainty": assignment certainty distribution
"summary": cluster-specific response boxplots

Value

A ggplot2 object or a plotly/htmlwidget object when plotly = TRUE.
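Because every method reads its plotly default from getOption("CausalMixGPD.plotly", FALSE), the return type can be switched for a whole session instead of per call. A minimal base R sketch of setting and restoring that option (the variable names are illustrative only):

```r
# Turn on plotly output as the session-wide default; keep the old value
old <- options(CausalMixGPD.plotly = TRUE)

# This is the expression the plot methods use for their `plotly` default
use_plotly <- getOption("CausalMixGPD.plotly", FALSE)

# Restore the previous setting when done
options(old)
```

Passing plotly = TRUE or plotly = FALSE in an individual call still overrides the option, since the option only supplies the argument's default.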

Details

The bundle plot is a metadata display rather than an inferential graphic. It mirrors the structural fields reported by print() and summary() in a single panel so the pre-MCMC clustering specification can be reviewed in a figure-oriented workflow or notebook.

Because the object has not been sampled yet, no representative partition or posterior uncertainty is shown here. Use plot.dpmixgpd_cluster_fit(), plot.dpmixgpd_cluster_labels(), or plot.dpmixgpd_cluster_psm() after fitting when you need substantive clustering output.

The dpmixgpd_cluster_fit method exposes the main posterior diagnostics for clustering. The which = "k" view tracks the number of occupied clusters across retained draws, which = "psm" visualizes pairwise co-clustering probabilities, which = "sizes" displays the size profile of the representative partition, and which = "summary" shows response summaries conditional on the selected representative labels.
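The quantity behind the which = "k" view can be sketched in base R. The example assumes a hypothetical S x n matrix z of sampled cluster labels (one row per posterior draw) and assumes that thinning starts at the first draw after burn-in; the package's internal storage and indexing may differ.

```r
# Hypothetical label draws: 200 posterior draws for 10 observations
set.seed(1)
z <- matrix(sample(1:4, 200 * 10, replace = TRUE), nrow = 200, ncol = 10)

burnin <- 50L  # discard the first 50 draws
thin <- 5L     # then keep every 5th draw

# Indices of retained draws: 51, 56, ..., 196
keep <- seq(from = burnin + 1L, to = nrow(z), by = thin)
z_kept <- z[keep, , drop = FALSE]

# Number of occupied (non-empty) clusters in each retained draw
k_trace <- apply(z_kept, 1L, function(labels) length(unique(labels)))
```

Plotting k_trace against the draw index, or tabulating it, gives the kind of summary the "k" view is described as showing.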

The representative partition is obtained from predict.dpmixgpd_cluster_fit() using Dahl's least-squares rule. As a result, the sizes and summary views describe that representative clustering rather than the full posterior distribution over partitions.
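As an illustration of the least-squares idea, the base R sketch below selects, among sampled partitions, the draw whose co-clustering matrix is closest in squared error to the posterior similarity matrix. The label matrix z and the helper coclust() are hypothetical, and this is not the package's implementation.

```r
# Hypothetical label draws: 4 posterior draws for 4 observations
z <- rbind(
  c(1, 1, 2, 2),
  c(1, 1, 2, 2),
  c(1, 2, 2, 2),
  c(1, 1, 1, 2)
)

# 0/1 co-clustering matrix for a single draw
coclust <- function(labels) outer(labels, labels, "==") * 1

# Posterior similarity matrix: average of the per-draw matrices
draws <- seq_len(nrow(z))
psm <- Reduce(`+`, lapply(draws, function(s) coclust(z[s, ]))) / nrow(z)

# Squared-error loss of each draw against the PSM
loss <- vapply(draws, function(s) sum((coclust(z[s, ]) - psm)^2), numeric(1))

# Least-squares choice: the draw with the smallest loss
best <- which.min(loss)
representative <- z[best, ]
```

Here the first draw is selected, so the representative partition groups observations 1 and 2 together and 3 and 4 together. Any sizes or summaries computed from representative describe that single partition, not the spread of the posterior.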

The dpmixgpd_cluster_labels method visualizes the representative partition stored in the object. The sizes view emphasizes the empirical distribution of the selected clusters, the certainty view summarizes the assignment scores $\max_k p_{ik}$, where $p_{ik}$ is the assignment score of observation $i$ for cluster $k$, and the summary view compares the attached response data across representative clusters.
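The certainty score can be illustrated with a small base R sketch. It assumes a hypothetical n x K matrix p of assignment scores with rows summing to one; how the package computes or stores these scores is not shown here.

```r
# Hypothetical assignment scores p[i, k] for 3 observations and 3 clusters
p <- rbind(
  c(0.90, 0.05, 0.05),
  c(0.40, 0.35, 0.25),
  c(0.10, 0.80, 0.10)
)

# Certainty of each observation: the largest score in its row
certainty <- apply(p, 1L, max)

# Cluster attaining that maximum (first one in case of ties)
label <- max.col(p, ties.method = "first")
```

Observation 2 has a certainty of only 0.40, so its assignment is much less clear-cut than that of observation 1 at 0.90. The distribution of such values is what the certainty view is described as summarizing.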

For new-data prediction, the plots are always interpreted relative to the representative training clusters. That is why only clusters observed in the predicted sample are shown even though the training partition may contain additional occupied groups.

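The dpmixgpd_cluster_psm heatmap visualizes the matrix $$ \mathrm{PSM}_{ij} \approx \frac{1}{S} \sum_{s=1}^S I(z_i^{(s)} = z_j^{(s)}), $$ where $S$ is the number of retained posterior draws and $z_i^{(s)}$ is the cluster label of observation $i$ in draw $s$. Larger values therefore indicate pairs of observations that are stably allocated to the same cluster over the retained draws.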
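The estimator above, together with an "hclust"-style ordering on 1 - PSM, can be sketched in base R. The label matrix z is hypothetical and the average-linkage choice is an assumption; the package may use a different linkage.

```r
# Hypothetical label draws: 4 posterior draws for 5 observations.
# Draw 3 uses swapped label names; this does not affect co-clustering.
z <- rbind(
  c(1, 1, 2, 2, 1),
  c(1, 1, 2, 2, 2),
  c(2, 2, 1, 1, 1),
  c(1, 1, 1, 2, 2)
)
S <- nrow(z)
n <- ncol(z)

# Accumulate the indicator I(z_i == z_j) over draws, then average
psm <- matrix(0, n, n)
for (s in seq_len(S)) {
  psm <- psm + outer(z[s, ], z[s, ], "==")
}
psm <- psm / S

# Order rows and columns by hierarchical clustering of 1 - PSM
# (average linkage is an assumption made for this sketch)
ord <- hclust(as.dist(1 - psm), method = "average")$order
psm_ordered <- psm[ord, ord]
```

Observations 1 and 2 share a cluster in every draw, so psm[1, 2] equals 1, while observations 3 and 4 co-cluster in three of the four draws, giving 0.75. Because only label equality within a draw matters, the PSM is unaffected by label switching between draws.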

Because the PSM is an $n \times n$ object, plotting and even storing it becomes expensive for large n. The psm_max_n argument is therefore a deliberate guard against accidental quadratic memory use.
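A quick calculation shows the scale involved, assuming the PSM is held as a dense matrix of double-precision values at 8 bytes each. The helper psm_bytes() is hypothetical and only does the arithmetic.

```r
# Approximate storage for a dense n x n matrix of doubles (8 bytes each)
psm_bytes <- function(n) 8 * as.numeric(n)^2

mib_2000 <- psm_bytes(2000) / 2^20   # about 30.5 MiB at the default limit
gib_50000 <- psm_bytes(50000) / 2^30 # about 18.6 GiB
```

At the default psm_max_n of 2000 the matrix is modest in size, but a 25-fold increase in n multiplies the storage by 625.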
