What is it about?

This paper investigates what happens when consensus clustering and meta-clustering are applied to the set of all possible partitions of a data set. We show that when a "complement" of the Rand Index is used as the measure of distance between partitions, the total-separation partition, which puts each element in a separate set, is chosen as the best compromise.
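
To make the claim concrete, here is a brute-force sketch for a tiny data set. It assumes that the "complement" of the Rand Index counts the element pairs on which two partitions disagree (placed together in one, apart in the other). That count is the number of pairs times one minus the Rand Index, so a constant normalisation would not change the winner, though the paper's exact definition may differ. The function names are illustrative only. The sketch enumerates every partition and picks the median partition, the one with the smallest summed distance to all the others.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a new block of its own.
        yield [[first]] + smaller


def block_of(partition):
    """Map each element to the index of the block that contains it."""
    return {x: b for b, block in enumerate(partition) for x in block}


def pair_disagreement(p, q, items):
    """Count pairs grouped together in one partition but apart in the other.

    Assumed stand-in for Cluster Difference: C(n, 2) * (1 - Rand index).
    """
    bp, bq = block_of(p), block_of(q)
    return sum((bp[a] == bp[b]) != (bq[a] == bq[b])
               for a, b in combinations(items, 2))


def median_partition(items):
    """Partition with the smallest total distance to every partition of `items`."""
    universe = list(set_partitions(items))
    return min(universe,
               key=lambda c: sum(pair_disagreement(c, p, items) for p in universe))


best = median_partition([1, 2, 3, 4])
print(sorted(sorted(block) for block in best))  # [[1], [2], [3], [4]]
```

Exhaustive enumeration is only feasible for very small sets, since the number of partitions grows as the Bell numbers (15 for four elements, 52 for five).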

Why is it important?

As the number of clustering algorithms applicable to the same data grows, and their outputs may differ substantially, methods for reconciling those outputs, such as meta-clustering and consensus clustering, are being developed. In this paper we demonstrate that both consensus clustering and meta-clustering, when they use Cluster Difference (derived from the Rand Index) as the distance between partitions and are applied to the universe of all possible partitions, point to the partition that places each element in a separate set as the best compromise.

It is quite easy to invent clustering algorithms that deliver any clustering we want for the same data, so the clusterings to be reconciled may, in principle, range over the entire space of partitions. In that space, however, both meta-clustering and consensus clustering lose their way: meta-clustering yields a structure of partitions that has nothing to do with the data, and consensus clustering delivers the most trivial consensus, which also has nothing to do with the data. This suggests that the user performing the clustering must have at least an approximate picture of the geometry of the data space. Only then can these techniques help in choosing an appropriate compromise clustering.
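
A short counting argument shows why the total-separation partition wins, under the assumption that the distance counts the element pairs on which two partitions disagree. The notation (the set of partitions, the Bell numbers, the "same block" relation) is introduced here for illustration, and the argument is not necessarily the proof given in the paper.

```latex
% Assumptions: \Pi_n = all partitions of an n-element set, |\Pi_n| = B_n (Bell number);
% i \sim_C j means elements i and j lie in the same block of partition C.
\[
d(C,P) = \sum_{\{i,j\}} \mathbf{1}\bigl[(i \sim_C j) \neq (i \sim_P j)\bigr]
\]
% Partitions in which i and j share a block correspond one-to-one to partitions
% of n-1 elements (merge i and j), so B_{n-1} of them do and B_n - B_{n-1} do not.
\[
\sum_{P \in \Pi_n} d(C,P)
  = \sum_{\{i,j\}}
    \begin{cases}
      B_n - B_{n-1} & \text{if } i \sim_C j,\\
      B_{n-1}       & \text{otherwise.}
    \end{cases}
\]
% Comparing B_n - B_{n-1} = \sum_{k=0}^{n-2} \binom{n-1}{k} B_k with
% B_{n-1} = \sum_{k=0}^{n-2} \binom{n-2}{k} B_k term by term gives B_n > 2 B_{n-1}
% for n >= 3, so every pair is strictly cheaper when separated.
\[
\min_{C \in \Pi_n} \sum_{P \in \Pi_n} d(C,P) = \binom{n}{2} B_{n-1},
\quad \text{attained only by } C = \bigl\{\{1\},\{2\},\dots,\{n\}\bigr\}.
\]
```

Because the total splits into independent per-pair terms, the partition that separates every pair at once is optimal; for n = 2 the two partitions tie.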

Perspectives

It also seems worth investigating how other cluster quality functions, used as distances between partitions, would behave under consensus clustering of the space of all possible partitions, and how such measures would behave not on the full universe of partitions but on uniform random samples of it. Such sampling could then serve as a baseline for studying the behaviour of other partition-comparison indexes and of consensus and meta-clustering methods, and for checking whether a resulting consensus partition or meta-cluster really gives new insight or is just a random artefact.
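
Drawing uniform random samples from the space of partitions is straightforward for moderate data-set sizes. The sketch below is one standard exact method, not taken from the paper: it chooses the size of the block containing the first element with probability proportional to the number of partitions of that shape (counted with Bell numbers), picks that block's companions uniformly, and repeats on the remaining elements. The function names are illustrative only, and the items are assumed to be distinct.

```python
import random
from collections import Counter
from math import comb


def bell_numbers(n):
    """Return [B_0, ..., B_n] using B_{m+1} = sum_k C(m, k) * B_k."""
    bell = [1]
    for m in range(n):
        bell.append(sum(comb(m, k) * bell[k] for k in range(m + 1)))
    return bell


def random_partition(items, rng=random):
    """Draw one partition of `items` uniformly among all B_n partitions.

    Assumes the items are distinct.
    """
    items = list(items)
    bell = bell_numbers(len(items))
    blocks = []
    while items:
        first, rest = items[0], items[1:]
        m = len(rest)
        # The block holding `first` gets k companions from `rest`;
        # C(m, k) * B_{m-k} partitions have that shape, out of B_{m+1} in total.
        weights = [comb(m, k) * bell[m - k] for k in range(m + 1)]
        k = rng.choices(range(m + 1), weights=weights)[0]
        companions = rng.sample(rest, k)  # uniform k-subset of `rest`
        blocks.append([first] + companions)
        items = [x for x in rest if x not in companions]
    return blocks


rng = random.Random(0)
counts = Counter(
    tuple(sorted(tuple(sorted(b)) for b in random_partition([1, 2, 3, 4], rng)))
    for _ in range(30000)
)
print(len(counts))  # 15: every partition of a 4-element set shows up
```

Each specific partition is drawn with probability exactly 1/B_n, so sample statistics of a partition-comparison index can be set against the full universe without enumerating it.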

Mieczysław Kłopotek
Polish Academy of Sciences, Institute of Computer Science, IPI PAN, Warsaw, Poland

Read the Original

This page is a summary of: On Seeking Consensus Between Document Similarity Measures, Fundamenta Informaticae, October 2017, IOS Press, DOI: 10.3233/fi-2017-1597.