What is it about?

Categorical data, such as survey checkboxes, are widely used to understand people's preferences and behaviors. But can purely data-driven methods find meaningful groups in such data and pinpoint the key questions that set those groups apart? In this work, we introduce a new approach that helps researchers and managers identify distinct groups and discover which questions matter most, giving them a deeper understanding of their data.

Why is it important?

Existing methods for grouping categorical data often treat every survey question as equally important, which can lead to less accurate or confusing results. Our approach is unique because it not only finds meaningful groups but also identifies which questions are important and which can be ignored, by measuring how statistically significant each question is in separating the groups. As organizations collect ever larger amounts of survey and other categorical data, our method provides a clearer and more practical way to gain insights, make targeted decisions, and focus on the issues that truly matter.
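
To make the idea of "measuring statistical significance" concrete, here is a minimal sketch of how one might check which questions separate groups that are already known. It is not the algorithm from the paper. It assumes a Pearson chi-square statistic, a permutation test for the p-value, and a Bonferroni correction for testing several questions at once. The function names, the toy survey, and the group labels are all made up for illustration.

```python
import random
from collections import Counter


def chi_square_stat(answers, groups):
    """Pearson chi-square statistic for the answer-by-group contingency table."""
    n = len(answers)
    answer_counts = Counter(answers)
    group_counts = Counter(groups)
    cell_counts = Counter(zip(answers, groups))
    stat = 0.0
    for answer, n_answer in answer_counts.items():
        for group, n_group in group_counts.items():
            expected = n_answer * n_group / n
            observed = cell_counts.get((answer, group), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(answers, groups, n_perm=2000, seed=0):
    """Estimate how often shuffled group labels separate the answers as well."""
    rng = random.Random(seed)
    observed = chi_square_stat(answers, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square_stat(answers, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def significant_questions(survey, groups, alpha=0.05):
    """Return the questions that pass a Bonferroni-corrected threshold."""
    threshold = alpha / len(survey)
    p_values = {q: permutation_p_value(a, groups) for q, a in survey.items()}
    selected = {q: p for q, p in p_values.items() if p <= threshold}
    return selected, p_values


# Hypothetical toy survey: 40 respondents in two groups, two yes/no questions.
groups = ["A"] * 20 + ["B"] * 20
survey = {
    "owns_car": ["yes"] * 20 + ["no"] * 20,  # answers follow the groups exactly
    "likes_tea": ["yes", "no"] * 20,         # answers are split evenly in both groups
}

selected, p_values = significant_questions(survey, groups)
print(sorted(selected))
```

In this toy example only `owns_car` is flagged, because its answers line up with the groups while `likes_tea` is answered the same way in both. The sketch takes the groups as given; the method summarized here also has to find the groups in the first place.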

Read the Original

This page is a summary of: Clustering Categorical Data via Multiple Hypothesis Testing, ACM Transactions on Knowledge Discovery from Data, May 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3735977.