What is it about?

The choice of initial centroids strongly affects how effective K-means clustering is. We select as the first centroid the document whose term frequencies have the minimum standard deviation. Each subsequent centroid is selected based on its dissimilarity to the previously selected centroids. This approach avoids the random initial selection normally used in K-means document clustering. We used the Reuters-21578 and WebKB data sets to confirm the effectiveness of the clustering with respect to purity, entropy and F-measure.
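
The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the summary does not say which dissimilarity measure is used or how dissimilarities to several centroids are combined, so the sketch assumes cosine dissimilarity and picks, at each step, the document whose nearest already-chosen centroid is farthest away. The function names `cosine_dissimilarity` and `select_initial_centroids` are hypothetical.

```python
import math
from statistics import pstdev


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two term-frequency vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def select_initial_centroids(tf_rows, k):
    """Return the indices of k documents to use as initial centroids.

    tf_rows: list of term-frequency vectors, one per document.
    """
    if not 1 <= k <= len(tf_rows):
        raise ValueError("k must be between 1 and the number of documents")

    # First centroid: the document whose term frequencies have the
    # minimum standard deviation (as stated in the summary).
    first = min(range(len(tf_rows)), key=lambda i: pstdev(tf_rows[i]))
    chosen = [first]

    # Remaining centroids: assumed farthest-first rule. Pick the document
    # whose smallest dissimilarity to any chosen centroid is largest.
    while len(chosen) < k:
        candidates = [i for i in range(len(tf_rows)) if i not in chosen]
        nxt = max(
            candidates,
            key=lambda i: min(
                cosine_dissimilarity(tf_rows[i], tf_rows[c]) for c in chosen
            ),
        )
        chosen.append(nxt)
    return chosen
```

The returned indices identify the documents whose vectors would seed K-means in place of random starting points. Because nothing in the procedure is random, repeated runs on the same data give the same seeds.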

Why is it important?

Our findings show that this approach groups documents more effectively than previously suggested methods for selecting initial centroids.

Perspectives

This article helps overcome the problems of random initial seed selection in K-means document clustering and ultimately increases the effectiveness of document grouping.

Lakshmi R
K.L.N College of Engineering

Read the Original

This page is a summary of: DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering, Journal of Information Science, December 2018, SAGE Publications, DOI: 10.1177/0165551518816302.
