DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering

R Lakshmi; S Baskar

doi:10.1177/0165551518816302

What is it about?

Selecting the initial centroids is an important step in K-means clustering because it affects the effectiveness of the resulting clusters. We select the document whose term frequencies have the minimum standard deviation as the first centroid. Each subsequent centroid is selected based on its dissimilarity to the previously selected centroids. This approach avoids the random initial selection normally used in K-means document clustering. We used the Reuters-21578 and WebKB data sets to confirm the effectiveness of the clustering with respect to purity, entropy and F-measure.
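The selection idea described above can be illustrated with a minimal sketch. Only the first-centroid rule (minimum standard deviation of term frequency) comes from this summary. The summary does not give the exact rule for the later centroids, so the sketch assumes a farthest-first rule using cosine dissimilarity, which may differ from the published method. The function names `cosine_dissimilarity` and `select_initial_centroids` and the toy term-frequency matrix are hypothetical.

```python
import math
import statistics


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two equal-length term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero document as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def select_initial_centroids(tf_matrix, k):
    """Return the row indices of k documents to use as K-means seeds."""
    n_docs = len(tf_matrix)
    if not 1 <= k <= n_docs:
        raise ValueError("k must be between 1 and the number of documents")

    # Stated in the summary: the first centroid is the document whose
    # term frequencies have the minimum standard deviation.
    first = min(range(n_docs), key=lambda i: statistics.pstdev(tf_matrix[i]))
    chosen = [first]

    # ASSUMPTION (not from the summary): each later centroid is the document
    # whose smallest cosine dissimilarity to the already chosen centroids is
    # largest, i.e. a farthest-first rule.
    while len(chosen) < k:
        candidates = [i for i in range(n_docs) if i not in chosen]
        next_doc = max(
            candidates,
            key=lambda i: min(
                cosine_dissimilarity(tf_matrix[i], tf_matrix[c]) for c in chosen
            ),
        )
        chosen.append(next_doc)
    return chosen


# Toy term-frequency matrix: one row per document, one column per term.
docs = [
    [2, 2, 2, 2],
    [5, 0, 0, 0],
    [0, 0, 4, 1],
    [3, 3, 2, 2],
]
print(select_initial_centroids(docs, 3))
```

Because every step is deterministic, repeated runs on the same data yield the same seeds, which is the property that distinguishes this kind of initialisation from random seeding. The selected rows would then be passed to a K-means implementation as its starting centroids.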

Why is it important?

Our findings show that this approach clusters (groups) documents more effectively than previously suggested methods for selecting initial centroids.

Perspectives

This article helps to overcome the problems of random initial seed selection in K-means document clustering and ultimately increases the effectiveness of document grouping.
Lakshmi R
K.L.N College of Engineering

This page is a summary of: DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering, Journal of Information Science, December 2018, SAGE Publications,
DOI: 10.1177/0165551518816302.