What is it about?

This paper presents a novel approach to imputing missing values, termed LIKFCM, which combines Kernelized Fuzzy C-Means (KFCM) clustering with Linear Interpolation (LI). The proposed method is compared with eight existing imputation methods on ten publicly accessible real-world datasets, using two evaluation criteria: RMSE and MAE. Each of the ten datasets is tested at six missing ratios (5%, 10%, 15%, 20%, 25%, and 30%), giving sixty missing-data combinations on which to examine the imputation methods’ performance.
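
The evaluation protocol described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it applies only plain column-wise linear interpolation (the KFCM clustering step is omitted), it assumes values are removed completely at random, and the helper names (`mask_values`, `linear_interpolate_columns`, `rmse`, `mae`) and the synthetic data are hypothetical.

```python
import numpy as np

def mask_values(X, ratio, rng):
    """Hide a fixed fraction of entries at random (an assumed missingness model)."""
    n_missing = int(round(ratio * X.size))
    flat_idx = rng.choice(X.size, size=n_missing, replace=False)
    mask = np.zeros(X.size, dtype=bool)
    mask[flat_idx] = True
    mask = mask.reshape(X.shape)
    X_missing = X.astype(float).copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def linear_interpolate_columns(X_missing):
    """Fill NaNs in each column by linear interpolation over the row index.

    Stand-in for the LI step only; gaps at the ends take the nearest known value.
    """
    X_filled = X_missing.copy()
    rows = np.arange(X_filled.shape[0])
    for j in range(X_filled.shape[1]):
        col = X_filled[:, j]  # view, so the assignment below updates X_filled
        known = ~np.isnan(col)
        if known.any():
            col[~known] = np.interp(rows[~known], rows[known], col[known])
    return X_filled

def rmse(X_true, X_imputed, mask):
    """Root mean squared error over the artificially removed entries."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(X_true, X_imputed, mask):
    """Mean absolute error over the artificially removed entries."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.mean(np.abs(diff)))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic placeholder data, not one of the paper's ten datasets.
    X = np.cumsum(rng.normal(size=(200, 4)), axis=0)
    for ratio in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
        X_missing, mask = mask_values(X, ratio, rng)
        X_filled = linear_interpolate_columns(X_missing)
        print(f"{ratio:.0%}  RMSE={rmse(X, X_filled, mask):.4f}  "
              f"MAE={mae(X, X_filled, mask):.4f}")
```

Both errors are computed only on the entries that were artificially removed, since those are the only positions where the true value is known and the imputed value can differ from it.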

Why is it important?

Handling missing values is a persistent challenge in data mining. Incomplete data can significantly compromise overall data quality, so it is crucial to handle it efficiently. This paper proposes the LIKFCM method, which hybridizes LI and KFCM to impute missing values in incomplete data. The experimental results show that the proposed method outperforms the existing imputation methods, with significant improvements in RMSE and MAE on these datasets across the different missing ratios. The proposed approach was also validated against the other state-of-the-art imputation methods using Kendall’s W statistical test, which compares the methods' mean ranks across the different missing ratios. The outcomes indicate that LIKFCM outperforms the other imputation methods, attaining the highest rank under both evaluation criteria.
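
As an illustration of the rank-based validation mentioned above, Kendall’s W can be computed from a table of error scores along the following lines. This is a sketch under stated assumptions rather than the paper's procedure: each row is taken to be one dataset and missing-ratio combination, each column one imputation method, lower error earns a better (smaller) rank, and no tie correction is applied. The function name `kendalls_w` is hypothetical.

```python
import numpy as np

def kendalls_w(errors):
    """Kendall's coefficient of concordance W, without tie correction.

    errors: array of shape (m, n), with m rankings (rows) of n methods
    (columns). Within each row the smallest error gets rank 1. Assumes there
    are no ties within a row.
    Returns (W, mean_ranks), where W is 1 for complete agreement between the
    rows and 0 when the rank sums of all methods are equal.
    """
    errors = np.asarray(errors, dtype=float)
    m, n = errors.shape
    # Double argsort converts the values in each row to ranks 1..n.
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1
    rank_sums = ranks.sum(axis=0)
    # Sum of squared deviations of the rank sums from their mean.
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))
    return float(w), rank_sums / m
```

A method's mean rank is its rank sum divided by the number of rows, so the method with the smallest mean rank is the one ranked best most consistently, and W indicates how strongly the individual rankings agree with one another.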

Perspectives

Our research is not limited to a specific field. Accordingly, this study uses ten widely used real-world datasets from the UCI repository, drawn from a diverse array of fields such as mine detection, medical domains, chemical analysis, and plant domains. The analysis indicates that the proposed LIKFCM is the best of the compared imputation methods: it attains the highest rank in terms of RMSE and MAE across all the datasets used in the study and across the various missing ratios. Future research could investigate its suitability for a wider range of datasets and applications to assess how well it generalizes.

Jyoti Singh

Read the Original

This page is a summary of: LIKFCM: Linear interpolation-based kernelized fuzzy C-means clustering imputation method for handling incomplete data, Journal of Intelligent & Fuzzy Systems, January 2024, IOS Press,
DOI: 10.3233/jifs-236869.