What is it about?
Much research based on MapReduce has aimed to reduce cloud storage and improve analytical efficiency. This paper takes a completely different approach that focuses on the characteristics of the data, such as whether it follows a normal or a Poisson distribution. The data reduction is based on unsupervised sampling from the original data, which can greatly reduce the data size while minimizing information loss.
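The idea of shrinking a data set by sampling while keeping its distribution can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses plain simple random sampling, and the function name `sample_reduce`, the 5% fraction, and the synthetic readings (mean 120, standard deviation 15) are assumptions made up for the illustration.

```python
import random
import statistics


def sample_reduce(values, fraction, seed=0):
    """Return a simple random sample holding `fraction` of `values`.

    Hypothetical helper for illustration; the paper's own sampling
    procedure is not described in this summary.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(values) * fraction))
    return rng.sample(values, k)


# Synthetic, normally distributed stand-in for a one-dimensional data set.
rng = random.Random(42)
original = [rng.gauss(120.0, 15.0) for _ in range(100_000)]

# Keep 5% of the records.
reduced = sample_reduce(original, 0.05)

# Compare the summary statistics that define a normal distribution.
print("size :", len(original), "->", len(reduced))
print("mean :", statistics.mean(original), "vs", statistics.mean(reduced))
print("stdev:", statistics.stdev(original), "vs", statistics.stdev(reduced))
```

If the data really is close to normal, the mean and standard deviation of the sample should stay close to those of the full set, which is why a much smaller sample can stand in for it in many analyses.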
Why is it important?
In the digital age, the explosion in data volume demands an efficient way to reduce data size, particularly for cloud-based systems. This paper proposes a method based on data characteristics instead of a MapReduce solution. The experimental results show that, for one-dimensional data, the reduced data set is more than 95% similar to the original data set.
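This summary does not say how similarity was measured in the paper. One possible way to put a number on how closely a reduced set resembles the original is histogram overlap, sketched below. The measure, the function name `histogram_overlap`, the bin count, and the synthetic data are all assumptions for illustration only.

```python
import random


def histogram_overlap(a, b, bins=50):
    """Return the overlap (0.0 to 1.0) of the normalized histograms of a and b.

    Hypothetical similarity measure, not taken from the paper: both data
    sets are binned over a shared range and the smaller fraction in each
    bin is summed, so identical distributions score 1.0.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def fractions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)  # clamp the maximum value
            counts[i] += 1
        return [c / len(xs) for c in counts]

    return sum(min(p, q) for p, q in zip(fractions(a), fractions(b)))


# Synthetic one-dimensional data and a 5% random sample of it.
rng = random.Random(7)
full_set = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
sample = rng.sample(full_set, 5_000)

overlap = histogram_overlap(full_set, sample)
print(f"histogram overlap: {overlap:.3f}")
```

The score depends on the number of bins and the sample size, so it is only comparable across runs that use the same settings.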
Read the Original
This page is a summary of: Splitting Large Medical Data Sets Based on Normal Distribution in Cloud Environment, IEEE Transactions on Cloud Computing, April 2020, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/tcc.2015.2462361.