What is it about?

Much research has been carried out based on MapReduce in order to reduce cloud storage costs and improve analytical efficiency. This paper adopts a fundamentally different approach, focusing instead on statistical characteristics of the data, such as the normal and Poisson distributions. In this paper, data reduction is achieved through unsupervised sampling from the original data, which can greatly reduce the data size while minimizing information loss.
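The core idea can be sketched as follows, assuming one-dimensional, approximately normal data. This is an illustrative sketch, not the paper's exact algorithm: fit a normal distribution to the original data and keep only a much smaller sample drawn from that fitted distribution. The function name and parameters are hypothetical.

```python
import numpy as np

def reduce_by_sampling(data, ratio=0.1, seed=None):
    """Illustrative distribution-based reduction: fit N(mu, sigma^2)
    to 1-D data, then draw a much smaller sample from the fit."""
    rng = np.random.default_rng(seed)
    mu, sigma = data.mean(), data.std()
    n = max(1, int(len(data) * ratio))
    # Store a small synthetic sample instead of all original points
    return rng.normal(mu, sigma, size=n)

rng = np.random.default_rng(0)
original = rng.normal(100.0, 15.0, size=100_000)   # hypothetical data set
reduced = reduce_by_sampling(original, ratio=0.05, seed=1)

print(len(reduced))            # 5000 points instead of 100,000
print(reduced.mean(), reduced.std())
```

The reduced sample preserves the mean and spread of the original data at a fraction of the storage cost, which is the trade-off the paper targets.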


Why is it important?

In the digital age, the explosion of data volume demands efficient solutions for reducing data size, particularly in cloud-based systems. This paper proposes a method based on data characteristics instead of a MapReduce solution. The experimental results show that the reduced one-dimensional data sets are more than 95% similar to the original data sets.
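As an illustration of how such a similarity figure might be quantified (the paper's exact metric is not reproduced here), one could compare the empirical distributions of the original and reduced data sets, for example via a two-sample Kolmogorov-Smirnov statistic. All data and numbers below are hypothetical.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of samples a and b."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(0)
original = rng.normal(50.0, 10.0, size=50_000)           # hypothetical 1-D data
sample = rng.choice(original, size=2_500, replace=False)  # 5% random sample

# A crude similarity score: 1 minus the maximum CDF distance
similarity = 1.0 - ks_statistic(original, sample)
print(f"similarity: {similarity:.3f}")
```

With a 5% random sample of well-behaved data, the empirical distributions stay very close, so a score above 0.95 is typical.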

Perspectives

We have developed sampling solutions for multi-dimensional data sets. In the future, more advanced methods will be applied to our models. One-dimensional sampling solves the storage-reduction problem for simple data structures; more complex multi-dimensional data structures can be handled using the covariance matrix, as in the paper we published at WI 2022.
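A minimal sketch of the multi-dimensional case, assuming jointly normal data (the variable names and numbers are hypothetical, not taken from the WI 2022 paper): estimate the mean vector and covariance matrix from the original records, then draw a much smaller synthetic sample that preserves the correlations between dimensions.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical 3-dimensional medical records (e.g. age, BP, heart rate)
data = rng.multivariate_normal(
    mean=[60.0, 120.0, 75.0],
    cov=[[9.0, 3.0, 1.0],
         [3.0, 16.0, 2.0],
         [1.0, 2.0, 4.0]],
    size=20_000,
)

# Fit the joint distribution, then store a small sample from the fit
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)
reduced = rng.multivariate_normal(mu, cov, size=1_000)

print(reduced.shape)   # (1000, 3): 5% of the rows, same dimensions
```

Using the full covariance matrix, rather than sampling each dimension independently, is what keeps the cross-dimensional correlations intact in the reduced set.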

Henry Zane
NIT, Zhejiang University

Read the Original

This page is a summary of: Splitting Large Medical Data Sets Based on Normal Distribution in Cloud Environment, IEEE Transactions on Cloud Computing, April 2020, Institute of Electrical & Electronics Engineers (IEEE), DOI: 10.1109/tcc.2015.2462361.

