What is it about?

The estimation of probabilities is hard when the number of observations (the sample size) is small. What is the definition of a sufficiently large sample? How can (subjective) prior probabilities reduce the estimation error? By comparing several probability estimation methods (relative frequency, Laplace's rule of succession, Piegat's formula, and the m-estimate), we address these questions within a carefully designed experimental framework in R, which is publicly accessible on GitHub.
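
To make the compared methods concrete, here is a minimal R sketch of three of them (relative frequency, Laplace's rule of succession, and the m-estimate); Piegat's formula is omitted, and the function and parameter names are illustrative rather than taken from the GitHub framework.

```r
# Estimate the probability of an event from r occurrences in n trials.

# Relative frequency: r / n (undefined when n = 0).
relative_frequency <- function(r, n) r / n

# Laplace's rule of succession (two classes): (r + 1) / (n + 2).
laplace <- function(r, n) (r + 1) / (n + 2)

# m-estimate: (r + m * p_a) / (n + m), where p_a is the prior
# probability of the event and m controls the weight of the prior.
m_estimate <- function(r, n, p_a, m) (r + m * p_a) / (n + m)

# Example: 2 occurrences in 5 trials, prior 0.5, m = 2.
relative_frequency(2, 5)    # 0.4
laplace(2, 5)               # 0.4285714
m_estimate(2, 5, 0.5, 2)    # 0.4285714 (equals Laplace here)
```

Note that Laplace's rule of succession is the special case of the m-estimate with m = 2 and a uniform prior p_a = 0.5, which the example above also shows numerically.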

Why is it important?

In the paper we define small samples on the basis of a probability estimation error analysis. We compare several probability estimation methods and identify their strengths and weaknesses on small samples. We demonstrate that including prior probabilities in the final probability estimate is beneficial when the difference between the estimated prior and the true prior is less than 0.3.
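
As a rough illustration of how such a threshold can be observed (this is a toy Monte Carlo sketch under assumed settings, not the paper's experimental framework), one can draw many small Bernoulli samples and compare the mean absolute error of the m-estimate, under priors that deviate from the true probability by a growing offset, against plain relative frequency:

```r
set.seed(42)

# Mean absolute error of an estimator over many simulated small samples
# drawn from a Bernoulli source with success probability p_true.
mae <- function(estimate_fn, p_true, n, trials = 10000) {
  r <- rbinom(trials, n, p_true)           # occurrences in each sample
  mean(abs(estimate_fn(r, n) - p_true))    # average absolute error
}

p_true <- 0.7; n <- 5; m <- 2              # illustrative settings

# Baseline: relative frequency, which uses no prior.
mae(function(r, n) r / n, p_true, n)

# m-estimate with priors deviating from p_true by 0.0, 0.1, ..., 0.5.
for (offset in seq(0, 0.5, by = 0.1)) {
  p_a <- p_true - offset                   # deliberately wrong prior
  err <- mae(function(r, n) (r + m * p_a) / (n + m), p_true, n)
  cat(sprintf("prior offset %.1f -> MAE %.4f\n", offset, err))
}
```

Running the sketch shows how the m-estimate's advantage over relative frequency fades as the prior offset grows, which is the effect quantified in the paper.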

Perspectives

The paper presents an in-depth overview of probability estimation methods used in data mining, together with a framework for evaluating their estimation errors. It provides useful content for data science researchers and practitioners.

Bojan Cestnik

Read the Original

This page is a summary of: Revisiting the Optimal Probability Estimator from Small Samples for Data Mining, International Journal of Applied Mathematics and Computer Science, December 2019, De Gruyter.
DOI: 10.2478/amcs-2019-0058.
You can read the full text via the DOI above.
