What is it about?

Suppose you need to solve a supervised classification problem. A data set is collected and manually labelled, but the labels of some objects are unknown or uncertain. For example, in a satellite image it may not be clear whether an object is a pedestrian, a tree or a shadow. Or an e-mail might be classified as 20 percent spam and 80 percent regular e-mail. Can this additional information be used in the classifier to improve the quality of the classification? Yes, it can.
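To make the idea of an uncertain label concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: it is a generic logistic regression trained on "soft" labels, where each label is a probability of spam rather than a hard 0 or 1. The toy data, the single feature and the function name train_soft_logistic are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_soft_logistic(xs, soft_labels, lr=0.5, epochs=2000):
    """Fit p(spam | x) = sigmoid(w * x + b) by gradient descent
    on the cross-entropy loss with probabilistic (soft) targets."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, q in zip(xs, soft_labels):
            # Derivative of the cross-entropy with respect to the logit.
            # It has the same form for hard and soft targets q.
            err = sigmoid(w * x + b) - q
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one feature per e-mail, for instance a scaled
# count of suspicious words (an assumption made for this example).
xs = [0.0, 0.2, 0.5, 0.8, 1.0]
# A soft label of 0.2 means "20 percent spam, 80 percent regular e-mail".
soft_labels = [0.0, 0.2, 0.5, 0.8, 1.0]

w, b = train_soft_logistic(xs, soft_labels)
p_low = sigmoid(w * 0.1 + b)   # predicted spam probability, few suspicious words
p_high = sigmoid(w * 0.9 + b)  # predicted spam probability, many suspicious words
```

The uncertain e-mails are not thrown away here. Each one pulls the classifier towards its stated probability, so the partial information still shapes the decision boundary.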

Why is it important?

Google and other companies collect huge amounts of data. Special teams try to classify and label this data. Often only a small part (5%) is labelled by humans; the rest may be labelled by an algorithm, which may not be accurate. Can these imperfect labels still be used to help predict the labels of the remaining samples? This problem arises when labelling is impossible, time-consuming, or expensive.


Companies collect a lot of data to make predictions. Unfortunately, some of this data is uncertain or unknown. With new mathematical algorithms, this incomplete and uncertain data can still be used to improve the quality of the predictions.

Dr. Alexander Litvinenko
Rheinisch-Westfälische Technische Hochschule Aachen

Read the Original

This page is a summary of: On a Weakly Supervised Classification Problem, January 2022, Springer Science + Business Media,
DOI: 10.1007/978-3-031-16500-9_26.