A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

Heitor Murilo Gomes; Maciej Grzenda; Rodrigo Mello; Jesse Read; Minh Huong Le Nguyen; Albert Bifet

doi:10.1145/3523055

What is it about?

In this survey, we present and organise methods that leverage unlabelled data in a semi-supervised setting for streaming data. We also discuss the delayed labelling issue, which affects both fully supervised and semi-supervised methods. We propose a unified problem setting, discuss the learning guarantees and existing methods, and explain the differences between related problem settings. Finally, we review current benchmarking practices and propose adaptations to enhance them.

Why is it important?

Unlabelled data appear in many domains and are particularly relevant to streaming applications, where data are abundant but labelled data are rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning); or assume some labels will be available on request (active learning). Because of the constraints of streaming (and evolving) data, we focus on the second option (semi-supervised learning), in which techniques must rely on finding and exploiting the underlying characteristics of the data distribution.

Perspectives

I believe the primary purpose of this work is to help people navigate this fascinating area involving stream learning, concept drift, semi-supervised learning, and delayed labelling. It was a great pleasure to find and learn about existing techniques, and I was even more delighted that we could elucidate possible areas for future research. I hope that readers who are looking for references on how to frame real-world problems as partially labelled data streams, or who want to learn about possible options for assessing algorithms in such a configuration, will find our work helpful.
Heitor Gomes
University of Waikato

This page is a summary of: A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams, ACM Computing Surveys, March 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3523055.

Contributors

The following have contributed to this page

Heitor Gomes
University of Waikato
