What is it about?

In this survey, we present and organise methods that leverage unlabelled data in a semi-supervised setting for streaming data. We also discuss the delayed labelling issue, which impacts both fully supervised and semi-supervised methods. We propose a unified problem setting, discuss the learning guarantees and existing methods, and explain the differences between related problem settings. Finally, we review the current benchmarking practices and propose adaptations to enhance them.

Why is it important?

Unlabelled data appear in many domains and are particularly relevant to streaming applications, where data are abundant but labelled data are rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning); or assume some labels will be available on request (active learning). Due to constraints concerning streaming (and evolving) data, we focus on the second option (semi-supervised learning), in which techniques must rely on finding and exploiting the underlying characteristics of the data distribution.

Perspectives

I believe the primary purpose of this work is to help people navigate this fascinating area involving stream learning, concept drift, semi-supervised learning, and delayed labelling. It was a great pleasure to find and learn about existing techniques, and I was even more delighted that we could elucidate possible areas for future research. I hope that readers looking for references on how to frame real-world problems as partially labelled data streams, or who want to learn about possible options for assessing algorithms in such a configuration, will find our work helpful.

Heitor Gomes
University of Waikato

Read the Original

This page is a summary of: A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams, ACM Computing Surveys, March 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3523055.