What is it about?
In this survey, we present and organise methods that leverage unlabelled data in a semi-supervised setting for streaming data. We also discuss the delayed labelling issue, which affects both fully supervised and semi-supervised methods. We propose a unified problem setting, discuss learning guarantees and existing methods, and explain the differences between related problem settings. Finally, we review current benchmarking practices and propose adaptations to improve them.
Why is it important?
Unlabelled data arise in many domains and are particularly relevant to streaming applications, where data are abundant but labelled data are scarce. There are three broad ways to learn from such data. One can ignore the unlabelled data and use only the labelled data (supervised learning). One can use the labelled data and also try to leverage the unlabelled data (semi-supervised learning). Or one can assume that some labels can be requested when needed (active learning). Given the constraints of streaming (and evolving) data, we focus on the second option, semi-supervised learning, in which techniques must find and exploit the underlying characteristics of the data distribution.
Read the Original
This page is a summary of: A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams, ACM Computing Surveys, March 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3523055.