What is it about?
An introduction to self-supervised learning for videos and a summary of the current research landscape. The main areas include: 1) pretext task learning, 2) generative learning, 3) contrastive learning, and 4) cross-modal agreement. In addition to vision-only self-supervised learning for video, we cover multimodal approaches that use additional modalities such as audio and text. More info can be found at our GitHub project link: https://bit.ly/3Oimc7Q
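Of the areas above, contrastive learning is the easiest to sketch concretely: embeddings of two augmented views (e.g., two clips) of the same video are pulled together, while the other videos in the batch serve as negatives. The snippet below is an illustrative NumPy sketch of a generic InfoNCE-style objective, not code from the survey; all names and the temperature value are assumptions.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Illustrative InfoNCE contrastive loss: row i of z1 and z2 are
    embeddings of two views of the same video (the positive pair);
    all other rows in the batch act as negatives."""
    # L2-normalize so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # pairwise similarity matrix
    # log-softmax over each row; positives lie on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy usage: 4 "videos" with 8-dim embeddings
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
# Matched views (small perturbation) should give a lower loss
# than unrelated embeddings.
loss_matched = info_nce_loss(z, z + 0.01 * rng.normal(size=(4, 8)))
loss_random = info_nce_loss(z, rng.normal(size=(4, 8)))
```

In practice the same idea is implemented with learned video encoders and temporal augmentations (different clips, speeds, or crops of one video) rather than random vectors; the sketch only shows the loss geometry.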
Why is it important?
Self-supervised learning reduces the need for dense annotation during training and yields generalizable foundation models that can be adapted to downstream tasks or exhibit emergent behaviors.
Read the Original
This page is a summary of: Self-Supervised Learning for Videos: A Survey, ACM Computing Surveys, July 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3577925.