What is it about?
Automatic live video commenting (ALVC) is an emerging task that has attracted increasing attention for its value in narration generation, topic explanation, and related applications. However, current methods overlook an important aspect: the sentiment diversity of the generated comments. In this paper, we are the first to treat the ALVC task as a one-to-many generation task with sentimental distinction, aiming to generate diverse video comments with multiple sentiments and semantics.
Why is it important?
We propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which achieves diverse video commenting with multiple sentiments and semantics. To the best of our knowledge, we are the first to focus on the diversity of comments in the automatic live video commenting task. We propose a sentiment-oriented diversity encoder module, which combines a VAE with a random mask mechanism to achieve semantic diversity, and which aligns the resulting semantic features with the language and video modalities under sentiment guidance. We further propose a batch-attention module that learns relationships between samples, alleviating the problem of missing sentimental samples caused by data imbalance. Finally, we provide an evaluation protocol for the ALVC task that measures both the quality and the diversity of generated comments, to assist future research.
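To make the two mechanisms above concrete, here is a minimal NumPy sketch, not the authors' implementation: a VAE-style latent sample drawn via the reparameterization trick, a random mask that drops whole feature vectors to encourage diverse latents, and a batch-level attention that mixes information across samples in a batch. All names, shapes, and the `keep_prob` value are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only; names, shapes, and keep_prob are assumptions.
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reparameterize(mu, log_var):
    # VAE reparameterization trick: z = mu + sigma * eps,
    # so gradients can flow through mu and log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def random_mask(features, keep_prob=0.7):
    # Randomly zero out whole feature vectors (hypothetical masking step
    # standing in for the paper's random mask mechanism).
    mask = rng.random(features.shape[0]) < keep_prob
    return features * mask[:, None]

def batch_attention(x):
    # Attention computed ACROSS samples in a batch, so under-represented
    # sentiment classes can borrow information from related samples.
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x

mu = np.zeros((4, 8))       # toy batch of 4 samples, latent dim 8
log_var = np.zeros((4, 8))  # unit variance
z = reparameterize(mu, log_var)              # sampled latents, shape (4, 8)
masked = random_mask(rng.standard_normal((10, 16)))
attended = batch_attention(z)                # batch-level mixing, shape (4, 8)
```

In the full model these pieces sit inside a Transformer encoder-decoder and are conditioned on sentiment; the sketch only shows the shape of each mechanism.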
Read the Original
This page is a summary of: Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting, ACM Transactions on Multimedia Computing, Communications, and Applications, November 2023, ACM (Association for Computing Machinery).