What is it about?

Automatic live video commenting (ALVC) is an emerging task that is attracting increasing attention for its value in narration generation, topic explanation, and related applications. However, current methods overlook the sentiment diversity of the generated comments, which is important for engaging commentary. In this paper, we are the first to treat the ALVC task as a one-to-many generation task with sentimental distinction, achieving diverse video comments with multiple sentiments and semantics.

Featured Image

Why is it important?

We propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which achieves diverse video commenting with multiple sentiments and semantics. To the best of our knowledge, we are the first to focus on the diversity of comments in the automatic live video commenting task. We propose a sentiment-oriented diversity encoder module that combines a variational autoencoder (VAE) with a random mask mechanism to achieve semantic diversity, and that aligns semantic features with the language and video modalities under sentiment guidance. We also propose a batch-attention module for sample-relationship learning, which alleviates the problem of missing sentiment samples caused by data imbalance. Finally, we provide an evaluation protocol for the ALVC task that measures both the quality and the diversity of generated comments simultaneously, assisting future research.
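The batch-attention idea can be pictured as ordinary dot-product attention taken across the batch dimension instead of the sequence dimension, so each sample's feature is enriched with information from related samples in the same batch. The sketch below is an illustrative simplification under that assumption (single-head, no learned projections), not the exact So-TVAE formulation; the function names are my own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def batch_attention(feats):
    """Attend over the batch dimension: feats has shape (B, D),
    one feature vector per sample. Each sample's output is a
    similarity-weighted mixture of all samples in the batch."""
    B, D = feats.shape
    scores = feats @ feats.T / np.sqrt(D)   # (B, B) sample-to-sample similarity
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ feats                  # (B, D) relation-aware features

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = batch_attention(x)
print(out.shape)  # (4, 8)
```

Because sparse-sentiment samples borrow information from similar samples elsewhere in the batch, this kind of cross-sample mixing is one plausible way to compensate for imbalanced sentiment classes.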

Perspectives

I hope this work inspires other approaches to the sentiment-oriented live video commenting task and further encourages research on human-interactive comment generation.

Fengyi Fu
University of Science and Technology of China

Read the Original

This page is a summary of: Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting, ACM Transactions on Multimedia Computing, Communications, and Applications, November 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3633334.
