What is it about?

Efficient recognition of emotions has attracted extensive research interest, enabling new applications in many fields, such as human-computer interaction, disease diagnosis, and service robots. Although existing sentiment analysis work relying on sensors or unimodal methods performs well in simple contexts such as business recommendation and facial expression recognition, it falls far below expectations in complex scenes involving sarcasm, disdain, and metaphor. In this article, we propose a novel two-stage multimodal learning framework, called AMSA, that adaptively learns the correlation and complementarity between modalities for dynamic fusion, achieving more stable and precise sentiment analysis results.
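To give a rough sense of what adaptive multimodal fusion means in practice, the sketch below shows per-sample weighting of text, audio, and visual features before fusion. It is not the AMSA architecture from the paper; the feature dimensions, gating module, and classifier head are illustrative assumptions.

```python
# Minimal sketch of adaptive (attention-weighted) multimodal fusion.
# NOT the exact AMSA model: dimensions and modules are illustrative assumptions.
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    def __init__(self, text_dim=768, audio_dim=74, visual_dim=35,
                 hidden=128, num_classes=3):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(text_dim, hidden),
            "audio": nn.Linear(audio_dim, hidden),
            "visual": nn.Linear(visual_dim, hidden),
        })
        # Scores one weight per modality, per sample (the "adaptive" part).
        self.gate = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text, audio, visual):
        feats = torch.stack([
            torch.relu(self.proj["text"](text)),
            torch.relu(self.proj["audio"](audio)),
            torch.relu(self.proj["visual"](visual)),
        ], dim=1)                                          # (batch, 3, hidden)
        weights = torch.softmax(self.gate(feats), dim=1)   # (batch, 3, 1)
        fused = (weights * feats).sum(dim=1)               # weighted sum over modalities
        return self.classifier(fused), weights


# Usage with random stand-in features; real inputs would come from
# text/audio/visual encoders run on each utterance.
model = AdaptiveFusion()
logits, weights = model(torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35))
print(logits.shape, weights.squeeze(-1))
```

The learned weights let the model lean on whichever modality is most informative for a given sample, which is the intuition behind dynamic fusion for ambiguous emotions such as sarcasm.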

Why is it important?

We focus on mixed emotions and complex contexts in affective computing, such as sarcasm, disdain, and metaphor. Our novel two-stage multimodal learning framework, AMSA, accurately predicts emotion categories in these complex contexts. Accuracy on three datasets (our self-built Video-SA, CMU-MOSEI, and CMU-MOSI) improves by an average of 3%, reaching state-of-the-art performance, particularly for sentiment analysis in complex contexts.

Perspectives

Writing this article was a great pleasure. We hope it draws attention to the diversity of emotions, especially mixed emotions that machines find difficult to discern, because learning these emotions is one way to advance fields such as human-computer interaction. In the future, we hope to find an appropriate method to fuse and learn the relationship between pre-sentiment and post-sentiment element pairs and thereby complement this field.

Jingyao Wang
Institute of Software, Chinese Academy of Sciences

Read the Original

This page is a summary of: AMSA: Adaptive Multimodal Learning for Sentiment Analysis, ACM Transactions on Multimedia Computing, Communications, and Applications, February 2023, ACM (Association for Computing Machinery).
DOI: 10.1145/3572915.
