What is it about?
Efficient recognition of emotions has attracted extensive research interest and enables new applications in many fields, such as human-computer interaction, disease diagnosis, and service robots. Although existing work on sentiment analysis that relies on sensors or unimodal methods performs well in simple contexts such as business recommendation and facial expression recognition, it falls far below expectations in complex scenes involving sarcasm, disdain, and metaphor. In this article, we propose a novel two-stage multimodal learning framework, called AMSA, that adaptively learns the correlation and complementarity between modalities for dynamic fusion, achieving more stable and precise sentiment analysis results.
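To make the idea of dynamic, per-sample fusion more concrete, here is a minimal PyTorch sketch of gated modality weighting, where a small network scores each modality and the prediction uses a weighted combination. The layer sizes, feature dimensions, and gating scheme are illustrative assumptions for this summary, not the actual AMSA architecture described in the paper.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Toy adaptive fusion: learns per-sample weights over modality embeddings.

    Illustrative only; the dimensions and gating scheme are assumptions,
    not the two-stage architecture proposed in the AMSA paper.
    """

    def __init__(self, dim_text=768, dim_audio=74, dim_video=35, dim_hidden=128):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(dim_text, dim_hidden),
            "audio": nn.Linear(dim_audio, dim_hidden),
            "video": nn.Linear(dim_video, dim_hidden),
        })
        # Gating network scores each modality from the concatenated projections.
        self.gate = nn.Linear(3 * dim_hidden, 3)
        self.classifier = nn.Linear(dim_hidden, 3)  # e.g. negative / neutral / positive

    def forward(self, text, audio, video):
        h = [torch.tanh(self.proj[m](x)) for m, x in
             (("text", text), ("audio", audio), ("video", video))]
        stacked = torch.stack(h, dim=1)                                    # (batch, 3, hidden)
        weights = torch.softmax(self.gate(torch.cat(h, dim=-1)), dim=-1)   # (batch, 3)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)               # weighted sum of modalities
        return self.classifier(fused), weights


# Example: one batch of four random utterance-level feature vectors.
model = AdaptiveFusion()
logits, weights = model(torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35))
print(logits.shape, weights.shape)  # torch.Size([4, 3]) torch.Size([4, 3])
```

The per-sample weights let the model lean on whichever modality is most reliable for a given utterance, which is the intuition behind dynamic fusion for cases such as sarcasm, where the spoken words and the tone or facial expression disagree.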
Why is it important?
We focus on mixed emotions and complex contexts in affective computing, such as sarcasm, disdain, and metaphor. Our two-stage multimodal learning framework, AMSA, accurately predicts emotion categories in these complex contexts. Across three datasets (our self-built Video-SA, CMU-MOSEI, and CMU-MOSI), AMSA improves accuracy by an average of 3% and reaches state-of-the-art performance, particularly on sentiment analysis in complex contexts.
Read the Original
This page is a summary of: AMSA: Adaptive Multimodal Learning for Sentiment Analysis, ACM Transactions on Multimedia Computing, Communications, and Applications, February 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3572915.