All Stories

  1. Multimodal Emotion-Cause Pair Extraction with Holistic Interaction and Label Constraint
  2. S3 Agent: Unlocking the Power of VLLM for Zero-Shot Multi-Modal Sarcasm Detection
  3. Introduction to the Special Issue on Deep Multimodal Generation and Retrieval
  4. CogMAEC'25: The 1st Workshop on Cognition-oriented Multimodal Affective and Empathetic Computing
  5. ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
  6. LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection
  7. MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation
  8. The ACM Multimedia 2025 Grand Challenge of Avatar-based Multimodal Empathetic Conversation
  9. FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents
  10. The ACM Multimedia 2025 Grand Challenge of Multimodal Conversational Aspect-based Sentiment Analysis
  11. LGM3A '25 Keynote Talk -- On Path to Multimodal Generalist: General-Level and General-Bench
  12. Proceedings of the 1st International Workshop on Cognition-oriented Multimodal Affective and Empathetic Computing
  13. MFSVFND: Multimodal Fusion Network for Detecting Fake News on Short Video Platforms
  14. From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
  15. Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark
  16. Toward Complex-query Referring Image Segmentation: A Novel Benchmark
  17. Revisiting Conversation Discourse for Dialogue Disentanglement
  18. Fine-grained Structural Hallucination Detection for Unified Visual Comprehension and Generation in Multimodal LLM
  19. The 2nd International Workshop on Deep Multi-modal Generation and Retrieval
  20. The ACM Multimedia 2024 Visual Spatial Description Grand Challenge
  21. SpeechEE: A Novel Benchmark for Speech Event Extraction
  22. PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis
  23. I3: Intent-Introspective Retrieval Conditioned on Instructions
  24. Multi-view Counterfactual Contrastive Learning for Fact-checking Fake News Detection
  25. MMLSCU: A Dataset for Multi-modal Multi-domain Live Streaming Comment Understanding
  26. Deep Multimodal Learning for Information Retrieval
  27. Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
  28. On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training