All Stories

  1. Video-Level Multimodal Relation Extraction with Event-Entity Semantic Consistency
  2. Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization
  3. Knowledge-Aware Causal Inference Network for Visual Dialog