All Stories

  1. MoS²: Mixture of Scale and Shift Experts for Text-Only Video Captioning
  2. Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
  3. CenterCLIP
  4. Recurrent Attention Network with Reinforced Generator for Visual Dialog