All Stories

  1. SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion
  2. Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement
  3. Investigating Domain Gaps for Indoor 3D Object Detection
  4. Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset
  5. Interact-Custom: Customized Human Object Interaction Image Generation
  6. RaT2IGen: Relation-aware Text-to-image Generation via Learnable Prompt
  7. Progressive Prototype Evolving for Dual-Forgetting Mitigation in Non-Exemplar Online Continual Learning
  8. Mitigate Catastrophic Remembering via Continual Knowledge Purification for Noisy Lifelong Person Re-Identification
  9. InsVP: Efficient Instance Visual Prompting from Image Itself
  10. SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection
  11. ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
  12. RelScene: A Benchmark and Baseline for Spatial Relations in Text-driven 3D Scene Generation
  13. EOGT: Video Anomaly Detection with Enhanced Object Information and Global Temporal Dependency
  14. SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback
  15. Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition
  16. MV-Diffusion: Motion-aware Video Diffusion Model
  17. Efficiency-optimized Video Diffusion Models
  18. Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval
  19. CAT: A Coarse-to-Fine Attention Tree for Semantic Change Detection
  20. Multi-Behavior Recommendation with Cascading Graph Convolution Networks
  21. LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation
  22. MKVSE: Multimodal Knowledge Enhanced Visual-Semantic Embedding for Image-Text Retrieval
  23. SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization
  24. Learn from Unlabeled Videos for Near-duplicate Video Retrieval
  25. RCE-HIL
  26. Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining
  27. A New Benchmark and Approach for Fine-grained Cross-media Retrieval
  28. CM-GANs